Fuzzy C-Means Clustering

Clustering is the task of grouping a set of data points so that points in the same group are more similar to each other than to points in other groups. It is a common technique in statistical data analysis and is used in many fields, including pattern recognition, image analysis, data compression, information retrieval, and computer graphics.

Types of clustering

Clustering methods are divided into two types:

Hard Clustering: In hard clustering, each data point belongs to exactly one cluster and to no other.
Soft Clustering: In soft clustering, a data point is not forced into a single cluster; instead, it is assigned a probability or degree of membership for each cluster.

Fuzzy clustering

Fuzzy clustering is a clustering technique in which each data point can belong to two or more clusters. It is also referred to as soft clustering or soft k-means. Clusters are identified using similarity measures such as connectivity, intensity, and distance. One of the most widely used fuzzy clustering algorithms is fuzzy c-means clustering.

Fuzzy c-means clustering (FCM) is a data clustering technique in which the data points are grouped into N clusters, with every data point belonging to every cluster to a certain degree. For example, a data point that lies close to the center of a cluster has a high degree of membership in that cluster, while a data point that lies far from the center of the cluster has a low degree of membership in it.

Steps for fuzzy c-means clustering

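Step 1: Choose the desired number of clusters and assign each data point an initial membership value in every cluster (for example, at random), so that the membership values of each data point sum to 1.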

Step 2: Find out the centroid of each cluster.

The centroid of cluster i is computed, for each coordinate j, as

Vij = (Σk (µik)^m * xkj) / (Σk (µik)^m)

Where, µik = fuzzy membership value of data point k in cluster i

m = fuzziness parameter (generally taken as 2)

xkj = coordinate j of data point k

Step 3: Find out the distance of each point from each centroid.

Step 4: Update the membership values.

Step 5: Repeat steps 2-4 until the membership values stop changing or the change between iterations is less than the tolerance value.

Step 6: Defuzzify the obtained membership values, that is, assign each data point to the cluster in which it has the highest membership.
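The six steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the function name fuzzy_c_means and the parameters tol, max_iter and seed are invented for this example, distances are assumed to be Euclidean, and the membership update in Step 4 assumes the standard FCM rule µik = 1 / Σj (dik / djk)^(2 / (m - 1)).

```python
import math
import random


def fuzzy_c_means(points, n_clusters, m=2.0, tol=1e-5, max_iter=100, seed=0):
    """Illustrative fuzzy c-means. Returns (centroids, memberships, labels).

    memberships[k][i] is the membership of point k in cluster i.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Step 1: random memberships, normalised so each point's values sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])

    for _ in range(max_iter):
        # Step 2: each centroid is a mean weighted by membership ** m.
        centroids = []
        for i in range(n_clusters):
            weights = [u[k][i] ** m for k in range(n)]
            denom = sum(weights)
            centroids.append(tuple(
                sum(w * p[j] for w, p in zip(weights, points)) / denom
                for j in range(dim)
            ))

        # Step 3: Euclidean distance of every point from every centroid.
        dist = [[math.dist(p, c) for c in centroids] for p in points]

        # Step 4: membership update (assumed standard FCM rule).
        new_u = []
        for d_row in dist:
            if min(d_row) == 0.0:
                # The point coincides with a centroid: full membership there.
                hits = [1.0 if d == 0.0 else 0.0 for d in d_row]
                new_u.append([h / sum(hits) for h in hits])
            else:
                inv = [d ** (-2.0 / (m - 1.0)) for d in d_row]
                total = sum(inv)
                new_u.append([v / total for v in inv])

        # Step 5: stop once no membership value changes by more than tol.
        change = max(abs(a - b)
                     for old, new in zip(u, new_u)
                     for a, b in zip(old, new))
        u = new_u
        if change < tol:
            break

    # Step 6: defuzzify by picking the cluster with the highest membership.
    labels = [row.index(max(row)) for row in u]
    return centroids, u, labels


points = [(1, 3), (2, 5), (4, 8), (7, 9)]
centroids, memberships, labels = fuzzy_c_means(points, n_clusters=2)
print(labels)
```

Because the initial memberships are random, the cluster indices (0 or 1) can swap between seeds; only the grouping of the points is meaningful.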

Example

Consider data points {(1, 3), (2, 5), (4, 8), (7, 9)}

Step 1: Let the number of clusters into which the data is to be divided be 2, and initialize the membership value of each data point in each cluster as follows:

Cluster    (1, 3)    (2, 5)    (4, 8)    (7, 9)
1          0.8       0.7       0.2       0.1
2          0.2       0.3       0.8       0.9

Step 2: Find out the centroid of each cluster (with m = 2)

V11 = (0.8^2 * 1 + 0.7^2 * 2 + 0.2^2 * 4 + 0.1^2 * 7) / (0.8^2 + 0.7^2 + 0.2^2 + 0.1^2) = 1.568

V12 = (0.8^2 * 3 + 0.7^2 * 5 + 0.2^2 * 8 + 0.1^2 * 9) / (0.8^2 + 0.7^2 + 0.2^2 + 0.1^2) = 4.051

V21 = (0.2^2 * 1 + 0.3^2 * 2 + 0.8^2 * 4 + 0.9^2 * 7) / (0.2^2 + 0.3^2 + 0.8^2 + 0.9^2) = 5.35

V22 = (0.2^2 * 3 + 0.3^2 * 5 + 0.8^2 * 8 + 0.9^2 * 9) / (0.2^2 + 0.3^2 + 0.8^2 + 0.9^2) = 8.215

Centroids: (1.568, 4.051) and (5.35, 8.215)
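The centroid arithmetic can be checked with a few lines of Python. This is only a sketch: the helper name centroid is made up for the example, and the membership values are those listed for the two clusters in Step 1.

```python
points = [(1, 3), (2, 5), (4, 8), (7, 9)]
membership = [
    [0.8, 0.7, 0.2, 0.1],  # cluster 1, one value per data point
    [0.2, 0.3, 0.8, 0.9],  # cluster 2
]
m = 2  # fuzziness parameter


def centroid(weights, points, m):
    """Mean of the points, weighted by membership raised to the power m."""
    powered = [w ** m for w in weights]
    total = sum(powered)
    return tuple(
        sum(w * p[j] for w, p in zip(powered, points)) / total
        for j in range(len(points[0]))
    )


centroids = [centroid(row, points, m) for row in membership]
for i, c in enumerate(centroids, start=1):
    print(f"cluster {i}: ({c[0]:.3f}, {c[1]:.3f})")
```

The results agree with the centroids above up to rounding (the x coordinate of the second centroid comes out as 5.348, which rounds to 5.35).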

Step 3: Find out the distance of each point from both the centroids.

D11 = ((1 - 1.568)^2 + (3 - 4.051)^2)^0.5 = 1.2

D12 = ((1 - 5.35)^2 + (3 - 8.215)^2)^0.5 = 6.79

Similarly, the distance of every other data point from both centroids is computed.
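The remaining distances can be computed the same way. The short sketch below assumes Euclidean distance, as in D11 and D12, and uses the rounded centroids from Step 2; the variable names are chosen only for this example.

```python
import math

points = [(1, 3), (2, 5), (4, 8), (7, 9)]
centroids = [(1.568, 4.051), (5.35, 8.215)]  # rounded values from Step 2

# distances[k][i] is the distance of point k from centroid i.
distances = [[math.dist(p, c) for c in centroids] for p in points]

for p, row in zip(points, distances):
    print(p, [round(d, 2) for d in row])
```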

Step 4: Update the membership values.

Compute the new membership value of every data point in each cluster from these distances and update the membership matrix.
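As a rough illustration of this step, the sketch below assumes the standard FCM update rule, µik = 1 / Σj (dik / djk)^(2 / (m - 1)), where dik is the distance of point k from centroid i. The function name update_memberships is hypothetical, and the code assumes that no point lies exactly on a centroid (which would cause a division by zero).

```python
import math

points = [(1, 3), (2, 5), (4, 8), (7, 9)]
centroids = [(1.568, 4.051), (5.35, 8.215)]  # rounded values from Step 2
m = 2  # fuzziness parameter


def update_memberships(points, centroids, m):
    """Return one row per point; row[i] is the membership in cluster i."""
    exponent = 2.0 / (m - 1)
    updated = []
    for p in points:
        d = [math.dist(p, c) for c in centroids]  # assumes every d > 0
        row = [1.0 / sum((d_i / d_j) ** exponent for d_j in d) for d_i in d]
        updated.append(row)
    return updated


new_membership = update_memberships(points, centroids, m)
for p, row in zip(points, new_membership):
    print(p, [round(v, 3) for v in row])
```

Under this rule the memberships of each point still sum to 1, and points close to a centroid move towards full membership in that cluster.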

Step 5: Repeat steps 2-4 until the membership values stop changing or the change between iterations is less than the tolerance value.

Step 6: Defuzzify the obtained membership values.
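A common way to defuzzify is to assign each data point to the cluster in which its membership is highest. The sketch below assumes that rule and, purely for illustration, applies it to the initial membership values from Step 1 rather than to converged values.

```python
points = [(1, 3), (2, 5), (4, 8), (7, 9)]
membership = {
    1: [0.8, 0.7, 0.2, 0.1],  # cluster 1, one value per data point
    2: [0.2, 0.3, 0.8, 0.9],  # cluster 2
}

# For each point, pick the cluster with the largest membership value.
labels = [
    max(membership, key=lambda cluster: membership[cluster][k])
    for k in range(len(points))
]

for p, label in zip(points, labels):
    print(p, "-> cluster", label)
```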
