Fuzzy C-Means Clustering
Clustering is the task of grouping a set of data points so that points in the same group are more similar to each other than to those in other groups. It is a common technique for statistical data analysis and is used in many fields, such as pattern recognition, image analysis, data compression, information retrieval, and computer graphics.
Types of clustering
Clustering is divided into two types:
- Hard Clustering: In hard clustering, each data point belongs to exactly one cluster.
- Soft Clustering: In soft clustering, instead of assigning each data point to a single cluster, each data point is given a probability (or degree of membership) for every cluster.
Fuzzy clustering
Fuzzy clustering is a clustering technique in which each data point can belong to two or more clusters. It is also referred to as soft k-means or soft clustering. Clusters are identified using similarity measures, which include connectivity, intensity, and distance. One of the most widely used fuzzy clustering algorithms is fuzzy c-means clustering.
Fuzzy c-means clustering (FCM) is a data clustering technique in which the data points are grouped into N clusters, with every data point belonging to every cluster to a certain degree. For example, a data point that lies close to the center of a cluster has a high degree of membership in that cluster, while a data point that lies far from the center has a low degree of membership in it.
Steps for fuzzy c-means clustering
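Step 1: Choose the desired number of clusters and initialize the membership value of each data point in each cluster, so that the memberships of each data point sum to 1.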
Step 2: Find the centroid of each cluster.
The formula for finding coordinate j of the centroid of cluster i:
V_ij = (Σ_k µ_ik^m * x_kj) / (Σ_k µ_ik^m)
Where, µ_ik = fuzzy membership value of data point k in cluster i
m = fuzziness parameter (generally taken as 2)
x_k = data point k, and x_kj = its coordinate j
Step 3: Find the distance of each data point from each centroid.
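Step 4: Update the membership values.
The standard update for the membership of data point k in cluster i:
µ_ik = 1 / Σ_j (D_ki / D_kj)^(2 / (m - 1))
Where, D_ki = distance of data point k from the centroid of cluster i, and the sum runs over all clusters j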
Step 5: Repeat steps 2-4 until the membership values stop changing, or until the change between iterations is less than the tolerance value.
Step 6: Defuzzify the obtained membership values, for example by assigning each data point to the cluster in which it has the highest membership.
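The steps above can be sketched in plain Python as follows. This is a minimal illustration, not a reference implementation: the function names (update_centroids, update_memberships, fuzzy_c_means), the tol and max_iter parameters, and the handling of a point that coincides with a centroid are assumptions made here, and the membership update uses the standard FCM rule with Euclidean distance.

```python
import math

def update_centroids(points, u, m=2.0):
    """Step 2: membership-weighted mean of the points, one centroid per cluster."""
    dims = len(points[0])
    centroids = []
    for row in u:  # one row of memberships per cluster
        weights = [mu ** m for mu in row]
        total = sum(weights)
        centroids.append(tuple(
            sum(w * p[d] for w, p in zip(weights, points)) / total
            for d in range(dims)
        ))
    return centroids

def update_memberships(points, centroids, m=2.0):
    """Steps 3-4: Euclidean distances, then the standard FCM membership update."""
    power = 2.0 / (m - 1.0)
    u = [[0.0] * len(points) for _ in centroids]
    for k, p in enumerate(points):
        dists = [math.dist(p, c) for c in centroids]
        if 0.0 in dists:
            # Assumption: a point sitting exactly on a centroid gets full
            # membership there (shared equally if several centroids coincide).
            hits = [i for i, d in enumerate(dists) if d == 0.0]
            for i in hits:
                u[i][k] = 1.0 / len(hits)
            continue
        for i, d_i in enumerate(dists):
            u[i][k] = 1.0 / sum((d_i / d_j) ** power for d_j in dists)
    return u

def fuzzy_c_means(points, u, m=2.0, tol=1e-5, max_iter=100):
    """Step 5: repeat steps 2-4 until the largest membership change is below tol."""
    for _ in range(max_iter):
        centroids = update_centroids(points, u, m)
        new_u = update_memberships(points, centroids, m)
        change = max(abs(a - b)
                     for r_new, r_old in zip(new_u, u)
                     for a, b in zip(r_new, r_old))
        u = new_u
        if change < tol:
            break
    # Step 6: defuzzify by taking the cluster with the highest membership
    # (0-based cluster indices).
    labels = [max(range(len(u)), key=lambda i: u[i][k])
              for k in range(len(points))]
    return centroids, u, labels

# Step 1: two clusters with hand-picked initial memberships (rows = clusters).
points = [(1, 3), (2, 5), (4, 8), (7, 9)]
u0 = [[0.8, 0.7, 0.2, 0.1],
      [0.2, 0.3, 0.8, 0.9]]
centroids, u, labels = fuzzy_c_means(points, u0)
print(labels)
```

The memberships are stored with one row per cluster and one column per data point, so each column sums to 1 after every update.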
Example
Consider the data points {(1, 3), (2, 5), (4, 8), (7, 9)}.
Step 1: Let the number of clusters into which the data is to be divided be 2, and initialize the membership values as follows:
Cluster | (1, 3) | (2, 5) | (4, 8) | (7, 9)
1       | 0.8    | 0.7    | 0.2    | 0.1
2       | 0.2    | 0.3    | 0.8    | 0.9
Step 2: Find the centroid of each cluster (with m = 2).
V11 = (0.8^2 * 1 + 0.7^2 * 2 + 0.2^2 * 4 + 0.1^2 * 7) / (0.8^2 + 0.7^2 + 0.2^2 + 0.1^2) = 1.568
V12 = (0.8^2 * 3 + 0.7^2 * 5 + 0.2^2 * 8 + 0.1^2 * 9) / (0.8^2 + 0.7^2 + 0.2^2 + 0.1^2) = 4.051
V21 = (0.2^2 * 1 + 0.3^2 * 2 + 0.8^2 * 4 + 0.9^2 * 7) / (0.2^2 + 0.3^2 + 0.8^2 + 0.9^2) = 5.35
V22 = (0.2^2 * 3 + 0.3^2 * 5 + 0.8^2 * 8 + 0.9^2 * 9) / (0.2^2 + 0.3^2 + 0.8^2 + 0.9^2) = 8.215
Centroids: (1.568, 4.051) and (5.35, 8.215)
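The centroid arithmetic can be checked with a few lines of Python. This is only a sketch of the calculation above: it assumes m = 2 and uses the point (4, 8), as in the membership table, and the variable names are illustrative.

```python
points = [(1, 3), (2, 5), (4, 8), (7, 9)]
memberships = [
    [0.8, 0.7, 0.2, 0.1],  # cluster 1
    [0.2, 0.3, 0.8, 0.9],  # cluster 2
]
m = 2  # fuzziness parameter

centroids = []
for row in memberships:
    weights = [mu ** m for mu in row]   # membership raised to the power m
    total = sum(weights)
    x = sum(w * p[0] for w, p in zip(weights, points)) / total
    y = sum(w * p[1] for w, p in zip(weights, points)) / total
    centroids.append((round(x, 3), round(y, 3)))

print(centroids)  # [(1.568, 4.051), (5.348, 8.215)]
```

To three decimals the x-coordinate of the second centroid is 5.348, which the worked example rounds to 5.35.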
Step 3: Find the distance of each point from both centroids.
D11 = ((1 - 1.568)^2 + (3 - 4.051)^2)^0.5 = 1.2
D12 = ((1 - 5.35)^2 + (3 - 8.215)^2)^0.5 = 6.79
Similarly, the distances of the remaining data points from both centroids are computed.
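The remaining distances follow the same pattern. A short sketch, assuming Euclidean distance and the rounded centroids from Step 2 (math.dist requires Python 3.8 or later):

```python
import math

points = [(1, 3), (2, 5), (4, 8), (7, 9)]
centroids = [(1.568, 4.051), (5.35, 8.215)]  # rounded values from Step 2

# For each point: [distance to centroid 1, distance to centroid 2]
distances = {p: [math.dist(p, c) for c in centroids] for p in points}

for p, d in distances.items():
    print(p, [round(x, 2) for x in d])
```

For the first point this gives roughly 1.19 and 6.79, in line with D11 and D12 above.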
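Step 4: Update the membership values. With m = 2, the exponent 2 / (m - 1) is 2, so for the first data point:
µ11 = 1 / ((D11 / D11)^2 + (D11 / D12)^2) = 1 / (1 + (1.2 / 6.79)^2) = 0.97
µ21 = 1 - µ11 = 0.03
Similarly, compute all membership values and update the matrix.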
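As an illustration of this step, the sketch below applies the standard FCM membership update, µ = 1 / Σ_j (d_i / d_j)^(2 / (m - 1)), to every point. The choice of this rule, Euclidean distance, m = 2, and the variable names are assumptions of the sketch.

```python
import math

points = [(1, 3), (2, 5), (4, 8), (7, 9)]
centroids = [(1.568, 4.051), (5.35, 8.215)]  # rounded values from Step 2
m = 2

power = 2 / (m - 1)
# updated[i][k] = membership of point k in cluster i
updated = [[0.0] * len(points) for _ in centroids]
for k, p in enumerate(points):
    dists = [math.dist(p, c) for c in centroids]
    for i, d_i in enumerate(dists):
        updated[i][k] = round(1 / sum((d_i / d_j) ** power for d_j in dists), 3)

for row in updated:
    print(row)
```

The first two points end up with memberships above 0.95 in cluster 1, and the last two with memberships above 0.9 in cluster 2, so the update moves the matrix further in the direction of the initial guess.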
Step 5: Repeat steps 2-4 until the membership values stop changing, or until the change between iterations is less than the tolerance value.
Step 6: Defuzzify the obtained membership values, for example by assigning each data point to the cluster in which it has the highest membership.
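One common way to defuzzify is to assign each point to the cluster in which its membership is highest. The sketch below assumes that rule and, purely for illustration, applies it to the initial membership table from Step 1 rather than to converged values.

```python
points = [(1, 3), (2, 5), (4, 8), (7, 9)]
memberships = [
    [0.8, 0.7, 0.2, 0.1],  # cluster 1
    [0.2, 0.3, 0.8, 0.9],  # cluster 2
]

# For each point, pick the cluster with the largest membership (1-based labels).
labels = [
    max(range(len(memberships)), key=lambda i: memberships[i][k]) + 1
    for k in range(len(points))
]

print(dict(zip(points, labels)))  # {(1, 3): 1, (2, 5): 1, (4, 8): 2, (7, 9): 2}
```

If a point has equal membership in several clusters, max returns the first of them, so ties go to the lower-numbered cluster.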