Clustering Big Data

Clustering large amounts of data brings extra complexity and requires specialized clustering algorithms. Common clustering algorithms like k-means were not designed to handle such tasks. Anil K. Jain, a big name in the field of clustering, explains this problem in his video lecture (http://videolectures.net/single_jain_bigdata/) and proposes a solution, an approximate k-means algorithm, that can cluster very large amounts of data (big data). Other researchers, such as Xiao Cai et al., propose further variants of k-means to cluster big data.
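
To make the idea of "approximate" k-means concrete, here is a minimal sketch using mini-batch k-means, one well-known scalable approximation. This is only an illustration of the general approach, not necessarily the specific algorithms of Jain or Cai et al.: instead of scanning the whole dataset on every iteration, centroids are updated from small random batches, so each update stays cheap even when the number of points is very large.

```python
# Illustrative sketch: approximate (mini-batch) k-means on a large dataset.
# Not the exact algorithm referenced in the lecture; just the general idea.
from sklearn.cluster import MiniBatchKMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for "big data": one million points with 10 features.
X, _ = make_blobs(n_samples=1_000_000, n_features=10, centers=5, random_state=0)

# Each partial update touches only `batch_size` points, so the per-step cost
# and memory use stay bounded regardless of the total dataset size.
mbk = MiniBatchKMeans(n_clusters=5, batch_size=10_000, random_state=0)
mbk.fit(X)

print(mbk.cluster_centers_.shape)  # (5, 10)
```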

Since k-means is faster than most other algorithms and its time complexity is linear in the number of points, O(n), researchers prefer to develop new algorithms based on k-means. However, there are others, such as Vincent Granville, who used Hierarchical Agglomerative Clustering-based algorithms with MapReduce and an indexing mechanism to cluster large amounts of data. Interestingly, his algorithm has a complexity of O(n log n), which is slightly higher, but I assume the quality of the results would be better than that of the k-means variants. A small sketch after this paragraph shows where the linear-in-n cost of k-means comes from.
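
The sketch below is a plain NumPy implementation of one iteration of standard (Lloyd's) k-means; it is not Granville's MapReduce approach, just a way to see why a single pass is linear in n: each iteration computes n × k distances and one mean per cluster, so doubling the data roughly doubles the work.

```python
# Minimal sketch of one Lloyd's k-means iteration, to make the O(n) per-pass
# cost concrete. Illustrative only.
import numpy as np

def kmeans_step(X, centroids):
    """One assignment + update step of standard (Lloyd's) k-means."""
    # Distances from every point to every centroid: shape (n, k).
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    # Recompute each centroid as the mean of its assigned points
    # (keep the old centroid if a cluster happens to be empty).
    new_centroids = np.array([
        X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
        for j in range(len(centroids))
    ])
    return labels, new_centroids

# Toy usage: 10,000 points in 2-D, 3 clusters.
rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 2))
centroids = X[rng.choice(len(X), size=3, replace=False)]
labels, centroids = kmeans_step(X, centroids)
print(centroids)
```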

Clustering big data was a major challenge back in 2014, and there are now many algorithms that handle such tasks with relative ease.
