Monthly Archives: January 2015

Basics of Bigdata

Bigdata is often misunderstood as simply very large data; however, size is just one aspect of it. The term Bigdata refers to data that is too complex for traditional approaches to handle. Bigdata has the following characteristics. Volume – Large amount of data. Velocity – Rapid generation of data. Variability – Inconsistency of the data. Veracity – Quality of… Read more →

Weka or LingPipe for New Data Scientist

I started working with Weka and LingPipe about two years ago. My task was to develop a better clustering algorithm for text data. I initially used Weka to familiarize myself with basic clustering algorithms; however, I found that Weka has more documentation for classification algorithms than for clustering algorithms. I came across the LingPipe framework on the internet and found that their blog provides… Read more →

Clustering Bigdata

Clustering a large amount of data brings complexity and requires special clustering algorithms. Common clustering algorithms like k-means are not designed to handle such tasks. Anil K. Jain, a big name in the domain of clustering algorithms, explains this phenomenon in his video lecture. He provides a solution, the “approximate k-means algorithm”, which clusters large amounts of data (bigdata). Other researchers like Xiao Cai et.… Read more →