BigData and DataScience Skills worth more than $100,000 Salary

How many of you have seen the recently published 2015 Salary Survey? The survey consists of responses from 23,470 IT professionals collected in the fall of 2014 and includes a list of highly paid technical skills. I am sure that after viewing the list you will want to learn some of those skills. Big data is worth $116,414, with nearly 35,000 job listings, and Data Science is… Read more →

What is Wrong With All Machine Learning Models

John Langford, a machine learning research scientist at Microsoft and weblog author, recently published a brilliant article about flaws in machine learning models. The link to his original article is currently down, but you can find the article below. John’s Article (Taken from here) Attempts to abstract and study machine learning are within some given… Read more →

Machine Learning is a new form of statistics

Statistics and machine learning are thought to be two separate fields. But if you read good articles from highly reputed machine learning journals, you will realize that these two fields are merging. Not too long ago, a new field, “statistical machine learning”, made it clear that these two fields have a great deal in common. Coming from a computer science background, I… Read more →

Data science related top 20 short tutorials (must read)

I have finished reading the 20 short tutorials suggested by datasciencecentral. They are amazing; I particularly liked the clustering and bigdata related articles. The complete list follows; go ahead and let me know which article is your favourite. Tutorial: How to detect spurious correlations, and how to find the … Practical illustration of Map-Reduce (Hadoop-style), on real data Jackknife logistic and linear regression for… Read more →

Basics of Bigdata

Bigdata is often misunderstood as simply very large data; however, size is just one aspect of bigdata. The term Bigdata refers to data that is too complex for traditional approaches to handle. Bigdata has the following characteristics. Volume – Large amount of data. Velocity – Rapid generation of data. Variability – Inconsistency of the data. Veracity – Quality of… Read more →

Weka or LingPipe for New Data Scientist

I started working with Weka and LingPipe around 2 years ago. My task was to develop a better clustering algorithm for text data. I initially used Weka to familiarize myself with basic clustering algorithms; however, I found that Weka has more documentation for classification algorithms than for clustering algorithms. I came across the LingPipe framework on the internet and found that their blog provides… Read more →

Clustering Bigdata

Clustering large amounts of data brings complexity and requires special clustering algorithms. Common clustering algorithms like k-means are not designed to handle such tasks. Anil K. Jain, a big name in the domain of clustering algorithms, explains this phenomenon in his video lecture. He provides a solution, the “approximate k-means algorithm”, which clusters large amounts of data (bigdata). Other researchers like Xiao Cai et.… Read more →

Learning Curve of Datascience for Software Engineers

Coursera offers a free course, “Introduction to Datascience”, which provides basic knowledge and a specialization track for becoming a data scientist. If you are coming from a Computer Science background and have good programming skills, then becoming a datascientist would be a piece of cake for you. Derrick Harris has previously mentioned that it’s very easy to become a datascientist, and I think the learning curve is much shorter for… Read more →