Weka or LingPipe for a New Data Scientist?

I started working with Weka and LingPipe around two years ago. My task was to develop a better clustering algorithm for text data. I initially used Weka to familiarize myself with basic clustering algorithms; however, I found that Weka has more documentation for classification algorithms than for clustering algorithms. I then came across the LingPipe framework online and found that its blog provides several tutorials on clustering, along with code walk-throughs of clustering algorithms. The tutorials were very well written, and they helped me understand how clustering algorithms are implemented. The LingPipe framework also provides tokenization, stemming and other text processing facilities, which saved me time on basic text processing.

I would recommend that new data scientists start with LingPipe, especially if they are interested in clustering algorithms, and switch to Weka later. I found LingPipe to be a good framework to begin with, but Weka was more reliable for complex text mining tasks, especially on large datasets. There are good video tutorials on text mining in Weka that are worth watching.
Whether you are a new data scientist or an experienced one, I would like to hear your story and your favourite tool or framework for text mining tasks.
