Why every developer should learn R, Python, and Hadoop.

Recently I used R for my course project on data mining. The course didn’t require that we use R or Python. Instead, the course was taught on WEKA. But here’s why I think it should be done in R or Python in future years.

R is a heavy-duty language – R is a powerful scripting language that helps you handle large, complex data sets. I struggled to run WEKA on a dataset of no more than 5 million. WEKA kept crashing, and in my experience the algorithms ran comparatively faster in R and Python. This is partly because R can be used on high-performance computing clusters, which can handle a huge number of processes. Since part of data mining involves creating visualizations to better understand the relationships between attributes, R seems a more natural fit than WEKA for a course on data mining. The thing I liked most was the visualization tooling R comes with: its graphs and plots are vivid and eye-catching.

Python is user-friendly – Like Java, C, and Perl, Python is one of the easier languages to grasp. Finding and squashing bugs is easier in Python because it is a scripting language. Moreover, Python is an object-oriented language, and it is a performer like R. The other good thing is that if you are planning to do some fun projects with the Raspberry Pi, then Python is the language to learn.

Hadoop – Hadoop is well suited for huge data. Remember the issue I had with WEKA due to the size of my dataset? That problem can be eliminated by using Hadoop. Hadoop splits the dataset into many blocks, spreads them across the machines of a cluster, runs the analysis on each block, and combines the results. Top companies like Dell, Amazon, and IBM that own terabytes of data rely on tools like Hadoop.
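To make the split, analyze, and combine idea concrete, here is a minimal sketch in plain Python. It is not Hadoop and does not use any Hadoop API; the function names (`split_into_chunks`, `map_chunk`, `combine`, `count_values`) and the sample data are made up for illustration, and everything runs on a single machine rather than a cluster.

```python
from collections import Counter
from functools import reduce


def split_into_chunks(records, chunk_size):
    """Split the dataset into fixed-size chunks (a stand-in for data blocks)."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def map_chunk(chunk):
    """Analyze one chunk on its own: count how often each value occurs."""
    return Counter(chunk)


def combine(left, right):
    """Merge two partial results into one."""
    return left + right


def count_values(records, chunk_size=3):
    """Count values by splitting, analyzing each chunk, then combining."""
    chunks = split_into_chunks(records, chunk_size)
    # Assumption: on a real cluster each chunk would be processed on a
    # different machine in parallel; here they are processed one by one.
    partial_counts = [map_chunk(chunk) for chunk in chunks]
    return reduce(combine, partial_counts, Counter())


if __name__ == "__main__":
    # Hypothetical sample data, just to exercise the functions.
    data = ["spam", "ham", "spam", "ham", "ham", "spam", "spam"]
    print(count_values(data))
```

The point of the sketch is that no single step ever needs the whole dataset at once: each chunk is analyzed independently, and only the small partial results are merged at the end.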

You need to learn these three tools at a minimum in order to be a good data scientist and to do a good, thorough analysis of a given dataset.

 

About thewisedeveloper

Hi all, I am a Junior/Senior Computer Science student at Worcester State University, Worcester, Massachusetts.
This entry was posted in Big Data Analytics, Computer Science, Hadoop.
