Will Python Unseat R as the Programming Language for Data Scientists?
As this year comes to an end, a popular topic has been the debate about whether Python will displace R as the programming language of choice for data scientists. A quick search on the Internet will bring up a myriad of recently published articles, blog posts, and forum threads with some pretty strong debate on the subject.
But let’s be clear: Python may be great and is quickly becoming a popular choice for start-ups, small companies, and individuals looking to venture into new IT realms, but R is still the main player when it comes to statistical sciences. And some reporters in the industry will harshly tell you so.
For those who don’t know, R is a programming language and software environment designed specifically for statistical computing and graphics. R remains the most used language for data mining and analytics, and it has actually shown growth in the statistical sciences industry in the past few years.
In August 2013, Vincent Granville noted in a blog post for Data Science Central that 61 percent of users who responded to a KDNuggets survey use R regularly. And if you're worried that R may be losing its grip to Python, consider that between 2012 and 2013, the number of data scientists reportedly using R increased by 16 percent. For the past three years, R has remained the popular choice for data scientists. R's growth from 2012 to 2013 (8.4 percent) was greater than Python's overall usage growth (2.7 percent).
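The 16 percent and 8.4 percent growth figures quoted above for R can look inconsistent at first glance. One plausible reading, which is an assumption here and not something the survey write-up is quoted as saying, is that 8.4 is a change in percentage points of respondents while 16 is the relative growth that change implies. A minimal Python sketch of that arithmetic, using only the numbers cited above (the function and variable names are illustrative):

```python
def relative_growth(share_now, point_change):
    """Relative growth (in percent) implied by a current share (in percent)
    and a change measured in percentage points."""
    share_before = share_now - point_change
    return 100.0 * point_change / share_before


# Figures quoted in the article; treating 8.4 as percentage points is an assumption.
r_share_2013 = 61.0   # percent of respondents using R in 2013
r_point_gain = 8.4    # assumed percentage-point gain from 2012 to 2013

r_share_2012 = r_share_2013 - r_point_gain            # implied 2012 share
r_relative = relative_growth(r_share_2013, r_point_gain)

print(f"Implied 2012 share: {r_share_2012:.1f} percent")
print(f"Implied relative growth: {r_relative:.1f} percent")
```

Under that assumption, the implied 2012 share is roughly 52.6 percent and the relative growth rounds to 16 percent, so the two figures would describe the same change in different units.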
So which programming language should you and your company be using? Honestly, it depends on the specifics of your project, your skill set, and how comfortable you are with each language.
It's easy to find advocates happy to explain why the programming language they use is superior to the other. Both languages have grown in popularity, use, and acceptance as viable options for statistical analysis. Just know that despite popular opinion, the data show that Python won’t replace R anytime soon.
Furthermore, David Smith of Revolution Analytics points out that having both languages around is a good thing and, rather than focusing on which language is better, we should look more at how they can benefit each other:
... (B)oth communities will continue to advance the art of data science, and as open-source communities will inevitably cross-pollinate each other. R has already influenced Python in the realm of data analysis, and it would be no bad thing if Python were to influence R in other areas. That, after all, is the beauty of open source software.
Which programming language do you prefer? Does your company favor one language over the other? Tell us in the comments below.