Friday, December 06, 2013

Market trend in advanced analytics for SAS, R and Python

Disclaimer: This study looks at market demand for, and adoption of, advanced analytics software from the job market perspective only, and should not be read as a conclusive statement on everything happening in this market. The findings should be combined with other third-party research, such as IDC reports (http://www.idc.com/getdoc.jsp?containerId=241689), to reach a balanced and comprehensive view.

/* --------------------------------------------------------------------------------------------------------------------*/
The debate on the competition between SAS, R and Python for advanced analytics, statistics, or data science in general never ends. The question boils down to how widely SAS, as software and a programming language for such purposes, can still be used by businesses, given its current pricing structure.

Some recent discussions can be found at:

http://www.listserv.uga.edu/cgi-bin/wa?A2=ind1312a&L=sas-l&F=&S=&P=1171

http://r4stats.com/2013/03/19/r-2012-growth-exceeds-sas-all-time-total/

http://r4stats.com/articles/popularity/

and more...

indeed.com has been used to address this question before by Robert Muenchen at http://r4stats.com/articles/popularity/. His analysis of the popularity of SAS and R in the job market using indeed.com was simple: the terms he used were just "R SAS" and "SAS !SATA !storage !firmware".

In my analysis using Indeed job trends, I directly search for combinations of a language (R, SAS or Python) with analytics-related job terms, such as "R AND Analytics" or "R AND Regression". The goal is to understand the market dynamics in adopting each of the three languages in advanced business analytics. I try to be as fair as possible in my analysis.

The job description related terms that I am going to combine with the language names fall into three categories:

1. Techniques typically used: here I use "Regression", which almost all advanced business analytics jobs look for;
2. Industry: here I use "pharmaceutical", and "JPMorgan Chase" to represent financial services (even though it is not representative);
3. General terms such as "data mining", "machine learning" and "statistics". In general, data mining has different implications from machine learning. The former is a more general term, or a modern term for "analytics", while the latter has gained momentum only recently and focuses largely on computer-science-related fields and on more hardcore algorithm development, thus favoring Python over the other two.
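
The pairing of languages with job terms described above can be sketched in a few lines of Python. This is only an illustration of how the comparison strings are assembled, not a tool used for this post: the names `LANGUAGES`, `JOB_TERMS` and `build_queries` are hypothetical, and quoting the multi-word terms is an assumption about how they would be entered in the search box.

```python
# Hypothetical helper: pair each language with each job-related term and
# build one comma-separated comparison string per term.
LANGUAGES = ["R", "SAS", "Python"]

# The three categories of job terms from the list above.
# Quoting of multi-word terms is an assumption, not taken from the post.
JOB_TERMS = {
    "technique": ["regression"],
    "industry": ["pharmaceutical", '"jpmorgan chase"'],
    "general": ['"data mining"', '"machine learning"', "statistics"],
}


def build_queries(languages, job_terms):
    """Return a dict mapping each job term to its comparison string."""
    queries = {}
    for terms in job_terms.values():
        for term in terms:
            # One "<language> and <term>" clause per language.
            queries[term] = ", ".join(f"{lang} and {term}" for lang in languages)
    return queries


queries = build_queries(LANGUAGES, JOB_TERMS)
for term, query in queries.items():
    print(f"{term}: {query}")
```

Each resulting string is meant to be pasted into the job trends search box by hand; the sketch does not call any web service.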

The graph below shows indeed.com job trends for the search term "R and regression, SAS and regression, Python and regression". It conveys three immediate pieces of information:
a.) Python has been picking up very fast since 2009;
b.) The market share gap between SAS and R/Python has been narrowing consistently since 2006, and in the foreseeable future SAS's lead is going to disappear;
c.) SAS's market share peaked in 2011 and kept dropping thereafter, while Python maintained a steady trend and R picked up over the same period.




The graph below shows the trend for the terms "R and pharmaceutical", etc. I wanted to see the trend in adoption of the three languages in the pharmaceutical industry. Three points stand out:
a.) R and SAS show up in pharmaceutical jobs at almost the same time;
b.) The relative job demand for analytics in the pharmaceutical industry is declining;
c.) Python is on the rise, even though it still trails far behind SAS and R.



When combining "JPMorgan Chase" with the languages for trend analysis, I specifically excluded "storage" from the SAS search term, because the bank may hire IT staff and the results would otherwise be mixed with jobs for SAS storage systems. The term is ' R and "jpmorgan chase", SAS not Storage and "jpmorgan chase", Python and "jpmorgan chase" '. There are several interesting observations:
a.) Python has been picking up some job share at JPMorgan Chase since 2010, while R is losing share;
b.) There is noticeable seasonality in hiring, mostly in the summer;
c.) Before 2008, there was strong demand for analysts using SAS; then the crisis came and hiring stalled for almost two years. Beginning in 2011, business improved, and demand for analysts using SAS picked up year-round and reached its peak this year.
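
The search string with the per-language exclusion quoted above can be assembled along these lines. This is a minimal sketch: the function name `comparison_query` and its `exclusions` parameter are hypothetical, and the `not` syntax is simply copied from the term quoted in this post rather than from any documented search grammar.

```python
def comparison_query(languages, term, exclusions=None):
    """Build a comma-separated comparison string for one quoted job term.

    exclusions maps a language to words that should be excluded from its
    clause, e.g. {"SAS": ["Storage"]} to drop storage-related postings.
    """
    exclusions = exclusions or {}
    clauses = []
    for lang in languages:
        clause = lang
        # Append "not <word>" for every excluded word of this language.
        for word in exclusions.get(lang, []):
            clause += f" not {word}"
        clauses.append(f'{clause} and "{term}"')
    return ", ".join(clauses)


query = comparison_query(
    ["R", "SAS", "Python"],
    "jpmorgan chase",
    exclusions={"SAS": ["Storage"]},
)
print(query)
```

Keeping the exclusions per language makes it explicit that only the SAS clause is filtered, so the R and Python clauses stay directly comparable to the other searches in this post.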


In the last part of the trend analysis, we combine the languages with more general terms, such as "analytics" and "data mining", to see how business as a whole perceives these three analytical languages.

We first combine with "data mining". Three points should be noted:
1. The job market for analytics in general has been growing since 2009, and all three languages showed good momentum from 2009 to 2010;
2. SAS was the dominant player in this business, but the landscape is changing rapidly. R and Python see almost identical adoption by business. There is a minimal gap between the two, but I would not consider it significant;
3. SAS is not as favored by the market as before: its share not only stalled but also showed a declining trend this year. This is very alarming.


Now, let's be more focused. First, we study "Statistics". The message here is not good for SAS. SAS used to be THE de facto software for industrial statistical analysis, and this is reflected in the trend before 2010: SAS was so dominant that its market share was more than 3X that of R, and an even larger multiple of Python's. What's interesting is that while SAS saw some good days from 2010 to 2011, which was also reflected in its annual revenue, its market share has been on a steady decline since 2012. R and Python, on the other hand, are still picking up without hesitation.



Turning to "machine learning", Python is the leader, followed by R, with SAS trailing at some distance. The seasonality of Python differs a little from that of R and SAS, while the latter two have very similar seasonal dynamics. The hypothesis here is that Python is predominantly used in different industries, while R and SAS are used in similar industries.

4 comments:

Steve Polilli said...

I'm a SAS employee so I'd like to offer an alternative perspective on the jobs outlook. Take a look at http://bit.ly/IPE85K for an alternative perspective with some useful links. Thanks.

Unknown said...

Interesting post, David. Thanks.

The growth in R usage also matches with what we have seen in our 2007-2013 Data Miner Surveys. We recently released the latest summary report, and we have included several pages that describe the skyrocketing growth in R usage. The highlights are here: http://www.rexeranalytics.com/Data-Miner-Survey-Results-2013.html. Anyone who wants a copy of the FREE 41-page report can contact us at [email protected].

Happy Holidays, everyone!
-- Karl

Bob Muenchen said...

It turns out that R and SAS are abbreviations for many other things (H R, R&D, etc.). The term "statistics" also returns hits that have very little to do with analytics. Check out my latest search strings here:
http://r4stats.com/articles/how-to-search-for-analytics-jobs/
And the latest results which are here:
http://r4stats.com/articles/popularity/

Cheers,
Bob Muenchen

Liang Xie said...

Steve Polilli:
Thanks for providing an alternative view on this issue. SAS is still performing strongly in its own territory compared to its own history, and there is still large demand for SAS analysts. On the other hand, the rise of the Data Scientist discipline makes R and Python more appealing. SAS needs to do more to gain this new market.