The DNA of a Data Scientist
It seems to me that a number of companies are having their share of disappointment with the quality of their "data science" crop. This may have something to do with their expectations (we all know that unicorns don't exist), or with a genuine lack of skills in one or more of the knowledge pillars of data science.
So, below is what I would look for in a data scientist:
Technical proficiency
Part of that is an aptitude for learning programming languages and knowing Python, R... with no guarantee that something else won't become the new "flavour of the month", in which case quick up-skilling may be needed. Solid practical experience with an enterprise-level analytical tool is also good to have. I prefer a quick and powerful code-generating interface to take me as far as it can before going down the slow programmatic route.
Other components of "technical proficiency" are a strong foundation in the underlying academic disciplines, such as applied maths, statistics, computer science..., and, of course, knowledge of the most common analytical techniques, their applicability, and their pros and cons. It is also very important to know the analytical tasks that these techniques and their related algorithms perform. Since most business questions a data scientist grapples with involve describing, visualising, predicting, optimising, associating, sequencing, segmenting or clustering, knowing these tasks at a deeper technical level is essential for any data scientist.
Understands data preparation (feature engineering)
In reality, data is a by-product of operational systems; it is not created with data science in mind. It is therefore vital for any data scientist to know how to prepare data for data science exploitation. Deep learning may reduce the need for manual data preparation and feature engineering, but most of us will not be there anytime soon! Most companies don't yet have the skills and big data infrastructure to throw mountains of data at complex neural networks and expect good quality results to come out quickly. Let's crawl before we walk, and walk before we sprint. I have had companies come to me that are struggling to produce a monthly report yet wish to do deep learning, and that "hop-on-the-hype" attitude worries me somewhat.
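To make "preparing the data" a little more concrete, here is a minimal sketch of one common feature-engineering step: turning transaction-level records, as an operational system would store them, into one row of features per customer that a model could consume. The record layout, the function name build_features and the chosen features (count, total, average, recency) are hypothetical illustrations, not a prescribed recipe.

```python
from collections import defaultdict
from datetime import date

# Hypothetical operational records: (customer_id, transaction_date, amount).
# In practice these would come from a billing or point-of-sale system.
transactions = [
    ("C1", date(2017, 1, 5), 120.0),
    ("C1", date(2017, 2, 11), 80.0),
    ("C2", date(2017, 1, 20), 40.0),
    ("C1", date(2017, 3, 2), 100.0),
    ("C2", date(2017, 3, 28), 60.0),
]


def build_features(rows, as_of):
    """Aggregate transaction-level rows into one feature row per customer."""
    # Group the raw rows by customer.
    grouped = defaultdict(list)
    for customer_id, tx_date, amount in rows:
        grouped[customer_id].append((tx_date, amount))

    features = {}
    for customer_id, items in grouped.items():
        amounts = [amount for _, amount in items]
        last_date = max(tx_date for tx_date, _ in items)
        features[customer_id] = {
            "tx_count": len(amounts),
            "total_spend": sum(amounts),
            "avg_spend": sum(amounts) / len(amounts),
            # Recency: days between the last transaction and the reference date.
            "days_since_last_tx": (as_of - last_date).days,
        }
    return features


features = build_features(transactions, as_of=date(2017, 3, 31))
for customer_id, row in sorted(features.items()):
    print(customer_id, row)
```

The reference date (as_of) is passed in explicitly so that features can be rebuilt for any historical point in time, which helps avoid accidentally using information from after the moment a prediction would have been made.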
Knows data science methodologies and best practices
Since a data science project doesn't start with data, nor with science, and doesn't end there either, it is vitally important to know all the steps in a typical data science project, especially since the output of one step is the input into the next. Methodologies guide the scientist in the same way that road signs guide drivers. I have very rarely seen a project flop because the wrong algorithms or techniques were used, but I have commonly seen projects flop because the right steps were not followed and they ended up in a ditch.
Business applications of data science
This is an often neglected knowledge pillar. Knowing both cross-industry and industry-specific applications of data science is massively important. Whether it is fraud detection, response modelling, credit risk modelling, churn prediction, lapse prediction, cross-selling, profiling or generating customer insights, each of these applications requires a certain level of domain expertise and business knowledge, as well as a great deal of know-how about the subtleties of building them from a data science and analytical perspective. Depending on the industry, some are more relevant than others, but at the very least a data scientist should be familiar with the most common cross-industry ones.
Not everyone understands the technical jargon involved in data science, so a good data scientist needs to be able to translate it into language that a business audience can understand. Key decisions will remain in the domain of business stakeholders, and they will not risk their budgets and reputations on blind trust in data scientists. That is where believing in your own work and being confident, articulate and persuasive come in, if a data scientist is to be taken seriously in any business setting.
The more of the above ingredients are present in your "typical" data scientist, the less likely you are to be disappointed. And beware of the hype! Jumping on the "bandwagon" without a clear strategy is the most common cause of technology failure within an organisation. Whatever technology is used, some fundamentals stay the same. Just make sure that your data scientist is aware of them and focuses on the "why" first, and then on the "how".
An article by Goran Dragosavac