Data Science is a comparatively new domain that most of us are not thoroughly familiar with, agreed? In fact, it is such a vast area that companies have a hard time determining the prerequisites for a good candidate. Hiring and interviewing such candidates is the challenge that follows.
So we decided to interview our top Data Science expert, Saikat Sarkar, who has been working and teaching in this domain for the last 10 years. He is presently the Subject Matter Expert for Python at Aegis School of Data Science. In a candid conversation with AirCTO, he shares tricks for evaluating a Data Science candidate as well as hacks for sourcing such candidates. He also talks about the importance of the Data Science domain today and in the years to come.
How did your journey in the Data Science domain start?
Data Science found me before I even knew its definition. Around 2002-03, I had graduated from college and was trying to build a chat-bot, using C. Since there was not much material available for me to refer to, and the internet was not so accessible either, it turned out to be a good academic project, but not a live one. I tried another project using IoT, which could not take off either, due to the unavailability of hardware.
After that I moved to Mumbai to work in a BPO, where I found huge amounts of data that I was constantly churning through, either with Excel or with programming languages. I found that conventional tools were not sufficient to find the answers I was looking for, such as what is expected to happen in the next three months.
Out of curiosity, I chanced upon Python and started using it, combined with ML, for predictive analytics. About two years back, I got an offer to teach Python, and I was told that the ultimate objective was to produce Data Scientists. That's when I got acquainted with various Machine Learning algorithms and how they can be used in NLP.
How important is Data Science for an organisation?
See, everyone in an organisation knows what is happening today; it's not a big deal. But if I can tell you what is going to happen in the next three months, that is what adds value. Suppose I can predict that there is going to be a negative impact on the organisation, say in terms of sales. If, in addition, I can also tell the CEO what he needs to do under the changing circumstances so that the order flow does not go down and, on the contrary, increases, I would be adding some real value. So it's not only predictive analytics but also prescriptive analytics; that's the beauty of Data Science or Big Data.
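To make the "next three months" idea concrete, here is a minimal sketch of a sales forecast. It is not something the interviewee described: it assumes a plain least-squares trend line is good enough for illustration, and the `monthly_sales` figures and the function names are hypothetical.

```python
from statistics import mean


def fit_linear_trend(values):
    """Fit a least-squares line through the points (0, values[0]), (1, values[1]), ..."""
    xs = range(len(values))
    x_bar, y_bar = mean(xs), mean(values)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def forecast(values, horizon=3):
    """Extend the fitted trend line `horizon` periods past the last observation."""
    slope, intercept = fit_linear_trend(values)
    n = len(values)
    return [intercept + slope * (n + k) for k in range(horizon)]


# Hypothetical monthly sales figures, invented purely for illustration.
monthly_sales = [120, 118, 115, 111, 108, 104]
next_three = forecast(monthly_sales)
print(next_three)
```

A falling forecast like this one is the predictive part; deciding what the CEO should do about it is the prescriptive part, which a trend line alone cannot answer.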
What are the common data science profiles that are in demand nowadays?
There are two perspectives, the Indian and the Global, and we are actually lagging behind the Global one. From a Global perspective, I would expect a candidate to be familiar with the concepts of Deep Learning, which is a second stage of Machine Learning, such as Artificial Neural Networks. It is being rapidly adopted in the US and other parts of the world.
Typically, a company should look for candidates with fairly strong programming skills and a statistical understanding of ML. Exposure to big data platforms like Hadoop and Spark would be an added bonus. Decision making and random forests are other topics. I look at whether the candidates understand what goes on inside these and know which problems should not be put into them. I am not looking for them to implement the algorithm. Rather, given some basic information about the data that is available, are they able to select the proper tools and algorithm? Are they able to shortlist which ones to try and which ones not to? This gives me an understanding of their ability to use the algorithm in a real-life scenario.
What would be the difference when interviewing a beginner, a mid-level and a senior person in this domain? What are the typical skill sets evaluated for each?
The further you go up the management ladder, the fewer technical skills are required, so the more senior the guy is, the less code I would ask of him. Instead, I would give him a situation along with the data characteristics and ask him about the algorithm, the pre-processing steps and so on. He should be able to give me the concept and the way forward.
Coding is not the big problem here, but deciding what to do is. If I am evaluating a guy with 3-5 years of experience, I would expect him to understand which path to take rather than ask him to code. But if I were evaluating a junior guy, I would tell him that he has to take Path A, and he would answer me in terms of a programming language or code. So that kind of differentiation should be there.
How can one keep himself updated?
This is a field where new things are coming up every day, so one should stay thoroughly updated. I tell my students that if I teach them an approach and they implement it in the assignment, even in the best possible way, they will get only 50% of the marks. The other 50% is given if they can find a more innovative approach to the problem.
If one has the mindset of doing continuous research and finding new ways, he is the person who will learn. The field is ever-developing: new patents are submitted every day, and new implementations, versions and algorithms are coming up every day as well. Every 2-3 months you get new versions and tools. You have to be very quick to upgrade yourself. This is even more important at the higher levels: you should not repeat the mundane experience of using the same tools 365 days of the year, but instead explore and innovate.
How important are a candidate's research papers and open source projects?
If I am the recruiter and I find someone who has worked on a similar project, and after speaking with him I feel that whatever he has put in the CV is really true, it would be the greatest plus point for me.
Project experience matters a lot here, because I am putting a lot at stake. Salaries in the Data Science field are very high, and so are the stakes. Here we cannot afford any error; one error can result in huge losses. It's not just about what your certificate says. Once you say that you have been there and done it, you actually have to convince the interviewer that you have done it.
People say that they know stuff, but it has happened that when I actually started drilling them, they could not convince me about the project. So if you claim that you have worked on a project, you should understand your role and profile, and everything you can be questioned on regarding that role and profile. There is no space for doubt.
How does one hire good Data Scientists?
I would suggest an alternative to the traditional ways of head-hunting. Websites like Kaggle and Analytics Vidhya conduct contests and offer huge prize money to those who rank well in them. These can be as good as job experience. Here is how these contests come about: suppose I am a company and I want to build a recommendation system for myself. I can recruit, say, 4 data scientists who can build that for me, but what if the recommendation system does not work after I have paid them a salary for one year? So I am at a huge risk, right?
Instead of that, I put up a contest with a prize of 1 million dollars, and I pay that amount only to the guy who can provide me with a satisfactory solution. Since it's a global contest, I also get to choose from the best of the best. These contests are a huge boost for data scientists.
For a startup, when does one realise that it is time to hire a Data Scientist for the team?
It should start when they are initially planning the startup, once they have some financial support. For example, in an e-commerce system where 70% of my sales are attributed to the recommendation system, the Data Scientist is probably the key element in the organisation.
So, even if you are not going for a full-time hire, rope him in and he will advise and guide you. Today the world is actually data-driven; we lag behind in keeping track of this data, and we end up saying that the world is chaotic instead of following and analysing those data sources. Thus, the earlier you start planning, the better it is.
How can one make a transition into the Data Science domain?
A Data Scientist is a person who understands statistics better than a programmer and programming better than a statistician. So he has to have some kind of qualification. The base should be either programming or statistics, or a combination of both. Here, qualifications do not matter a lot; the courses on ML, the projects he has done and the contests he has participated in hold the real key, and I am talking about a fresher here. No single institute can offer a complete course. The last course I interacted with was prepared by the IITs, IIMs and IST. Imagine, it requires three of the greatest institutes in the country to devise a course like that. Such is the importance and depth of this domain.
What are some examples of statistics and probability questions that can be asked?
When I was teaching full time and tried to place my candidates in organisations, I found this to be a hurdle. Modern ML concepts are not known to the senior people of the organisation. They have worked on old-school statistics, so that's the field they try to drill the candidates in. If I am interviewing, I would typically ask them about an algorithm, give them a data-set and ask them about the statistical features that come out of it. So I am looking at applied statistics here.
So, if I tell them certain characteristics of the data, I see whether they can tell me if the data is normalised. Or, if I tell them about the structure of a graph, I see whether they are able to determine the key features that I would derive statistically from that graph. These kinds of approaches are important while evaluating candidates.
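As an illustration of the kind of applied-statistics question described above, the sketch below derives a few summary features from a data-set and checks whether it looks normalised. The interview does not specify what "normalised" means, so this assumes z-score standardisation (mean near 0, standard deviation near 1); the function names, the tolerance and the sample values are all hypothetical.

```python
from statistics import mean, median, pstdev


def describe(values):
    """Return basic statistical features: mean, median, std and skewness."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    # Skewness as the mean cubed z-score; defined as 0.0 for constant data.
    skew = sum(((v - mu) / sigma) ** 3 for v in values) / len(values) if sigma else 0.0
    return {"mean": mu, "median": median(values), "std": sigma, "skewness": skew}


def looks_standardised(values, tol=0.05):
    """Assumed reading of 'normalised': mean close to 0 and std close to 1."""
    stats = describe(values)
    return abs(stats["mean"]) <= tol and abs(stats["std"] - 1) <= tol


def standardise(values):
    """Apply z-score standardisation: subtract the mean, divide by the std."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


# Hypothetical sample, invented purely for illustration.
raw = [2, 4, 4, 4, 5, 5, 7, 9]
print(describe(raw))
print(looks_standardised(raw), looks_standardised(standardise(raw)))
```

A candidate could be asked to read such a summary and reason about it: for instance, a mean above the median together with positive skewness suggests a longer right tail.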
Data Science will be one of the most in-demand fields in the coming years, and one will witness a lot of transitions and paradigm shifts. As a candidate or a hiring manager, it is therefore imperative to stay updated on the latest trends. We hope this article proves useful when you hire your next Data Scientist.