What is a Data Scientist? The basics of AI and Machine Learning

In many discussions and interviews people tell me they are a data scientist. In that case I always ask what, in their opinion, a data scientist actually does in day-to-day work. And then I get all kinds of different answers. If you want to have a fun afternoon, read the descriptions of data scientist roles on some of the recruiting sites. You will find words like: “Hadoop”, “TensorFlow”, “SQL”, “Machine Learning”, “AI”, “Deep learning”, “recommendation engines”, “Python” and all kinds of other items. With some extreme simplification I will (hopefully) bring some clarity to the exciting world of AI/Machine learning.

Which capabilities you actually need in your team depends entirely on what your goals are and in which field you operate. I use the following hierarchy:

Machine Learning hierarchy

Deep learning is a part of the machine learning capability area. Deep learning is not alone in this space: supervised learning, recommendation engines and much more also belong to machine learning. Each topic comes with more or less different technologies and expertise. I call the person who knows about these technologies a machine learning specialist. But be aware: within this specialism there are sub-specialists tied to the different technologies. And I am not even touching on the knowledge and experience of the tool sets that support these technologies! That is another dimension, which I ignore for the sake of simplicity.

This does not mean a machine learning specialist can do without data scientist skills. They are actually required. But a data scientist focuses more on the less specialized AI technologies and can be considered more of a generalist in machine learning, combining those skills with the more advanced visualisation and dashboarding tools. A data scientist is also more skilled in data handling and IT infrastructure. Again, a machine learning specialist should master the basics of these too, but does not need to be as skilled in them as a data scientist.

A data engineer is much more an expert in hardware architecture and in the tooling to manage large amounts of data, and may for example be a specialist in high performance computing.

This is all quite complicated, but it is useful to be aware of the differences when you are setting up your team. My tip to managers: use this image to let your data scientists explain which area they want to work in!

For more information: reach out to me or have a look at www.jorgensandig.nl