Thinking Like a Data Scientist

Introduction

Nowadays, it seems that everyone with some technical expertise and a data science bootcamp under their belt calls themselves a data scientist. Moreover, most managers don't know enough about the field to distinguish an actual data scientist from a make-believe one: someone who calls themselves a data science professional today but may be working as a cab driver next year. Because data science deals with complex problems that demand serious attention and carry real responsibility, the role of the data scientist has never been more significant. So, instead of arguing about which programming language or which all-in-one solution is best, perhaps we should focus on something more fundamental: the thinking process of a data scientist.

The challenges of the Data Science professional

Any data science professional, regardless of his specialization, faces certain challenges in his day-to-day work. The most important of these involve decisions about how he goes about his work. He may have planned to use a particular model for his predictions, only to find that it doesn't yield adequate performance (e.g., accuracy that isn't high enough or a computational cost that is too high, among other issues). What should he do then? It could also be that the data doesn't carry a strong enough signal, and, last time I checked, no data science programming library offered a foolproof method that gave a clear-cut answer on this matter. These are calls that the data scientist has to make, shouldering all the responsibility that goes with them.

Why Data Science automation often fails

Then there is the matter of automating data science tasks. Although the idea sounds promising, automation is probably the most challenging part of a data science pipeline. It isn't infeasible, but it takes a lot of work and a breadth of expertise that is rarely found in a single data scientist. Often, you need to combine the work of data engineers, software developers, data scientists, and even data modelers. Since most organizations don't have all that expertise, or don't know how to manage it effectively, automation doesn't happen as they envision, and a large part of the data science pipeline ends up being done manually.

The Data Science mindset overall

The data science mindset is the thinking process of the data scientist, the operating system of her mind. Without it, she can't do her work properly across the wide variety of circumstances she may find herself in. It's her mindset that organizes her know-how and helps her find solutions to the complex problems she encounters, whether she is wrangling data, building and testing a model, or deploying the model on the cloud. This mindset is her strategic potential, the think tank within, which enables her to make the tough calls that data science projects often require in order to move forward.

Specific aspects of the Data Science mindset

Of course, the data science mindset is more than a general disposition. It involves specific components, such as specialized know-how, tools that are compatible with each other and relevant to the task at hand, a deep understanding of the methodologies used in data science work, problem-solving skills, and, most importantly, communication abilities. The latter involve both the data scientist expressing himself clearly and his understanding what the stakeholders need and expect of him. Naturally, the data science mindset also includes organizational skills (project management), the ability to work well with other professionals (even those not directly involved in data science), and the ability to come up with creative approaches to the problem at hand.

The Data Science process

The data science process (or pipeline) distills data science work into a comprehensible form. It's particularly useful for understanding the various stages of a data science project and for planning accordingly. You can view one version of it in Fig. 1 below. If the data science mindset is one's ability to navigate the data science landscape, the data science process is a map of that landscape. It's not 100% accurate, but it's good enough to help you gain perspective when you feel overwhelmed or need a better grip on the bigger picture.

Learning more about the topic

Naturally, it's impossible to exhaust this topic in a single article (or even a series of articles). The material I've gathered on it could fill a book! If you are interested in such a book, feel free to check out the one I put together a few years back. It's called Data Science Mindset, Methodologies, and Misconceptions, and it's geared towards data scientists, data science learners, and people involved in data science work in some way (e.g., project leaders or data analysts). Check it out when you have a moment. Cheers!

Spotlight

Data Science Partnership

Data Science Partnership was specifically created to help companies improve their business processes, become more profitable, and make better, more informed decisions. Backed by a diverse team of some of the industry’s leading experts, DSP uses its knowledge and expertise to fully utilise the power of your data and to give you a genuine advantage over your competitors. Many other AI consultancies cloak data science in mystery, whilst using jargon that few can understand. We feel that this approach is completely unnecessary and does nothing to help the process. DSP realises that understanding data science, big data and AI can be challenging for many of the people we help. That’s why DSP tackles every project with a pragmatic, plain-English approach that keeps everyone on the same page and working towards clearly defined goals. DSP’s mission is to work with its clients for the long haul and to consistently deliver outstanding service and technologically advanced solutions. DSP constantly works towards being the pinnacle of knowledge, and all employees and stakeholders receive regular training and attend seminars in order to maintain this.