Article | March 23, 2021
Learn, re Learn and Unlearn
The times we are living in, we have to upgrade ourselves constantly in order to stay afloat with the industry be it Logistics, Traditional business, Agriculture, etc.. Technology is constantly changing our lives the way we used to live, living and will live. Anyone who thinks technology is not their cup of tea then I would say he /she will have no place in the world to live. It’s a blessing or curse on human race, only time will tell but effects are already surfacing in the market in the form of Job cut, poverty, some roles are no longer needed or replaced with.
Poor is getting poorer and rich is getting richer. Covid19 has not only brought the curse on human race but it has been a blessing in disguise for Tech giants and E-commerce. Technology not only changing the business but every human’s outlook towards life, family structure, the globalization of talents etc. It is nerve wrenching to imagine just what the world will look like in coming 20 years from now. Can all of us adapt to learn, re learn and unlearn quote? Or we have to depend upon countries/Governments to announce Minimum Wage to sustain our basic needs? Uncertainties are looming as the world is coming closer due to technology but emotionally going far. It’s sad to see children, colleagues communicating via emails and messages in the same home and office. Human is losing its touch and feel.
Repercussion to resists of learning, unlearning and relearning can bring down choices to none in the long run. Delay in adapting to change can be increasingly expensive as one can lose their place in a world earlier than one think. From 1992, where fewer people used to have facility of internet around , People used to stay in jobs for life but same people are now not wanted in the jobs when they go for interview as they lack in experience just because they have been doing what they were doing in one job without exposing themselves to the world’s new requirement of learn , re learn and unlearn. Chances of this group, getting a job will be negative. World has thrown different types of challenges to people, community, jobs, businesses , those people used to be applauded for remaining On one job for life ,same group of people are looked differently by corporate firms as redundant due to technology. So should people keep changing jobs after few years to just get on to learn, re learn and unlearn or continue waiting for their existing companies to face challenges and go off from the market? Only time and technology will determine what is store for human race next.
According to some of the studies, its shown the longer the delay in adopting technology for any given nation, the lower the per capita income of that nation. It shows extreme reliance on Technology but can all of us adopt to the technology at the same rate as its been introduced to us? Can our children or upcoming next generations adopt technology at same scale? Or future is Either Technology or nothing, in Short Job or Jobless there is no in between option?
Stephen Goldsmith, director of the Innovations in Government Program and Data-Smart City Solutions at the John F. Kennedy School of Government at Harvard University, said that in some areas, technological advancements have exceeded expectations made in 2000.
The Internet also has exploded beyond expectations. From 2000 to 2010, the number of Internet users increased 500 percent, from 361 million worldwide to almost 2 billion. Now, close to 4 billion people throughout the world use the Internet. People go online for everything from buying groceries and clothes to finding a date. They can register their cars online, earn a college degree, shop for houses and apply for a mortgage but again same question is arising , Can each one of us at the same scale use or advance their skill to use technology or we are leaving our senior generations behind and making them cripple in today’s society? Or How about Mid age people who are in their 50s and soon going to take over senior society , Can they get the job and advance their skill to meet technology demands or learn, unlearn and re learn or Not only pandemic but even Technology is going to make human redundant before their actual retirement and their knowledge, skill obsolete. There should be a way forward to achieve balance, absolute reliance on Technology is not only cyber threat to governments but in long term, Unemployment, Creating Jobs or paying minimum wage to unemployed mass will be a huge worry. At the end of the day, humans need basic and then luxury. Technology can bring ease of doing business, connecting businesses and out flows, connecting Wholesalers to end users but in between many jobs, heads will be slashed down and impact will be dire. Therefore Humans have to get themselves prepared to learn, unlearn and re learn to meet today’s technology requirement or prepare themselves for early retirement.
Article | December 17, 2020
In this article, we’ll talk about different roles in a data team and discuss their responsibilities.
In particular, we will cover:
The types of roles in a data team;
The responsibilities of each role;
The skills and knowledge each role needs to have.
This is not a comprehensive list and the majority of what you will read in this article is my opinion, which comes out of my experience from working as a data scientist.
You can interpret the following information as “the description of data roles from the perspective of a data scientist”. For example, my views on the role of a data engineer may be a bit simplified because I don’t see all the complexities of their work firsthand. I do hope you will find this information useful nonetheless.
Roles in a Team
A typical data team consists of the following roles:
Machine learning engineers, and
Site reliability engineers / MLOps engineers.
All these people work to create a data product.
To explain the core responsibilities of each role, we will use a case scenario:
Suppose we work at an online classifieds company. It’s a platform where users can go to sell things they don’t need (like OLX, where I work). If a user has an iPhone they want to sell — they go to this website, create a listing and sell their phone.
On this platform, sellers sometimes have problems with identifying the correct category for the items they are selling. To help them, we want to build a service that suggests the best category. To sell their iPhone, the user creates a listing and the site needs to automatically understand that this iPhone has to go in the “mobile phones” category.
Let’s start with the first role: product manager.
A product manager is someone responsible for developing products. Their goal is to make sure that the team is building the right thing. They are typically less technical than the rest of the team: they don’t focus on the implementation aspects of a problem, but rather the problem itself.
Product managers need to ensure that the product is actually used by the end-users. This is a common problem: in many companies, engineers create something that doesn’t solve real problems. Therefore, the product manager is somebody who speaks to the team on behalf of the users.
The primary skills a PM needs to have are communication skills. For data scientists, communication is a soft skill, but for a product manager — it’s a hard skill. They have to have it to perform their work.
Product managers also do a lot of planning: they need to understand the problem, come up with a solution, and make sure the solution is implemented in a timely manner. To accomplish this, PMs need to know what’s important and plan the work accordingly.
When somebody has a problem, they approach the PM with it. Then the task of the PM is to figure out if users actually need this feature, how important this feature is, and if the team has the capacity to implement it.
Let’s come back to our example. Suppose somebody comes to the PM and says:
“We want to build a feature to automatically suggest the category for a listing. Somebody’s selling an iPhone, and we want to create a service that predicts that the item goes in the mobile phones category.”
Product managers need to answer these questions:
“Is this feature that important to the user?”
“Is it an important problem to solve in the product at all?”
To answer these questions, PMs ask data analysts to help them figure out what to do next.
Data analysts know how to analyze the data available in the company. They discover insights in the data and then explain their findings to others.
So, analysts need to know:
What kind of data the company has;
How to get the data;
How to interpret the results;
How to explain their findings to colleagues and management.
Data analysts are also often responsible for defining key metrics and building different dashboards. This includes things like showing the company’s profits, displaying the number of listings, or how many contacts buyers made with sellers. Thus, data analysts should know how to calculate all the important business metrics, and how to present them in a way that is understandable to others.
When it comes to skills, data analysts should know:
SQL — this is the main tool that they work with;
Programming languages such as Python or R;
Tableau or similar tools for building dashboards;
Basics of statistics;
How to run experiments;
A bit of machine learning, such as regression analysis, and time series modeling.
For our example, product managers turn to data analysts to help them quantify the extent of the problem. Together with the PM, the data analyst tries to answer questions like:
“How many users are affected by this problem?”
“How many users don’t finish creating their listing because of this problem?”
“How many listings are there on the platform that don’t have the right category selected?”
After the analyst gets the data, analyzes it and answers these questions, they may conclude: “Yes, this is actually a problem”. Then the PM and the team discuss the repost and agree: “Indeed, this problem is actually worth solving”.
Now the data team will go ahead and start solving this problem.
After the model for the service is created, it’s necessary to understand if the service is effective: whether this model helps people and solves the problem. For that, data analysts usually run experiments — usually, A/B tests.
When running an experiment, we can see if more users successfully finish posting an item for sale or if there are fewer ads that end up in the wrong category.
The roles of a data scientist and data analyst are pretty similar. In some companies, it’s the same person who does both jobs. However, data scientists typically focus more on predicting rather than explaining.
A data analyst fetches the data, looks at it, explains what’s going on to the team, and gives some recommendations on what to do about it. A data scientist, on the other hand, focuses more on creating machine learning services. For example, one of the questions that a data scientist would want to answer is “How can we use this data to build a machine learning model for predicting something?”
In other words, data scientists incorporate the data into the product. Their focus is more on engineering than analysis. Data scientists work more closely with engineers on integrating data solutions into the product.
The skills of data scientists include:
Machine learning — the main tool for building predictive services;
Python — the primary programming language;
SQL — necessary to fetch the data for training their models;
Flask, Docker, and similar — to create simple web services for serving the models.
For our example, the data scientists are the people who develop the model used for predicting the category. Once they have a model, they can develop a simple web service for hosting this model.
Data engineers do all the heavy lifting when it comes to data. A lot of work needs to happen before data analysts can go to a database, fetch the data, perform their analysis, and come up with a report. This is precisely the focus of data engineers — they make sure this is possible. Their responsibility is to prepare all the necessary data in a form that is consumable for their colleagues.
To accomplish this, data engineers create “a data lake”. All the data that users generate needs to be captured properly and saved in a separate database. This way, analysts can run their analysis, and data scientists can use this data for training models.
Another thing data engineers often need to do, especially at larger companies, is to ensure that the people who look at the data have the necessary clearance to do so. Some user data is sensitive and people can’t just go looking around at personal information (such as emails or phone numbers) unless they have a really good reason to do so. Therefore, data engineers need to set up a system that doesn’t let people just access all the data at once.
The skills needed for data engineers usually include:
AWS or Google Cloud — popular cloud providers;
Kubernetes and Terraform — infrastructure tools;
Kafka or RabbitMQ — tools for capturing and processing the data;
Databases — to save the data in such a way that it’s accessible for data analysts;
Airflow or Luigi — data orchestration tools for building complex data pipelines.
In our example, a data engineer prepares all the required data. First, they make sure the analyst has the data to perform the analysis. Then they also work with the data scientist to prepare the information that we’ll need for training the model. That includes the title of the listing, its description, the category, and so on.
A data engineer isn’t the only type of engineer that a data team has. There are also machine learning engineers.
Machine Learning Engineer
Machine learning engineers take whatever data scientists build and help them scale it up. They also ensure that the service is maintainable and that the team follows the best engineering practices. Their focus is more on engineering than on modeling.
The skills ML engineers have are similar to that of data engineers:
AWS or Google Cloud;
Infrastructure tools like Kubernetes and Terraform;
Python and other programming languages;
Flask, Docker, and other tools for creating web services.
Additionally, ML engineers work closely with more “traditional” engineers, like backend engineers, frontend engineers, or mobile engineers, to ensure that the services from the data team are included in the final product.
For our example, ML engineers work together with data scientists on productionizing the category suggestion services. They make sure it’s stable once it’s rolled out to all the users. They must also ensure that it’s maintainable and it’s possible to make changes to the service in the future.
There’s another kind of engineer that can be pretty important in a data team — site reliability engineers.
DevOps / Site Reliability Engineer
The role of SREs is similar to the ML engineer, but the focus is more on the availability and reliability of the services.
SREs aren’t strictly limited to working with data. Their role is more general: they tend to focus less on business logic and more on infrastructure, which includes things like networking and provisioning infrastructure.
Therefore, SREs look after the servers where the services are running and take care of collecting all the operational metrics like CPU usage, how many requests per second there are, the services’ processes, and so on.
As the name suggests, site reliability engineers have to make sure that everything runs reliably. They set up alerts and are constantly on call to make sure that the services are up and running without any interruptions. If something breaks, SREs quickly diagnose the problem and fix it, or involve an engineer to help find the solution.
The skills needed for site reliability engineers:
Cloud infrastructure tools;
Programming languages like Python,
Best DevOps practices like automation, CI/CD, and the like.
Of course, ML engineers and data engineers should also know these best practices, but the focus of DevOps engineers/SREs is to establish them and make sure that they are followed.
There is a special type of DevOps engineer, called “MLOps engineer”.
An MLOps engineer is a DevOps engineer who also knows the basics of machine learning. Similar to an SRE, the responsibility of an MLOps Engineer is to make sure that the services, developed by data scientists, ML engineers, and data engineers, are up and running all the time.
MLOps engineers know the lifecycle of a machine learning model: the training phase, serving phase, and so on.
Despite having this knowledge, MLOps Engineers are still focused more on operational support than on anything else. This means that they need to know and follow all the DevOps practices and make sure that the rest of the team is following them as well. They accomplish this by setting up things like continuous retraining, and CI/CD pipelines.
Even though everyone in the team has a different focus, they all work together on achieving the same goal: solve the problems of the users.
To summarize, the roles in the data team and their responsibilities are:
Product managers — make sure that the team is building the right thing, act as a gateway to all the requests and speak on behalf of the users.
Data analysts — analyze data, define key metrics, and create dashboards.
Data scientists — build models and incorporate them into the product.
Data engineers — prepare the data for analysts and data scientists.
ML engineers — productionize machine learning services and establish the best engineering practices.
Site reliability engineers — focus on availability, reliability, enforce the best DevOps practices.
This list is not comprehensive, but it should be a good starting point if you are just getting into the industry, or if you just want to know how the lines between different roles are defined in the industry.
Article | December 23, 2020
Nowadays, everyone with some technical expertise and a data science bootcamp under their belt calls themselves a data scientist. Also, most managers don't know enough about the field to distinguish an actual data scientist from a make-believe one someone who calls themselves a data science professional today but may work as a cab driver next year. As data science is a very responsible field dealing with complex problems that require serious attention and work, the data scientist role has never been more significant. So, perhaps instead of arguing about which programming language or which all-in-one solution is the best one, we should focus on something more fundamental. More specifically, the thinking process of a data scientist.
The challenges of the Data Science professional
Any data science professional, regardless of his specialization, faces certain challenges in his day-to-day work. The most important of these involves decisions regarding how he goes about his work. He may have planned to use a particular model for his predictions or that model may not yield adequate performance (e.g., not high enough accuracy or too high computational cost, among other issues). What should he do then? Also, it could be that the data doesn't have a strong enough signal, and last time I checked, there wasn't a fool-proof method on any data science programming library that provided a clear-cut view on this matter. These are calls that the data scientist has to make and shoulder all the responsibility that goes with them.
Why Data Science automation often fails
Then there is the matter of automation of data science tasks. Although the idea sounds promising, it's probably the most challenging task in a data science pipeline. It's not unfeasible, but it takes a lot of work and a lot of expertise that's usually impossible to find in a single data scientist. Often, you need to combine the work of data engineers, software developers, data scientists, and even data modelers. Since most organizations don't have all that expertise or don't know how to manage it effectively, automation doesn't happen as they envision, resulting in a large part of the data science pipeline needing to be done manually.
The Data Science mindset overall
The data science mindset is the thinking process of the data scientist, the operating system of her mind. Without it, she can't do her work properly, in the large variety of circumstances she may find herself in. It's her mindset that organizes her know-how and helps her find solutions to the complex problems she encounters, whether it is wrangling data, building and testing a model or deploying the model on the cloud. This mindset is her strategy potential, the think tank within, which enables her to make the tough calls she often needs to make for the data science projects to move forward.
Specific aspects of the Data Science mindset
Of course, the data science mindset is more than a general thing. It involves specific components, such as specialized know-how, tools that are compatible with each other and relevant to the task at hand, a deep understanding of the methodologies used in data science work, problem-solving skills, and most importantly, communication abilities. The latter involves both the data scientist expressing himself clearly and also him understanding what the stakeholders need and expect of him. Naturally, the data science mindset also includes organizational skills (project management), the ability to work well with other professionals (even those not directly related to data science), and the ability to come up with creative approaches to the problem at hand.
The Data Science process
The data science process/pipeline is a distillation of data science work in a comprehensible manner. It's particularly useful for understanding the various stages of a data science project and help plan accordingly. You can view one version of it in Fig. 1 below. If the data science mindset is one's ability to navigate the data science landscape, the data science process is a map of that landscape. It's not 100% accurate but good enough to help you gain perspective if you feel overwhelmed or need to get a better grip on the bigger picture.
Learning more about the topic
Naturally, it's impossible to exhaust this topic in a single article (or even a series of articles). The material I've gathered on it can fill a book! If you are interested in such a book, feel free to check out the one I put together a few years back; it's called Data Science Mindset, Methodologies, and Misconceptions and it's geared both towards data scientist, data science learners, and people involved in data science work in some way (e.g. project leaders or data analysts). Check it out when you have a moment. Cheers!
Article | April 6, 2020
Today when we look around, we see how technology has revolutionized our world. It has created amazing elements and resources, putting useful intelligence at our fingertips. With all of these revolutions, technology has also made our lives easier, faster, digital and fun. Perhaps at a point when we are talking about technology, Machine learning and artificial intelligence are increasingly popular buzzwords used in modern terms.Machine Learning has proven to be one of the game changer technological advancements of the past decade. In the increasingly competitive corporate world, Machine learning is enabling companies to fast-track digital transformation and move into an age of automation. Some might even argue that AI/ML is required to stay relevant in some verticals, such as digital payments and fraud detection in banking or product recommendations.To understand what machine learning is, it is important to know the concepts of artificial intelligence (AI). It is defined as a program that exhibits cognitive ability similar to that of a human being. Making computers think like humans and solve problems the way we do is one of the main tenets of artificial intelligence.