Article | May 12, 2021
Decision-makers at consumer brands are finally realizing the full transformative potential of external data, but they’re also realizing how difficult it is to source. Forrester reports that 87% of decision-makers in data and analytics have implemented or are planning initiatives to source more external data. And those initiatives are growing outside of the IT team: 29% of those surveyed say that IT has primary ownership of data sourcing, down from 37% in 2016. To support these projects, organizations are increasingly turning to a new specialist: the data hunter, who identifies and vets external data sources. It’s a lot of work to build teams focused on external data, and many leaders are realizing that external data is difficult to scale as the source list grows. Perhaps that’s why 66% of the decision-makers surveyed by Forrester report that they’re using or planning to use external service providers for data, analytics, and insights.
Article | May 12, 2021
In this article, we’ll talk about different roles in a data team and discuss their responsibilities.
In particular, we will cover:
The types of roles in a data team;
The responsibilities of each role;
The skills and knowledge each role needs to have.
This is not a comprehensive list, and most of what you will read in this article is my opinion, based on my experience working as a data scientist.
You can interpret the following information as “the description of data roles from the perspective of a data scientist”. For example, my views on the role of a data engineer may be a bit simplified because I don’t see all the complexities of their work firsthand. I do hope you will find this information useful nonetheless.
Roles in a Team
A typical data team consists of the following roles:
Product managers;
Data analysts;
Data scientists;
Data engineers;
Machine learning engineers; and
Site reliability engineers / MLOps engineers.
All these people work to create a data product.
To explain the core responsibilities of each role, we will use a case scenario:
Suppose we work at an online classifieds company. It’s a platform where users can go to sell things they don’t need (like OLX, where I work). If a user has an iPhone they want to sell — they go to this website, create a listing and sell their phone.
On this platform, sellers sometimes have problems with identifying the correct category for the items they are selling. To help them, we want to build a service that suggests the best category. To sell their iPhone, the user creates a listing and the site needs to automatically understand that this iPhone has to go in the “mobile phones” category.
Let’s start with the first role: product manager.
Product Manager
A product manager is someone responsible for developing products. Their goal is to make sure that the team is building the right thing. They are typically less technical than the rest of the team: rather than the implementation aspects of a problem, they focus on the problem itself.
Product managers need to ensure that the product is actually used by the end-users. This is a common problem: in many companies, engineers create something that doesn’t solve real problems. Therefore, the product manager is somebody who speaks to the team on behalf of the users.
Communication is the primary skill a PM needs. For data scientists, communication is a soft skill, but for a product manager, it’s a hard skill: they need it to perform their work.
Product managers also do a lot of planning: they need to understand the problem, come up with a solution, and make sure the solution is implemented in a timely manner. To accomplish this, PMs need to know what’s important and plan the work accordingly.
When somebody has a problem, they approach the PM with it. Then the task of the PM is to figure out if users actually need this feature, how important this feature is, and if the team has the capacity to implement it.
Let’s come back to our example. Suppose somebody comes to the PM and says:
“We want to build a feature to automatically suggest the category for a listing. Somebody’s selling an iPhone, and we want to create a service that predicts that the item goes in the mobile phones category.”
Product managers need to answer these questions:
“Is this feature that important to the user?”
“Is it an important problem to solve in the product at all?”
To answer these questions, PMs ask data analysts to help them figure out what to do next.
Data Analyst
Data analysts know how to analyze the data available in the company. They discover insights in the data and then explain their findings to others.
So, analysts need to know:
What kind of data the company has;
How to get the data;
How to interpret the results;
How to explain their findings to colleagues and management.
Data analysts are also often responsible for defining key metrics and building different dashboards. This includes things like showing the company’s profits, displaying the number of listings, or how many contacts buyers made with sellers. Thus, data analysts should know how to calculate all the important business metrics, and how to present them in a way that is understandable to others.
When it comes to skills, data analysts should know:
SQL — this is the main tool that they work with;
Programming languages such as Python or R;
Tableau or similar tools for building dashboards;
Basics of statistics;
How to run experiments;
A bit of machine learning, such as regression analysis and time series modeling.
For our example, product managers turn to data analysts to help them quantify the extent of the problem. Together with the PM, the data analyst tries to answer questions like:
“How many users are affected by this problem?”
“How many users don’t finish creating their listing because of this problem?”
“How many listings are there on the platform that don’t have the right category selected?”
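As an illustration of the kind of query an analyst might write to answer such questions, here is a minimal sketch using Python's built-in `sqlite3` module. The `listings` table, its columns (`status`, `category_corrected`), and the sample rows are hypothetical. Attributing abandoned listings to the category problem specifically would need extra data, such as the form step where users left.

```python
import sqlite3

# Hypothetical schema: one row per listing attempt.
#   status: 'published' or 'abandoned' (user left the form before finishing)
#   category_corrected: 1 if a moderator later moved the listing to another category
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE listings (id INTEGER PRIMARY KEY, user_id INTEGER, "
    "status TEXT, category_corrected INTEGER)"
)
conn.executemany(
    "INSERT INTO listings (user_id, status, category_corrected) VALUES (?, ?, ?)",
    [
        (1, "published", 0),
        (1, "published", 1),
        (2, "abandoned", 0),
        (3, "published", 1),
        (4, "abandoned", 0),
        (5, "published", 0),
    ],
)

# Share of listing attempts abandoned before publishing (the comparison yields 0 or 1).
abandon_rate = conn.execute(
    "SELECT AVG(status = 'abandoned') FROM listings"
).fetchone()[0]

# Published listings whose category had to be corrected afterwards.
wrong_category = conn.execute(
    "SELECT COUNT(*) FROM listings "
    "WHERE status = 'published' AND category_corrected = 1"
).fetchone()[0]

# Distinct users who hit either problem.
affected_users = conn.execute(
    "SELECT COUNT(DISTINCT user_id) FROM listings "
    "WHERE status = 'abandoned' OR category_corrected = 1"
).fetchone()[0]

print(f"abandon rate: {abandon_rate:.2f}")
print(f"published listings with a corrected category: {wrong_category}")
print(f"affected users: {affected_users}")
```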
After the analyst gets the data, analyzes it, and answers these questions, they may conclude: “Yes, this is actually a problem”. Then the PM and the team discuss the report and agree: “Indeed, this problem is worth solving”.
Now the data team will go ahead and start solving this problem.
After the model for the service is created, it’s necessary to understand whether the service is effective: whether the model helps people and solves the problem. For that, data analysts run experiments, usually A/B tests.
When running an experiment, we can see if more users successfully finish posting an item for sale or if there are fewer ads that end up in the wrong category.
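One common way to decide whether such an experiment shows a real effect is a two-proportion z-test on the share of users who finish posting. The sketch below implements it with the standard library only; the counts are invented, and in practice teams often rely on a statistics library or an experimentation platform instead.

```python
import math

def two_proportion_z_test(success_a, total_a, success_b, total_b):
    """Two-sided z-test for the difference between two conversion rates."""
    p_a = success_a / total_a
    p_b = success_b / total_b
    # Pooled rate under the null hypothesis that both groups convert equally.
    pooled = (success_a + success_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: control (no suggestions) vs. treatment (category suggestions).
z, p = two_proportion_z_test(success_a=4_000, total_a=5_000,
                             success_b=4_150, total_b=5_000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the difference in completion rates is unlikely to be due to chance alone; the significance threshold itself is a team decision.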
Data Scientist
The roles of a data scientist and data analyst are pretty similar. In some companies, it’s the same person who does both jobs. However, data scientists typically focus more on predicting rather than explaining.
A data analyst fetches the data, looks at it, explains what’s going on to the team, and gives some recommendations on what to do about it. A data scientist, on the other hand, focuses more on creating machine learning services. For example, one of the questions that a data scientist would want to answer is “How can we use this data to build a machine learning model for predicting something?”
In other words, data scientists incorporate the data into the product. Their focus is more on engineering than analysis. Data scientists work more closely with engineers on integrating data solutions into the product.
The skills of data scientists include:
Machine learning — the main tool for building predictive services;
Python — the primary programming language;
SQL — necessary to fetch the data for training their models;
Flask, Docker, and similar — to create simple web services for serving the models.
For our example, the data scientists are the people who develop the model used for predicting the category. Once they have a model, they can develop a simple web service for hosting this model.
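To give a feel for such a model, here is a deliberately small sketch of a category suggester: a multinomial naive Bayes classifier over listing-title words, written with the standard library only. It is an illustrative stand-in, not the model described in this article; the training titles and category names are invented, and a real service would use far more data and likely a library such as scikit-learn.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

class CategorySuggester:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, titles, categories):
        self.word_counts = defaultdict(Counter)  # category -> word frequencies
        self.category_counts = Counter(categories)
        for title, category in zip(titles, categories):
            self.word_counts[category].update(tokenize(title))
        self.vocabulary = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, title):
        words = tokenize(title)
        total_listings = sum(self.category_counts.values())
        vocab_size = len(self.vocabulary)
        best_category, best_score = None, -math.inf
        for category, count in self.category_counts.items():
            # log P(category) + sum over words of log P(word | category)
            score = math.log(count / total_listings)
            total_words = sum(self.word_counts[category].values())
            for word in words:
                word_count = self.word_counts[category][word]  # 0 if unseen
                score += math.log((word_count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_category, best_score = category, score
        return best_category

# Hypothetical training data: titles of past listings and their categories.
titles = [
    "iPhone 11 64GB black", "Samsung Galaxy S10 phone", "iPhone X with charger",
    "Wooden dining table", "Office chair black", "IKEA sofa bed",
]
categories = [
    "mobile phones", "mobile phones", "mobile phones",
    "furniture", "furniture", "furniture",
]

model = CategorySuggester().fit(titles, categories)
print(model.predict("Selling my iPhone 12"))  # → mobile phones
```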
Data Engineer
Data engineers do all the heavy lifting when it comes to data. A lot of work needs to happen before data analysts can go to a database, fetch the data, perform their analysis, and come up with a report. This is precisely the focus of data engineers — they make sure this is possible. Their responsibility is to prepare all the necessary data in a form that is consumable for their colleagues.
To accomplish this, data engineers create “a data lake”. All the data that users generate needs to be captured properly and saved in a separate database. This way, analysts can run their analysis, and data scientists can use this data for training models.
Another thing data engineers often need to do, especially at larger companies, is to ensure that the people who look at the data have the necessary clearance to do so. Some user data is sensitive and people can’t just go looking around at personal information (such as emails or phone numbers) unless they have a really good reason to do so. Therefore, data engineers need to set up a system that doesn’t let people just access all the data at once.
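One simple way to express such a rule is to mask sensitive fields unless the requester's role has been granted access. The sketch below uses hypothetical role names and a hard-coded policy; real systems usually enforce this in the data warehouse or an access-management layer rather than in application code.

```python
SENSITIVE_FIELDS = {"email", "phone"}

# Hypothetical policy: roles allowed to see unmasked personal data.
ROLES_WITH_PII_ACCESS = {"fraud_investigator"}

def mask(value):
    """Keep the first character and hide the rest."""
    return value[:1] + "*" * (len(value) - 1) if value else value

def read_user_record(record, role):
    """Return a copy of the record, masking sensitive fields for unauthorized roles."""
    if role in ROLES_WITH_PII_ACCESS:
        return dict(record)
    return {
        key: mask(value) if key in SENSITIVE_FIELDS else value
        for key, value in record.items()
    }

user = {"user_id": 42, "email": "anna@example.com", "phone": "+48123456789", "city": "Poznan"}
print(read_user_record(user, role="data_analyst"))
print(read_user_record(user, role="fraud_investigator"))
```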
The skills needed for data engineers usually include:
AWS or Google Cloud — popular cloud providers;
Kubernetes and Terraform — infrastructure tools;
Kafka or RabbitMQ — tools for capturing and processing the data;
Databases — to save the data in such a way that it’s accessible for data analysts;
Airflow or Luigi — data orchestration tools for building complex data pipelines.
In our example, a data engineer prepares all the required data. First, they make sure the analyst has the data to perform the analysis. Then they also work with the data scientist to prepare the information that we’ll need for training the model. That includes the title of the listing, its description, the category, and so on.
A data engineer isn’t the only type of engineer that a data team has. There are also machine learning engineers.
Machine Learning Engineer
Machine learning engineers take whatever data scientists build and help them scale it up. They also ensure that the service is maintainable and that the team follows the best engineering practices. Their focus is more on engineering than on modeling.
The skills ML engineers have are similar to those of data engineers:
AWS or Google Cloud;
Infrastructure tools like Kubernetes and Terraform;
Python and other programming languages;
Flask, Docker, and other tools for creating web services.
Additionally, ML engineers work closely with more “traditional” engineers, like backend engineers, frontend engineers, or mobile engineers, to ensure that the services from the data team are included in the final product.
For our example, ML engineers work together with data scientists on productionizing the category suggestion service. They make sure it’s stable once it’s rolled out to all users, and that it’s maintainable, so the service can be changed in the future.
There’s another kind of engineer that can be pretty important in a data team — site reliability engineers.
DevOps / Site Reliability Engineer
The role of an SRE is similar to that of an ML engineer, but the focus is more on the availability and reliability of the services.
SREs aren’t strictly limited to working with data. Their role is more general: they tend to focus less on business logic and more on infrastructure, which includes things like networking and provisioning infrastructure.
Therefore, SREs look after the servers where the services are running and take care of collecting all the operational metrics like CPU usage, how many requests per second there are, the services’ processes, and so on.
As the name suggests, site reliability engineers have to make sure that everything runs reliably. They set up alerts and are constantly on call to make sure that the services are up and running without any interruptions. If something breaks, SREs quickly diagnose the problem and fix it, or involve an engineer to help find the solution.
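A simplified version of such an alert rule fires when the error rate over a sliding window of recent requests exceeds a threshold. The window size and threshold below are arbitrary assumptions; in practice SREs usually define rules like this in a monitoring system rather than hand-coding them.

```python
from collections import deque

class ErrorRateAlert:
    """Fires when the share of failed requests among the last `window` requests exceeds `threshold`."""

    def __init__(self, window=100, threshold=0.05):
        self.window = window
        self.threshold = threshold
        self.results = deque(maxlen=window)  # True = request failed

    def record(self, failed):
        """Record one request outcome and return True if the alert should fire."""
        self.results.append(failed)
        if len(self.results) < self.window:
            return False  # not enough data yet to judge
        error_rate = sum(self.results) / len(self.results)
        return error_rate > self.threshold

alert = ErrorRateAlert(window=20, threshold=0.10)
outcomes = [False] * 18 + [True] * 3  # three failures after 18 successes
for i, failed in enumerate(outcomes):
    if alert.record(failed):
        print(f"ALERT after request {i}: error rate above 10%")
        break
```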
The skills needed for site reliability engineers:
Cloud infrastructure tools;
Programming languages like Python;
Best DevOps practices like automation, CI/CD, and the like.
Of course, ML engineers and data engineers should also know these best practices, but the focus of DevOps engineers/SREs is to establish them and make sure that they are followed.
There is a special type of DevOps engineer, called “MLOps engineer”.
An MLOps engineer is a DevOps engineer who also knows the basics of machine learning. Similar to an SRE, the responsibility of an MLOps engineer is to make sure that the services developed by data scientists, ML engineers, and data engineers are up and running all the time.
MLOps engineers know the lifecycle of a machine learning model: the training phase, serving phase, and so on.
Despite having this knowledge, MLOps engineers are still focused more on operational support than on anything else. This means that they need to know and follow all the DevOps practices and make sure that the rest of the team follows them as well. They accomplish this by setting up things like continuous retraining and CI/CD pipelines.
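Continuous retraining is often driven by a check like the one sketched below: measure the model's accuracy on recently labelled data and retrain when it has dropped too far below the accuracy recorded at deployment. The function names, the 0.05 tolerance, and the `train` callback are hypothetical; orchestration tools and ML platforms have their own ways of scheduling such checks.

```python
def accuracy(predictions, labels):
    """Share of predictions that match the true labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

def maybe_retrain(model, recent_inputs, recent_labels, baseline_accuracy, train, tolerance=0.05):
    """Retrain if accuracy on recent data fell more than `tolerance` below the baseline.

    `model` is any callable mapping an input to a prediction; `train` is a
    hypothetical callback that builds a new model from inputs and labels.
    Returns the (possibly new) model and whether retraining happened.
    """
    current = accuracy([model(x) for x in recent_inputs], recent_labels)
    if baseline_accuracy - current > tolerance:
        return train(recent_inputs, recent_labels), True
    return model, False

# Toy example: a stale rule-based model and a "train" step that memorizes labels.
def stale_model(title):
    return "mobile phones" if "iphone" in title.lower() else "furniture"

def train(inputs, labels):
    lookup = dict(zip(inputs, labels))
    return lambda title: lookup.get(title, "other")

recent = ["iPhone 13", "Pixel 7", "Galaxy S22", "Oak table"]
labels = ["mobile phones", "mobile phones", "mobile phones", "furniture"]
model, retrained = maybe_retrain(stale_model, recent, labels, baseline_accuracy=0.95, train=train)
print("retrained:", retrained)
```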
Even though everyone in the team has a different focus, they all work together toward the same goal: solving the users’ problems.
To summarize, the roles in the data team and their responsibilities are:
Product managers — make sure that the team is building the right thing, act as a gateway to all the requests and speak on behalf of the users.
Data analysts — analyze data, define key metrics, and create dashboards.
Data scientists — build models and incorporate them into the product.
Data engineers — prepare the data for analysts and data scientists.
ML engineers — productionize machine learning services and establish the best engineering practices.
Site reliability engineers — focus on availability and reliability, and enforce the best DevOps practices.
This list is not comprehensive, but it should be a good starting point if you are just getting into the industry, or if you just want to know how the lines between different roles are defined in the industry.
Article | May 12, 2021
According to Google Trends, predictive data analytics has gained significant popularity over the last few years. Many businesses have implemented predictive analytics applications to increase their business reach, gain new customers, forecast sales, and more.
Predictive analytics is a type of data analytics that makes predictions using data sets, statistical modeling, and machine learning. It relies on historical data: this data is fed into a mathematical model that recognizes patterns and trends, and the model is then applied to current data to forecast trends, practices, and behaviors over horizons ranging from milliseconds to days or even years.
Based on the parameters supplied to these models, organizations find patterns in the data to detect risks and opportunities and to forecast conditions and events likely to occur at a particular time. At its heart, predictive analytics answers a simple question: “What is likely to happen based on my current data, and what can be done to change the outcome?”
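As a minimal illustration of fitting a model to historical data and applying it forward, the sketch below fits an ordinary least-squares trend line to an invented monthly sales series and extrapolates it one month ahead. Real predictive models usually include more features, seasonality, and non-linear effects; this is only the simplest instance of the idea.

```python
def fit_linear_trend(values):
    """Fit y = intercept + slope * t by ordinary least squares, with t = 0, 1, 2, ..."""
    n = len(values)
    ts = range(n)
    mean_t = sum(ts) / n
    mean_y = sum(values) / n
    cov = sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, values))
    var = sum((t - mean_t) ** 2 for t in ts)
    slope = cov / var
    intercept = mean_y - slope * mean_t
    return intercept, slope

def forecast(intercept, slope, t):
    """Apply the fitted trend to a future time step t."""
    return intercept + slope * t

# Hypothetical historical data: units sold in each of the last six months.
history = [100, 110, 120, 130, 140, 150]
intercept, slope = fit_linear_trend(history)
print(forecast(intercept, slope, t=len(history)))  # → 160.0
```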
Today, businesses can choose from many predictive analytics products offered by big data vendors across different industries. These products help businesses leverage historical data by discovering complex correlations, recognizing patterns, and forecasting.
Organizations are turning to predictive analytics to increase their bottom line and gain an advantage over their competitors. Some of the reasons are listed below:
• With the growing amount and types of data, there is more interest in using it to produce valuable insights
• Better computers
• An abundance of easy-to-use software
• The need for competitive differentiation due to tougher economic conditions
As more easy-to-use software has been introduced, businesses no longer need dedicated statisticians and mathematicians for every predictive analytics and forecasting task.
Benefits of Predictive Analytics
Competitive edge over other businesses
The most common reason companies adopt predictive analytics is to gain an advantage over their competitors. Customer trends and buying patterns change over time, and the businesses that identify these changes first get ahead. Embracing predictive analytics is one way to stay ahead of your competition: it aids qualified lead generation and gives you insight into present and potential customers.
Businesses use predictive analytics to predict customer behavior, preferences, and responses. Using this information, they attract their target audience and entice them to become loyal customers. Predictive analytics gives valuable information about your customers, such as which of them are likely to lapse, how to retain them, and whether you should market directly to them. The more you know about them, the stronger your marketing becomes, and the better your business can anticipate your customers’ exact needs.
Acquiring new customers can be almost five times more expensive than retaining existing ones. The most successful companies invest as much in retaining customers as in acquiring new ones.
Predictive analytics helps direct marketing strategies toward your existing customers and get them to return frequently, making sure your marketing caters to their diverse requirements.
Earlier marketing strategies revolved around a ‘one size fits all’ approach, but those days are gone. To retain existing customers and acquire new ones, you need personalized marketing campaigns.
Predictive analytics and data management give you new information about customer expectations, previous purchases, buying behaviors, and patterns. Using this data, you can create personalized marketing strategies that keep customers engaged and help acquire new ones.
Applications of Predictive Analytics
Customer targeting
Customer targeting divides the customer base into groups according to age, gender, interests, and buying and spending habits. It helps companies tailor marketing communications specifically to the customers who are likely to buy their products. Traditional techniques are generally far less effective than predictive analytics at identifying potential customers.
The main factors used to build these customer groups are:
• Socio-demographic factors: age, gender, education, and marital status
• Engagement factors: recent interaction, frequency, spending habits, etc.
• Past campaign response: contact response, type, day, month, etc.
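Engagement factors like these are often summarized with RFM (recency, frequency, monetary value) scoring. The sketch below assigns customers to segments using fixed thresholds; the thresholds, segment names, and customer data are assumptions for illustration, and real systems typically derive cut-offs from the data (for example, quantiles) or use clustering.

```python
from dataclasses import dataclass

@dataclass
class Customer:
    customer_id: str
    days_since_last_purchase: int  # recency
    purchases_last_year: int       # frequency
    total_spent: float             # monetary value

def rfm_score(c):
    """Score each RFM dimension from 1 (low) to 3 (high) using illustrative thresholds."""
    days = c.days_since_last_purchase
    recency = 3 if days <= 30 else 2 if days <= 180 else 1
    frequency = 3 if c.purchases_last_year >= 10 else 2 if c.purchases_last_year >= 3 else 1
    monetary = 3 if c.total_spent >= 500 else 2 if c.total_spent >= 100 else 1
    return recency, frequency, monetary

def segment(c):
    """Map RFM scores to a coarse, hypothetical marketing segment."""
    r, f, m = rfm_score(c)
    if r == 3 and f == 3:
        return "loyal"
    if r == 1 and f >= 2:
        return "at risk"
    if m == 3:
        return "big spender"
    return "casual"

customers = [
    Customer("A", 10, 12, 800.0),
    Customer("B", 200, 5, 300.0),
    Customer("C", 90, 1, 650.0),
    Customer("D", 60, 2, 40.0),
]
for c in customers:
    print(c.customer_id, segment(c))
```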
Customer-specific targeting is highly advantageous. Companies that use it can:
• Communicate better with customers
• Save money on marketing
• Increase profits
Customer churn prevention
Customer churn creates major hurdles to a company’s growth. Although retaining customers is widely accepted to be cheaper than gaining new ones, it can be difficult: detecting a client’s dissatisfaction is not an easy task, as they can abruptly stop using your services without any warning.
Here, churn prevention comes into the picture. Churn prevention aims to predict who will end their relationship with the company, when, and why. Existing data sets can be used to develop predictive models, so companies can act proactively to prevent churn.
Factors that can influence the churn are as follows:
• Customer variables
• Service use
• Competitor variables
Using these variables, companies can then take necessary steps to avoid the churn by offering customers personalized services or products.
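To show how variables like these could feed a churn model, here is a small logistic regression trained with batch gradient descent in plain Python. The two features (monthly service usage and number of support complaints), the toy data, the learning rate, and the iteration count are all assumptions; in practice a library implementation and many more variables would be used.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic_regression(rows, labels, learning_rate=0.1, iterations=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    n = len(rows)
    for _ in range(iterations):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            # Prediction error for this customer: predicted probability minus label.
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j in range(n_features):
                grad_w[j] += error * x[j]
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias

def churn_probability(weights, bias, x):
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

# Hypothetical features: [service usage in tens of hours per month, support complaints]
rows = [[4.0, 0], [3.5, 1], [5.0, 0], [3.0, 0],
        [0.5, 3], [1.0, 2], [0.2, 4], [1.5, 3]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = customer churned

weights, bias = train_logistic_regression(rows, labels)
for customer in ([4.5, 0], [0.3, 3]):
    print(customer, round(churn_probability(weights, bias, customer), 3))
```

Customers with a high predicted probability are the ones a company might target with personalized offers before they leave.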
Risk assessment
Risk assessment and management processes in many companies are antiquated, even though customer information is abundantly available for evaluation.
With advanced analytics, this data can be analyzed quickly and accurately while maintaining customer privacy and boundaries. Risk assessment thus allows companies to analyze problems in any area of the business, and predictive analytics can estimate with reasonable confidence which operations are profitable and which are not.
Risk assessment analyzes the following data types:
• Socio-demographic factors
• Product details
• Customer behavior
• Risk metrics
Sales forecasting
Sales forecasting evaluates previous history, seasonality, and market-affecting events to predict revenue and the demand for a company’s products or services, which is vital for planning. It can be applied to short-term, medium-term, and long-term forecasting.
Predictive models help in anticipating a customer’s reaction to the factors that affect sales.
The following factors can be used in sales forecasting:
• Calendar data
• Weather data
• Company data
• Social data
• Demand data
Sales forecasting allows revenue prediction and optimal resource allocation.
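A very simple way to use calendar (seasonal) data is a seasonal naive forecast with drift: repeat the value from the same period one season earlier and add the average season-over-season change. This is a standard baseline method chosen for illustration, not one the article prescribes; the quarterly figures are invented, and methods such as Holt-Winters or ARIMA are common next steps.

```python
def seasonal_naive_with_drift(history, season_length, horizon):
    """Forecast each future period as the same period one season earlier,
    shifted by the average season-over-season change in the history.

    history: past values, oldest first (at least two full seasons);
    season_length: e.g. 4 for quarterly data, 12 for monthly data.
    """
    if len(history) < 2 * season_length:
        raise ValueError("need at least two full seasons of history")
    # Average change between each value and the value one season earlier.
    diffs = [history[t] - history[t - season_length]
             for t in range(season_length, len(history))]
    drift = sum(diffs) / len(diffs)
    values = list(history)
    for _ in range(horizon):
        values.append(values[-season_length] + drift)
    return values[len(history):]

# Hypothetical quarterly sales for three years, with a Q4 holiday peak.
sales = [100, 90, 95, 140, 108, 98, 103, 148, 116, 106, 111, 156]
print(seasonal_naive_with_drift(sales, season_length=4, horizon=4))
# → [124.0, 114.0, 119.0, 164.0]
```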
Healthcare
Healthcare organizations have begun to use predictive analytics because it helps them save money, and they use it in several ways. Based on past trends, they can allocate facility resources, optimize staff schedules, identify patients at risk, and add intelligence to pharmaceutical and supply acquisition management.
Predictive analytics in healthcare has also helped prevent health complications such as diabetes, asthma, and other life-threatening conditions by identifying patients at risk early. Applying it can lead to better clinical decisions for patients.
Predictive analytics is being used across many industries and is a good way to advance your company’s growth and to forecast future events so you can act accordingly. It has gained support from organizations on a global scale and is likely to continue growing rapidly.
Frequently Asked Questions
What is predictive analytics?
Predictive analytics uses historical data to predict future events. The historical data is used to build a mathematical model that captures essential trends. That model is then applied to current data to predict what will happen next or to suggest steps to take for optimal outcomes.
How to do predictive analytics?
• Define business objectives
• Collect relevant data from available sources
• Improve the collected data with data cleaning methods
• Choose an existing model or build your own, and test it on the data
• Evaluate and validate the predictive model
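These steps can be sketched end to end on a toy problem. Everything here is a hypothetical illustration: the objective (predicting whether a listing sells within a week from its price relative to similar listings), the invented data, the 70/30 chronological split, and the simple threshold model compared against a majority-class baseline.

```python
# 1. Define the objective (hypothetical): predict whether a listing sells
#    within a week from its price relative to similar listings (1.0 = typical).

# 2. Collect data: (price_ratio, sold_within_week) in chronological order;
#    None marks a missing value. All values are invented for illustration.
raw = [(0.8, 1), (1.3, 0), (None, 1), (0.9, 1), (1.5, 0), (0.7, 1),
       (1.1, 0), (-2.0, 1), (0.95, 1), (1.4, 0), (0.85, 1), (1.2, 0)]

# 3. Clean: drop rows with missing or impossible (non-positive) price ratios.
data = [(x, y) for x, y in raw if x is not None and x > 0]

# Chronological split: train on older rows, test on the most recent ones.
split = int(len(data) * 0.7)
train, test = data[:split], data[split:]

def accuracy(predict, rows):
    """Share of rows where the prediction matches the label."""
    return sum(predict(x) == y for x, y in rows) / len(rows)

# 4a. Baseline model: always predict the most common training label.
majority = max((0, 1), key=lambda label: sum(y == label for _, y in train))

def baseline(x):
    return majority

# 4b. Own model: "sells" when the price ratio is below a threshold, picking the
#     midpoint between neighbouring training values with the best training accuracy.
xs = sorted({x for x, _ in train})
thresholds = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
best = max(thresholds, key=lambda t: accuracy(lambda x: int(x < t), train))

def threshold_model(x):
    return int(x < best)

# 5. Evaluate and validate on held-out data the model has never seen.
print("baseline test accuracy:", accuracy(baseline, test))
print("threshold model test accuracy:", accuracy(threshold_model, test))
```

Comparing against a trivial baseline is a cheap sanity check: a model that cannot beat "always predict the most common outcome" is not adding value.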
How does predictive analytics work for business?
Predictive analytics helps businesses attract, retain, and grow their profitable customers. It also helps them in improving their operations.
What tools are used for predictive analytics?
Some tools used for predictive analytics are:
• SAS Advanced Analytics
• Oracle DataScience
• IBM SPSS Statistics
• SAP Predictive Analytics
• Q Research
Article | May 12, 2021
DataOps helps reduce the time data scientists spend preparing data for use in applications, a task that currently consumes roughly 80% of their time.
We’re still hopeful that digital transformation will provide the insights businesses need from big data. As a data scientist, you’re probably aware of the growing pressure from companies to extract meaningful insights from data and find the stories needed for impact.
However in-demand data science is in the employment numbers, there is equal pressure on data scientists to deliver business value, and no wonder: we’re approaching an age where data science and AI draw a line in the sand between the companies that remain competitive and those that collapse.
One answer to this pressure is the rise of DataOps. Let’s take a look at what it is and how it could give data scientists a path to delivering what businesses have been after.