Article | October 27, 2020
Data platforms and frameworks have been constantly evolving. For almost 10 years we were excited about Hadoop; then came Snowflake, or as I call it, Snowflake Blizzard (which in 2020 launched the largest software IPO on record), and Google (which solves problems and serves use cases in a way that few companies can match).
The end of the data warehouse
Once upon a time, life was simple; or at least, the basic approach to Business Intelligence was fairly easy to describe: a process of collecting information from systems, building a repository of consistent data, and bolting on one or more reporting and visualisation tools that presented information to users. Data used to be managed in expensive, slow, inaccessible SQL data warehouses, and these SQL systems were notorious for their lack of scalability. Their demise has been driven by several technological advances, one of which is the ubiquitous and growing Hadoop.
On April 1, 2006, Apache Hadoop was unleashed upon Silicon Valley. Inspired by Google's papers on the Google File System (GFS) and MapReduce, Hadoop was designed to make data processing more flexible and scalable by splitting the work into smaller tasks that run in parallel across clusters of commodity hardware.
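To make the split-and-parallelize idea concrete, here is a minimal, single-process sketch of the MapReduce pattern that Hadoop popularized, applied to counting words. It only imitates the map, shuffle, and reduce phases in plain Python; a real Hadoop job spreads these phases across many machines and is written against Hadoop's own APIs. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(split: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input split."""
    for word in split.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as the framework does between phases."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(splits: List[str]) -> Dict[str, int]:
    # On a cluster, each split would be mapped on a different commodity machine.
    intermediate = [pair for split in splits for pair in map_phase(split)]
    grouped = shuffle_phase(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data big clusters", "cheap commodity clusters"]))
    # {'big': 2, 'data': 1, 'clusters': 2, 'cheap': 1, 'commodity': 1}
```

Because every map call only sees its own split and every reduce call only sees one key, both phases can run on as many machines as there are splits or keys; that independence is what let Hadoop scale out on cheap hardware.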
Hadoop’s intent was to replace enterprise data warehouses based on SQL. Unfortunately, a technology used by Google may not be the best solution for everyone else. It’s not that others are incompetent: Google solves problems and serves use cases in a way that few companies can match. Google has been running massive-scale applications such as its eponymous search engine, YouTube, and its Ads platform. The technologies and infrastructure that let these geographically distributed services perform at scale are what make many components of Google Cloud Platform (GCP) enterprise-ready and well featured. Google has shown leadership in developing innovations that it has made available to the open-source community and that are used extensively by other public cloud vendors and Gartner clients. Examples include the Kubernetes container management framework, the TensorFlow machine learning platform, and the Apache Beam data processing programming model. GCP also uses open-source offerings in its cloud, treats third-party data and analytics providers as first-class citizens, and provides unified billing for its customers. Examples of these third-party providers include DataStax, Redis Labs, InfluxData, MongoDB, Elastic, Neo4j, and Confluent.
Silicon Valley tried to make Hadoop work. The technology was extremely complicated and nearly impossible to use efficiently. Hadoop’s lack of speed was compounded by its focus on unstructured data — you had to be a “flip-flop wearing” data scientist to truly make use of it.
Unstructured datasets are very difficult to query and analyze without deep knowledge of computer science. At one point, Gartner estimated that 70% of Hadoop deployments would not achieve the goal of cost savings and revenue growth, mainly due to insufficient skills and technical integration difficulties. And seventy percent seems like an understatement.
Data storage through the years: from GFS to Snowflake or Snowflake Blizzard
Developing in parallel with Hadoop’s journey was that of Marcin Zukowski, co-founder and CEO of Vectorwise. Marcin took the data warehouse in another direction, into the world of advanced vector processing, and later co-founded Snowflake. Despite being almost unheard of among the general public, Snowflake was actually founded back in 2012. Unlike Netflix or Uber, Snowflake is not a consumer tech firm: it is business-to-business only, which may explain its high valuation, since enterprise companies are often seen as a more "stable" investment. In short, Snowflake helps businesses manage data that's stored in the cloud. The firm's motto is "mobilising the world's data", because it allows big companies to make better use of their vast data stores.
Marcin and his teammates rethought the data warehouse by leveraging the elasticity of the public cloud in an unexpected way: separating storage and compute. Their message was this: don’t pay for a data warehouse you don’t need. Only pay for the storage you need, and add capacity as you go. This is considered one of Snowflake’s key innovations: separating storage (where the data is held) from computing (the act of querying). By offering this service before Google, Amazon, and Microsoft had equivalent products of their own, Snowflake was able to attract customers, and build market share in the data warehousing space.
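The arithmetic behind "only pay for what you use" can be sketched with a toy cost model. Every price below is a hypothetical placeholder, not Snowflake's (or anyone's) actual rate. The point is only structural: when storage lives on the compute nodes, the cluster must be sized for both the data and the peak load and then runs all month, whereas with storage and compute billed separately, idle compute costs nothing.

```python
import math
from dataclasses import dataclass

HOURS_PER_MONTH = 730  # approximate number of hours in a month


@dataclass
class Workload:
    storage_tb: float            # data kept in the warehouse
    busy_hours_per_month: float  # hours per month when queries actually run
    nodes_when_busy: int         # compute nodes needed while queries run


def coupled_cost(w: Workload, node_hour: float, node_capacity_tb: float) -> float:
    """Storage lives on the compute nodes, so the cluster must hold all the data
    and handle peak load, and it is billed for every hour of the month."""
    nodes_for_data = math.ceil(w.storage_tb / node_capacity_tb)
    nodes = max(w.nodes_when_busy, nodes_for_data)
    return nodes * node_hour * HOURS_PER_MONTH


def decoupled_cost(w: Workload, node_hour: float, storage_tb_month: float) -> float:
    """Storage is billed per TB; compute is billed only for the hours it runs."""
    storage = w.storage_tb * storage_tb_month
    compute = w.nodes_when_busy * node_hour * w.busy_hours_per_month
    return storage + compute


if __name__ == "__main__":
    # Hypothetical rates and workload, chosen for illustration only.
    w = Workload(storage_tb=50, busy_hours_per_month=100, nodes_when_busy=4)
    print(coupled_cost(w, node_hour=2.0, node_capacity_tb=10))      # 7300.0
    print(decoupled_cost(w, node_hour=2.0, storage_tb_month=23.0))  # 1950.0
```

Note how growing `storage_tb` forces extra always-on nodes in the coupled model, while in the decoupled model it only adds to the storage line.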
Naming the company after a discredited database concept was very brave. For those not familiar with the snowflake schema, it is a logical arrangement of tables in a multidimensional database such that the entity-relationship diagram resembles a snowflake shape. … When it is completely normalized along all the dimension tables, the resultant structure resembles a snowflake with the fact table in the middle. Needless to say, the “snowflake” schema is as far from Hadoop’s design philosophy as technically possible.
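For readers who have not seen one, here is a minimal sketch of a snowflake schema built with Python's built-in `sqlite3` module. The retail tables are hypothetical: a central sales fact table points to a product dimension, which is itself normalized into a separate category table. Answering a question by category therefore takes a chain of joins outward from the fact table, which is what gives the diagram its snowflake shape.

```python
import sqlite3

SCHEMA = """
-- Outer branch: categories are normalized out of the product dimension.
CREATE TABLE dim_category (
    category_id   INTEGER PRIMARY KEY,
    category_name TEXT NOT NULL
);
-- Dimension table that references the category table.
CREATE TABLE dim_product (
    product_id   INTEGER PRIMARY KEY,
    product_name TEXT NOT NULL,
    category_id  INTEGER NOT NULL REFERENCES dim_category(category_id)
);
-- Fact table in the middle of the snowflake.
CREATE TABLE fact_sales (
    sale_id    INTEGER PRIMARY KEY,
    product_id INTEGER NOT NULL REFERENCES dim_product(product_id),
    amount     REAL NOT NULL
);
"""


def revenue_by_category(conn: sqlite3.Connection) -> list:
    # Walk from the fact table out through both levels of dimensions.
    query = """
        SELECT c.category_name, SUM(f.amount) AS revenue
        FROM fact_sales AS f
        JOIN dim_product  AS p ON p.product_id  = f.product_id
        JOIN dim_category AS c ON c.category_id = p.category_id
        GROUP BY c.category_name
        ORDER BY c.category_name
    """
    return conn.execute(query).fetchall()


def load_sample(conn: sqlite3.Connection) -> None:
    """Insert a few made-up rows for illustration."""
    conn.executemany("INSERT INTO dim_category VALUES (?, ?)",
                     [(1, "Phones"), (2, "Laptops")])
    conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                     [(10, "iPhone", 1), (11, "Pixel", 1), (20, "ThinkPad", 2)])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                     [(1, 10, 900.0), (2, 11, 600.0), (3, 20, 1200.0)])


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    load_sample(conn)
    print(revenue_by_category(conn))  # [('Laptops', 1200.0), ('Phones', 1500.0)]
```

A star schema would instead keep `category_name` directly on `dim_product`, trading some redundancy for one fewer join.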
While Silicon Valley was headed toward a dead end, Snowflake captured an entire cloud data market.
Article | October 27, 2020
In this article, we’ll talk about different roles in a data team and discuss their responsibilities.
In particular, we will cover:
The types of roles in a data team;
The responsibilities of each role;
The skills and knowledge each role needs to have.
This is not a comprehensive list and the majority of what you will read in this article is my opinion, which comes out of my experience from working as a data scientist.
You can interpret the following information as “the description of data roles from the perspective of a data scientist”. For example, my views on the role of a data engineer may be a bit simplified because I don’t see all the complexities of their work firsthand. I do hope you will find this information useful nonetheless.
Roles in a Team
A typical data team consists of the following roles:
Product managers;
Data analysts;
Data scientists;
Data engineers;
Machine learning engineers, and
Site reliability engineers / MLOps engineers.
All these people work to create a data product.
To explain the core responsibilities of each role, we will use a case scenario:
Suppose we work at an online classifieds company. It’s a platform where users can go to sell things they don’t need (like OLX, where I work). If a user has an iPhone they want to sell — they go to this website, create a listing and sell their phone.
On this platform, sellers sometimes have problems with identifying the correct category for the items they are selling. To help them, we want to build a service that suggests the best category. To sell their iPhone, the user creates a listing and the site needs to automatically understand that this iPhone has to go in the “mobile phones” category.
Let’s start with the first role: product manager.
Product Manager
A product manager is someone responsible for developing products. Their goal is to make sure that the team is building the right thing. They are typically less technical than the rest of the team: they don’t focus on the implementation aspects of a problem, but rather the problem itself.
Product managers need to ensure that the product is actually used by the end-users. This is a common problem: in many companies, engineers create something that doesn’t solve real problems. Therefore, the product manager is somebody who speaks to the team on behalf of the users.
The primary skills a PM needs to have are communication skills. For data scientists, communication is a soft skill, but for a product manager — it’s a hard skill. They have to have it to perform their work.
Product managers also do a lot of planning: they need to understand the problem, come up with a solution, and make sure the solution is implemented in a timely manner. To accomplish this, PMs need to know what’s important and plan the work accordingly.
When somebody has a problem, they approach the PM with it. Then the task of the PM is to figure out if users actually need this feature, how important this feature is, and if the team has the capacity to implement it.
Let’s come back to our example. Suppose somebody comes to the PM and says:
“We want to build a feature to automatically suggest the category for a listing. Somebody’s selling an iPhone, and we want to create a service that predicts that the item goes in the mobile phones category.”
Product managers need to answer these questions:
“Is this feature that important to the user?”
“Is it an important problem to solve in the product at all?”
To answer these questions, PMs ask data analysts to help them figure out what to do next.
Data Analyst
Data analysts know how to analyze the data available in the company. They discover insights in the data and then explain their findings to others.
So, analysts need to know:
What kind of data the company has;
How to get the data;
How to interpret the results;
How to explain their findings to colleagues and management.
Data analysts are also often responsible for defining key metrics and building dashboards, for example ones showing the company’s profits, the number of listings, or how many contacts buyers made with sellers. Thus, data analysts should know how to calculate all the important business metrics and how to present them in a way that others can understand.
When it comes to skills, data analysts should know:
SQL — this is the main tool that they work with;
Programming languages such as Python or R;
Tableau or similar tools for building dashboards;
Basics of statistics;
How to run experiments;
A bit of machine learning, such as regression analysis and time series modeling.
For our example, product managers turn to data analysts to help them quantify the extent of the problem. Together with the PM, the data analyst tries to answer questions like:
“How many users are affected by this problem?”
“How many users don’t finish creating their listing because of this problem?”
“How many listings are there on the platform that don’t have the right category selected?”
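As an illustration of how an analyst might start answering these questions, here is a sketch that runs one SQL query against an in-memory SQLite database from Python. The `listings` table, its columns (`status`, `abandoned_at_step`, `category_fixed_by_moderator`), and the sample rows are all hypothetical; a real platform would have its own schema and its own way of detecting wrongly categorized listings.

```python
import sqlite3

# Hypothetical table: one row per listing a seller started to create.
SCHEMA = """
CREATE TABLE listings (
    listing_id                  INTEGER PRIMARY KEY,
    user_id                     INTEGER NOT NULL,
    status                      TEXT NOT NULL,    -- 'published' or 'abandoned'
    abandoned_at_step           TEXT,             -- e.g. 'category', 'photos'
    category_fixed_by_moderator INTEGER NOT NULL  -- 1 if moderators moved it
);
"""

# A single query that sizes the problem from three angles.
PROBLEM_SIZE = """
SELECT
    -- Users who gave up at the category step at least once.
    COUNT(DISTINCT CASE WHEN abandoned_at_step = 'category' THEN user_id END),
    -- Share of all started listings abandoned at the category step.
    AVG(CASE WHEN abandoned_at_step = 'category' THEN 1.0 ELSE 0.0 END),
    -- Share of published listings that moderators had to re-categorize.
    AVG(CASE WHEN status = 'published' THEN category_fixed_by_moderator END)
FROM listings;
"""

# Made-up rows for illustration only.
SAMPLE_ROWS = [
    (1, 100, "published", None, 0),
    (2, 101, "published", None, 1),
    (3, 102, "abandoned", "category", 0),
    (4, 103, "abandoned", "photos", 0),
    (5, 102, "published", None, 0),
]


def problem_size(conn: sqlite3.Connection) -> dict:
    users, share_abandoned, share_wrong = conn.execute(PROBLEM_SIZE).fetchone()
    return {
        "users_abandoning_at_category": users,
        "share_abandoned_at_category": share_abandoned,
        "share_published_in_wrong_category": share_wrong,
    }


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO listings VALUES (?, ?, ?, ?, ?)", SAMPLE_ROWS)
    print(problem_size(conn))
```

In a real analysis the same numbers would usually feed a dashboard, so the PM can track whether the problem grows or shrinks over time.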
After the analyst gets the data, analyzes it, and answers these questions, they may conclude: “Yes, this is actually a problem”. Then the PM and the team discuss the report and agree: “Indeed, this problem is actually worth solving”.
Now the data team will go ahead and start solving this problem.
After the model for the service is created, it’s necessary to understand whether the service is effective: whether the model helps people and solves the problem. For that, data analysts run experiments, usually A/B tests.
When running an experiment, we can see if more users successfully finish posting an item for sale or if there are fewer ads that end up in the wrong category.
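One common way to read such an experiment is a two-proportion z-test on the share of sellers who finish posting their listing. The sketch below uses only the Python standard library (`statistics.NormalDist`, Python 3.8+); the counts in the usage example are made up, and a real analysis would also fix the sample size and significance level before the experiment starts.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(successes_a: int, total_a: int,
                          successes_b: int, total_b: int) -> tuple:
    """Compare completion rates of control (A) and treatment (B).

    Returns (lift, z, two-sided p-value), where lift = rate_B - rate_A.
    """
    p_a = successes_a / total_a
    p_b = successes_b / total_b
    # Pooled rate under the null hypothesis that both groups complete equally often.
    pooled = (successes_a + successes_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_b - p_a, z, p_value


if __name__ == "__main__":
    # Made-up numbers: sellers who finished posting, out of sellers who started.
    lift, z, p = two_proportion_z_test(800, 1000, 850, 1000)
    print(f"lift={lift:.3f} z={z:.2f} p={p:.4f}")
```

With these illustrative counts the category suggestions would appear to raise completion by about five percentage points, with a p-value well below 0.01; the same test could be applied to the share of listings that end up in the wrong category.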
Data Scientist
The roles of a data scientist and data analyst are pretty similar. In some companies, the same person does both jobs. However, data scientists typically focus more on prediction than on explanation.
A data analyst fetches the data, looks at it, explains what’s going on to the team, and gives some recommendations on what to do about it. A data scientist, on the other hand, focuses more on creating machine learning services. For example, one of the questions that a data scientist would want to answer is “How can we use this data to build a machine learning model for predicting something?”
In other words, data scientists incorporate the data into the product. Their focus is more on engineering than analysis. Data scientists work more closely with engineers on integrating data solutions into the product.
The skills of data scientists include:
Machine learning — the main tool for building predictive services;
Python — the primary programming language;
SQL — necessary to fetch the data for training their models;
Flask, Docker, and similar — to create simple web services for serving the models.
For our example, the data scientists are the people who develop the model used for predicting the category. Once they have a model, they can develop a simple web service for hosting this model.
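To give a flavour of what such a model might look like, here is a deliberately small multinomial Naive Bayes classifier over listing titles, written in plain Python. The technique is an illustrative assumption, not a description of the model used at OLX; in practice a data scientist would more likely use a library such as scikit-learn, train on many thousands of listings, and then wrap the model in a web service.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    return text.lower().split()


class NaiveBayesCategorizer:
    """Multinomial Naive Bayes over title words, with add-one (Laplace) smoothing."""

    def fit(self, examples: List[Tuple[str, str]]) -> "NaiveBayesCategorizer":
        self.class_counts: Counter = Counter()
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        self.vocab: set = set()
        for title, category in examples:
            self.class_counts[category] += 1
            for word in tokenize(title):
                self.word_counts[category][word] += 1
                self.vocab.add(word)
        self.total = sum(self.class_counts.values())
        return self

    def _log_score(self, words: List[str], category: str) -> float:
        # log P(category) + sum of log P(word | category)
        score = math.log(self.class_counts[category] / self.total)
        n_words = sum(self.word_counts[category].values())
        for word in words:
            count = self.word_counts[category][word]
            score += math.log((count + 1) / (n_words + len(self.vocab)))
        return score

    def suggest(self, title: str) -> str:
        """Return the category with the highest posterior score for a title."""
        words = tokenize(title)
        return max(self.class_counts, key=lambda c: self._log_score(words, c))


# Made-up training data for illustration only.
TRAINING = [
    ("iphone 11 64gb black", "mobile phones"),
    ("samsung galaxy phone", "mobile phones"),
    ("iphone charger cable", "accessories"),
    ("leather sofa three seater", "furniture"),
    ("wooden dining table", "furniture"),
]

if __name__ == "__main__":
    model = NaiveBayesCategorizer().fit(TRAINING)
    print(model.suggest("samsung galaxy s10"))
```

Words never seen in training (such as `s10`) still get a small non-zero probability thanks to the smoothing, so a single unknown word does not rule out a category.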
Data Engineer
Data engineers do all the heavy lifting when it comes to data. A lot of work needs to happen before data analysts can go to a database, fetch the data, perform their analysis, and come up with a report. This is precisely the focus of data engineers: they make sure this is possible. Their responsibility is to prepare all the necessary data in a form that is consumable for their colleagues.
To accomplish this, data engineers create a “data lake”: all the data that users generate needs to be captured properly and saved in a separate database. This way, analysts can run their analysis, and data scientists can use this data for training models.
Another thing data engineers often need to do, especially at larger companies, is to ensure that the people who look at the data have the necessary clearance to do so. Some user data is sensitive and people can’t just go looking around at personal information (such as emails or phone numbers) unless they have a really good reason to do so. Therefore, data engineers need to set up a system that doesn’t let people just access all the data at once.
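One simple way to picture this kind of access control is column-level masking: sensitive fields are hidden unless the person reading them holds an explicit clearance. The sketch below is a hypothetical, in-application illustration (the field names and the `pii_reader` clearance are invented); in practice data engineers usually enforce such rules in the data platform itself, for example through the warehouse's or cloud provider's access policies, rather than in ad-hoc code.

```python
from typing import Dict, FrozenSet, List

# Hypothetical policy: which fields count as personal data, and which
# clearance is needed to see each of them unmasked.
SENSITIVE_FIELDS: Dict[str, str] = {
    "email": "pii_reader",
    "phone": "pii_reader",
}


def mask(value: str) -> str:
    """Hide all but the last two characters (or everything, for short values)."""
    if len(value) <= 4:
        return "*" * len(value)
    return "*" * (len(value) - 2) + value[-2:]


def read_rows(rows: List[Dict[str, str]],
              clearances: FrozenSet[str]) -> List[Dict[str, str]]:
    """Return copies of the rows with sensitive fields masked per the caller's clearances."""
    visible = []
    for row in rows:
        out = {}
        for field, value in row.items():
            required = SENSITIVE_FIELDS.get(field)
            if required is None or required in clearances:
                out[field] = value
            else:
                out[field] = mask(value)
        visible.append(out)
    return visible


if __name__ == "__main__":
    rows = [{"listing_id": "1", "email": "ann@example.com", "phone": "+48123456789"}]
    print(read_rows(rows, frozenset()))                # analyst without clearance
    print(read_rows(rows, frozenset({"pii_reader"})))  # cleared user
```

Non-sensitive fields such as `listing_id` pass through untouched, so analysts can still join and count records without ever seeing personal details.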
The skills needed for data engineers usually include:
AWS or Google Cloud — popular cloud providers;
Kubernetes and Terraform — infrastructure tools;
Kafka or RabbitMQ — tools for capturing and processing the data;
Databases — to save the data in such a way that it’s accessible for data analysts;
Airflow or Luigi — data orchestration tools for building complex data pipelines.
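Orchestrators such as Airflow and Luigi let data engineers describe a pipeline as a graph of tasks with dependencies, and then run each task only after its upstream tasks have finished. The sketch below imitates just that core idea with Python's standard `graphlib` module (Python 3.9+). It is not Airflow's or Luigi's API, and the task names are hypothetical steps for the category-suggestion example.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set


def run_pipeline(tasks: Dict[str, Callable[[], None]],
                 dependencies: Dict[str, Set[str]]) -> List[str]:
    """Run every task after all of its upstream tasks; return the run order.

    `dependencies` maps each task name to the set of tasks it depends on.
    A cycle raises graphlib.CycleError before anything runs.
    """
    order = list(TopologicalSorter(dependencies).static_order())
    for name in order:
        tasks[name]()
    return order


# Hypothetical pipeline that prepares training data for the category model.
DEPENDENCIES: Dict[str, Set[str]] = {
    "extract_listings": set(),
    "extract_categories": set(),
    "clean_titles": {"extract_listings"},
    "build_training_set": {"clean_titles", "extract_categories"},
    "publish_to_lake": {"build_training_set"},
}

if __name__ == "__main__":
    tasks = {name: (lambda name=name: print("running", name)) for name in DEPENDENCIES}
    run_pipeline(tasks, DEPENDENCIES)
```

Real orchestrators add what this sketch leaves out: scheduling, retries, running independent tasks in parallel, and a UI for monitoring each run.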
In our example, a data engineer prepares all the required data. First, they make sure the analyst has the data to perform the analysis. Then they also work with the data scientist to prepare the information that we’ll need for training the model. That includes the title of the listing, its description, the category, and so on.
A data engineer isn’t the only type of engineer that a data team has. There are also machine learning engineers.
Machine Learning Engineer
Machine learning engineers take whatever data scientists build and help them scale it up. They also ensure that the service is maintainable and that the team follows the best engineering practices. Their focus is more on engineering than on modeling.
The skills ML engineers have are similar to those of data engineers:
AWS or Google Cloud;
Infrastructure tools like Kubernetes and Terraform;
Python and other programming languages;
Flask, Docker, and other tools for creating web services.
Additionally, ML engineers work closely with more “traditional” engineers, like backend engineers, frontend engineers, or mobile engineers, to ensure that the services from the data team are included in the final product.
For our example, ML engineers work together with data scientists on productionizing the category suggestion service. They make sure it’s stable once it’s rolled out to all the users. They must also ensure that it’s maintainable and that it’s possible to make changes to the service in the future.
There’s another kind of engineer that can be pretty important in a data team — site reliability engineers.
DevOps / Site Reliability Engineer
The role of SREs is similar to that of ML engineers, but the focus is more on the availability and reliability of the services.
SREs aren’t strictly limited to working with data. Their role is more general: they tend to focus less on business logic and more on infrastructure, which includes things like networking and provisioning infrastructure.
Therefore, SREs look after the servers where the services are running and take care of collecting all the operational metrics, like CPU usage, the number of requests per second, the services’ processes, and so on.
As the name suggests, site reliability engineers have to make sure that everything runs reliably. They set up alerts and are constantly on call to make sure that the services are up and running without any interruptions. If something breaks, SREs quickly diagnose the problem and fix it, or involve an engineer to help find the solution.
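Alerting rules of this kind often boil down to "fire when a metric stays past a threshold for a while". Here is a minimal sketch of such a rule over a sliding window of error-rate samples. The class name, threshold, and window size are hypothetical; in practice SREs would usually express this in their monitoring system (for example, as a Prometheus alerting rule with a `for` duration) rather than in hand-written code.

```python
from collections import deque


class ThresholdAlert:
    """Fire when every sample in the last `window` readings exceeds `threshold`.

    Requiring the whole window to be bad avoids paging someone on a single spike.
    """

    def __init__(self, threshold: float, window: int) -> None:
        self.threshold = threshold
        self.samples: deque = deque(maxlen=window)  # oldest samples drop off

    def observe(self, value: float) -> bool:
        """Record one sample and report whether the alert should fire."""
        self.samples.append(value)
        window_full = len(self.samples) == self.samples.maxlen
        return window_full and all(v > self.threshold for v in self.samples)


if __name__ == "__main__":
    # Hypothetical rule: share of failed requests above 5% for 3 checks in a row.
    alert = ThresholdAlert(threshold=0.05, window=3)
    for error_rate in [0.01, 0.09, 0.02, 0.07, 0.08, 0.11]:
        if alert.observe(error_rate):
            print(f"ALERT: error rate {error_rate:.0%} after 3 bad checks in a row")
```

Only the last reading triggers the alert here: the isolated spike to 9% is ignored, but three consecutive readings above 5% are not.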
The skills site reliability engineers need include:
Cloud infrastructure tools;
Programming languages like Python;
Best DevOps practices like automation, CI/CD, and the like.
Of course, ML engineers and data engineers should also know these best practices, but the focus of DevOps engineers/SREs is to establish them and make sure that they are followed.
There is a special type of DevOps engineer, called “MLOps engineer”.
An MLOps engineer is a DevOps engineer who also knows the basics of machine learning. Similar to an SRE, the responsibility of an MLOps engineer is to make sure that the services developed by data scientists, ML engineers, and data engineers are up and running all the time.
MLOps engineers know the lifecycle of a machine learning model: the training phase, serving phase, and so on.
Despite having this knowledge, MLOps engineers are still focused more on operational support than on anything else. This means that they need to know and follow all the DevOps practices and make sure that the rest of the team is following them as well. They accomplish this by setting up things like continuous retraining and CI/CD pipelines.
Even though everyone in the team has a different focus, they all work together toward the same goal: solving the problems of the users.
To summarize, the roles in the data team and their responsibilities are:
Product managers — make sure that the team is building the right thing, act as a gateway to all the requests and speak on behalf of the users.
Data analysts — analyze data, define key metrics, and create dashboards.
Data scientists — build models and incorporate them into the product.
Data engineers — prepare the data for analysts and data scientists.
ML engineers — productionize machine learning services and establish the best engineering practices.
Site reliability engineers — focus on availability, reliability, enforce the best DevOps practices.
This list is not comprehensive, but it should be a good starting point if you are just getting into the industry, or if you just want to know how the lines between different roles are defined in the industry.
Article | October 27, 2020
All business functions, whether finance, marketing, procurement, or others, now see using data and analytics to drive success as an imperative. They want to make informed decisions and predict trends based on trusted data and insights from the business, operations, and customers. The importance of delivering these capabilities was emphasised in a recent report from Forrester Consulting, “The Importance of Unified Data and Analytics, Why and How Preintegrated Data and Analytics Solutions Drive Business Success.” For approximately two-thirds of the global data warehouse and analytics strategy decision-makers surveyed in the research, their key data and analytics priorities are:
Article | October 27, 2020
When it comes to marketing today, big data analytics has become a powerful force. It is the raw material marketers need to make sense of the information they are presented with, so they can do their jobs with accuracy and excellence. Big data is what empowers marketers to understand their customers based on any online action they take.
Thanks to the boom of big data, marketers have learned more about new marketing trends and about consumers’ preferences and behaviors. For example, thanks to big data, marketers know everything from what their customers are streaming to what groceries they are ordering.
Data is readily available in abundance due to digital technology. Data is created through mobile phones, social media, digital ads, weblogs, electronic devices, and sensors connected through the internet of things (IoT).
Data analytics helps organizations discover new markets, learn how new customers interact with online ads, and draw conclusions about the effects of new strategies. Newer, more sophisticated marketing analytics software and tools are now being used to determine consumers’ buying patterns and the key influencers in their decision-making, and to validate which data-driven marketing approaches yield the best results.
With the integration of product management with data science, real-time data capture, and analytics, big data analytics is helping companies increase sales and improve the customer experience.
In this article, we will examine how big data analytics is transforming the marketing industry.
Personalized marketing has become an essential part of marketing directly to consumers. Greeting consumers by their first name whenever they visit the website, sending them promotional emails about their favorite products, or suggesting personalized recipes based on their grocery shopping are some examples of data-driven marketing.
When marketers collect key data points about customers at different marketing touchpoints, such as their interests, their name, what they like to listen to, what they order most, what they’d like to hear about, and who they want to hear from, they can plan their campaigns strategically.
Marketers aim to prevent churn and to onboard new customers. Insights from customers’ marketing touchpoints can be used to improve acquisition rates, drive brand loyalty, increase revenue per customer, and improve the effectiveness of products and services.
With data from these touchpoints, marketers can build an ideal customer profile, and these customer profiles can in turn help them plan and execute personalized campaigns.
Customer behavior can be traced through historical data, which is one of the best ways to predict how customers will behave in the future. It allows companies to predict which customers are likely to be interested in their products at the right time and place. Predictive analytics applies data mining, statistical techniques, machine learning, and artificial intelligence to analyze data and predict customers’ future behavior and activities.
Take the example of an online grocery store. If a customer tends to buy healthy, sugar-free snacks from the store now, they are likely to keep buying them in the future.
This predictable behavior is easy for brands to capitalize on, and analytics tools make it easier still. Brands can automate their sales and target that customer, giving them the chance to make “repeat purchases” based on their predicted behavior. Marketers can also suggest products related to those repeat purchases to get customers on board with new products.
Customer segmentation means dividing your customers into groups to identify specific patterns. For example, customers from a particular city may buy your products more than others, or customers in a certain age group may prefer some products more than other age groups do.
Dedicated marketing analytics software can help you segment your audience. For example, you can gather data such as specific interests, how many times customers have visited a place, unique preferences, and demographics such as age, gender, work, and home location.
These insights are a golden opportunity for marketers to create bold campaigns that optimize their return on investment. They can cluster customers into specific groups and target these segments with highly relevant, data-driven marketing campaigns.
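Clustering customers into groups is often done with an algorithm such as k-means. The plain-Python sketch below assumes each customer is already described by two hypothetical numeric features (orders per month and average basket value); a real segmentation would use more features, scale them to comparable ranges, and typically rely on a library implementation such as scikit-learn's.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]  # (orders per month, average basket value)


def kmeans(points: List[Point], k: int, iterations: int = 20,
           seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Lloyd's k-means: return (centroids, segment label of each customer)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen customers
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each customer joins the nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a segment ends up empty
                centroids[c] = (sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members))
    return centroids, labels


if __name__ == "__main__":
    # Made-up customers: occasional small baskets vs. frequent large baskets.
    customers = [(1, 10), (2, 12), (1.5, 11), (8, 95), (9, 100), (8.5, 98)]
    centroids, labels = kmeans(customers, k=2)
    print(labels)
    print(centroids)
```

Each resulting segment can then be profiled (for example, "frequent, high-value shoppers") and targeted with its own campaign.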
The main goal of customer segmentation is to identify any interesting information that can help marketers increase revenue and meet their goals. Effective customer segmentation can help marketers with:
• Identifying the most profitable and least profitable customers
• Building loyal relationships
• Predicting customer patterns
• Pricing products accordingly
• Developing products based on customers’ interests
Businesses continue to invest in collecting high-quality data for accurate customer segmentation, which makes their marketing efforts more successful.
Optimized Ad Campaigns
Customers’ social media data from platforms like Facebook, LinkedIn, and Twitter makes it easier for marketers to create customized ad campaigns on a larger scale. This means that they can tailor specific ad campaigns to particular groups and execute them successfully.
Big data also makes it easier for marketers to run ‘remarketing’ campaigns. Remarketing ads follow your customers online, wherever they browse, once they have visited your website.
How an online ad campaign is executed makes all the difference to its success. Chasing customers with paid ads can be an effective strategy if done well. According to the marketing “rule of 7”, prospective customers need to be exposed to an ad at least seven times before they act on it.
When creating online ad campaigns, keep one thing in mind: your customers should not feel as if they are being stalked when you run remarketing campaigns. Space out your ads and their exposure so they appear natural rather than pushy.
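The advice to space out ads is often implemented as a frequency cap. Below is a hypothetical sketch of such a rule: show a remarketing ad to a user only if they have seen fewer than a maximum number of impressions in a rolling window, and only if their last impression is older than a minimum gap. The function name and the default limits (seven impressions per 14 days, at least 24 hours apart) are illustrative assumptions, not recommendations from any ad platform.

```python
from datetime import datetime, timedelta
from typing import List


def can_show_ad(past_impressions: List[datetime], now: datetime,
                max_per_window: int = 7,
                window: timedelta = timedelta(days=14),
                min_gap: timedelta = timedelta(hours=24)) -> bool:
    """Frequency cap: limit impressions per rolling window and space them out."""
    recent = [t for t in past_impressions if now - t < window]
    if len(recent) >= max_per_window:
        return False  # the user has already seen this ad enough times lately
    if recent and now - max(recent) < min_gap:
        return False  # too soon after the previous impression
    return True


if __name__ == "__main__":
    now = datetime(2020, 10, 27, 12, 0)
    print(can_show_ad([now - timedelta(hours=2)], now))  # too soon
    print(can_show_ad([now - timedelta(days=2)], now))   # spaced out enough
```

Tuning `max_per_window` and `min_gap` is the lever for balancing enough exposures against the risk of seeming pushy.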
Search engine and social media data enhance this further. This data can be used to analyze customers’ behavior patterns and market to them accordingly.
The information gained from search engines and social media can also be used to encourage consumers to stay loyal, which in turn benefits the business.
For consumers, the implications can be frightening, such as seeing personalized ads crop up on their Facebook page or in search results. Because consumer data is so openly available to marketers, they need to use it wisely and keep it from falling into the wrong hands.
Fortunately, businesses are taking note and making sure that this information remains secure.
Thanks to big data and analytics, the future of marketing looks bright. With businesses collecting high-quality data in real time and analyzing it with the help of machine learning and AI, the marketing world seems set for massive changes. Analytics is taking the marketing industry to a new level, and with sophisticated marketers behind the wheel, the sky is the limit.
Frequently Asked Questions
Why is marketing analytics so important these days?
Marketing analytics helps us see how everything plays off each other, and decide how we might want to invest moving forward. Re-prioritizing how you spend your time, how you build out your team, and the resources you invest in channels and efforts are critical steps to achieving marketing team success.
What is the use of marketing analytics?
Marketing analytics is used to measure how well your marketing efforts are performing and to determine what can be done differently to get better results across marketing channels.
Which companies use marketing analytics?
Marketing analytics enables you to improve your overall marketing program performance by identifying channel deficiencies, adjusting strategies and tactics as needed, optimizing processes, etc. Companies like Netflix, Sephora, EasyJet, and Spotify use marketing analytics to improve their marketing performance.