DataChats | Episode 1 | An Interview With Max Kuhn, Creator of Caret

| November 22, 2016

In this episode of DataChats, Nick talks with Max Kuhn, the creator of the caret package for R. Interested in learning more? Start DataCamp's Machine Learning course for free at https://www.datacamp.com/courses/mach.... Max is a frequent speaker at many of the main data science conferences and is well known as the creator of the caret package for R, an essential tool in every R user’s machine learning toolbox.

Spotlight

SAP

SAP is a global software application vendor. SAP is the market leader in enterprise application software, helping companies of all sizes and in all industries run at their best: 77% of the world’s transaction revenue touches an SAP system. Our machine learning, Internet of Things (IoT), and advanced analytics technologies help turn customers’ businesses into intelligent enterprises. Our end-to-end suite of applications and services enables our customers to operate profitably, adapt continuously, and make a difference. With a global network of customers, partners, employees, and thought leaders, SAP helps the world run better and improves people’s lives.

OTHER ARTICLES

Data Analytics the Force Behind the IoT Evolution

Article | April 3, 2020

The IoT stack is moving beyond merely ingesting data to data analytics and management, with a focus on real-time analysis and autonomous AI capabilities. Enterprises are finding more advanced ways to apply IoT for better and more profitable outcomes. IoT platforms have evolved to use standard open-source protocols and components. Enterprises now focus primarily on resolving business problems, such as predictive maintenance or the use of smart devices to streamline business operations. Platforms focus on similar things, but early attempts to create highly discrete solutions around specific use cases, in place of broad platforms, have been successful. That means more vendors offer more choices to customers, which broadens the chances of success. Clearly, IoT platforms sit at the heart of value creation in the IoT.


Top 6 Marketing Analytics Trends in 2021

Article | June 21, 2021

The marketing industry changes every year, and businesses have to keep up with marketing trends as they evolve. As consumer demands and behavior changed, brands had to move from traditional marketing channels like print and electronic media to digital channels like social media, Google Ads, YouTube, and more. Businesses have begun to consider marketing analytics a crucial component of marketing and a primary driver of its success. In uncertain times, marketing analytics tools calculate and evaluate the market status and enable better planning for enterprises.

As Covid-19 hit the world, organizations that used traditional marketing analytics tools and relied on historical data realized that many of these models had become irrelevant. The pandemic rendered a lot of data useless. With machine learning (ML) and artificial intelligence (AI) in marketers’ arsenal, marketing analytics is turning virtual with a shift in the marketing landscape in 2021. Marketers are also pivoting from relying on AI technologies alone to combining them with big data.

AI and machine learning help advertisers and marketers refine their target audience and re-strategize their campaigns through advanced marketing attributes, which in turn increases customer retention and customer loyalty. While technology is making targeting and measuring possible, marketers have had to reaffirm their commitment to consumer privacy, data regulations, and governance in their initiatives. They are also relying on third-party data.

These data and analytics trends will help organizations deal with radical changes and uncertainties, along with the opportunities they bring, over the next few years. To see why businesses are gravitating towards these trends, let us look at why marketing analytics is so important.

Importance of Marketing Analytics

As businesses extended into new marketing categories, new technologies were implemented to support them.
This new technology was usually deployed in isolation, which resulted in assorted and disconnected data sets. Marketers usually based their decisions on data from individual channels, like website metrics, without considering other marketing channels. Website and social media metrics alone are not enough. In contrast, marketing analytics tools look at all marketing done across channels over a period of time, which is vital for sound decision-making and effective program execution. Marketing analytics helps you understand how well a campaign is working to achieve business goals or key performance indicators.

Marketing analytics allows you to answer questions like:

• How are your marketing initiatives and campaigns working? What can be done to improve them?
• How do your marketing campaigns compare with those of others? What are they spending their time and money on? What marketing analytics software are they using that helps them?
• What should be your next step? How should you allocate the marketing budget according to your current spending?

Now that the advantages of marketing analytics are clear, let us get into the details of the marketing analytics trends of 2021.

Rise of real-time marketing data analytics

Responding quickly to any customer action is the biggest trend right now in digital marketing, especially post Covid. Brands and businesses strive to respond to customer queries and provide them with solutions. Running queries in a low-latency customer data platform has allowed marketers to filter the view by audience and identify underachieving sectors. Once this data is collected, businesses and brands can readjust their customer targeting and messaging to optimize their performance.

To achieve this on a larger scale, organizations need to invest in marketing analytics software and platforms that balance data loads with processing for business intelligence and analytics. The platform needs to allow different types of jobs to run in parallel by adding resources to groups as required.
This gives data scientists more flexibility and access to response data at any given time. Real-time analytics will also help marketers identify underlying threats and problems in their strategies. Marketers will have to conduct a SWOT analysis and continuously optimize their campaigns accordingly.

Data security, regulatory compliance, and protecting consumer privacy

Protecting market data from the rise in cybercrime and breaches is a crucial problem to be addressed in 2021. This year has seen a surge in data breaches that have damaged businesses and their infrastructure to varying degrees. As a result, marketers have increased their investments in encryption, access control, network monitoring, and other security measures.

To help comply with the General Data Protection Regulation (GDPR) of the European Union, the California Consumer Privacy Act (CCPA), and other regulations, organizations have shifted to platforms where all consumer data is in one place. Advanced encryption and stateless computing have made it possible to securely store and share governed data that is kept in a single location. Interacting with a single copy of the data makes the job of compliance officers, who are tasked with identifying and deleting every piece of information related to a particular customer, much easier, and removes the possibility of overlooking something.

Protecting consumer privacy is imperative for marketers. They offer consumers the control to opt out, erase their data once they have left the platform, and remove information like location. They also apply access control to personally identifiable information, like email addresses and billing details, and keep it separate from other marketing data.

Predictive analytics

Predictive analytics analyzes collected data and predicts future outcomes through ML and AI.
It maps out a lookalike audience and identifies which customer strata are most likely to become high-value customers and which have the highest likelihood of churn. It also gauges people’s interests based on their browsing history. With better ML models, predictions have improved over time, leading to increased customer retention and a drop in churn. According to Zion Market Research, the global market for predictive analytics is set to hit $11 billion by 2022.

Investment in first-party data

Cookie-enabled website tracking let marketers know who was visiting their website and retarget their ads to these people throughout the web. However, in 2020, Google announced that third-party cookies would be phased out of Chrome within two years, while Safari and Firefox had already blocked them. Now that adding low-friction tracking to web pages will be tough, marketers will have to gather more limited data. This will then be integrated with first-party data sets to get a rounded view of the customer. Although this is a big win for consumer privacy activists, advertisers and agencies will find it more difficult to retarget ads and build audiences in their data management platforms. In a digital world without cookies, marketers now have to understand how customer data is collected, reflect on their marketing models, and evaluate their marketing strategy.

Emergence of contextual customer experience

Marketing analytics has become more contextually conscious since the deprecation of cookies. Since marketers are losing their data sets and behavioral data, they have an added motivation to invest in insights. This means that marketers have to target messaging based on known and inferred customer characteristics like their age, location, income, brand affinity, and where these customers are in their buying journey.
For example, marketers should tailor ad messaging to consumers based on the frequency of their visits to the store. Effective contextual targeting hinges upon marketers using a single platform for their data and creating a holistic customer profile.

Reliance on third-party data

Even though there has been a drop in third-party data collection, marketers will continue to invest in third-party data that augments their first-party data and gives them a more complete understanding of their customers. Historically, third-party data has been difficult for marketers to source and maintain. There are new platforms that address problems such as long time to value, the cost of maintaining third-party data pipelines, and data governance. U.S. marketers spent upwards of $11.9 billion on third-party audience data in 2019, up 6.1% from 2018, and this growth curve is going to be even steeper in 2021, according to a study by the Interactive Advertising Bureau and Winterberry Group.

Conclusion

Marketing analytics enables more successful marketing because it shows the direct results of marketing efforts and investments. These new marketing data analytics trends have made their mark and are set to make this year interesting, with data and AI-based applications mixed with the changing landscape of marketing channels. Digital marketing will be in demand more than ever as people are purchasing more online.

Frequently Asked Questions

Why is marketing analytics so important?
Marketing analytics has two main purposes: to gauge how well your marketing efforts perform and to measure the effectiveness of marketing activity.

What is the use of marketing analytics?
Marketing analytics helps us understand how everything plays off of each other and decide how to invest, whether to re-prioritize or keep going with the current methods.

Which industries use marketing analytics?
Commercial organizations use it to analyze data from different sources, determine the success of a marketing campaign, and target customers specifically.

What are the types of marketing analytics tools?
Some marketing analytics tools are Google Analytics, HubSpot Marketing Hub, Semrush, Looker, Optimizely, etc.


Roles in a Data Team

Article | December 17, 2020

In this article, we’ll talk about different roles in a data team and discuss their responsibilities. In particular, we will cover:

• The types of roles in a data team;
• The responsibilities of each role;
• The skills and knowledge each role needs to have.

This is not a comprehensive list and the majority of what you will read in this article is my opinion, which comes out of my experience from working as a data scientist. You can interpret the following information as “the description of data roles from the perspective of a data scientist”. For example, my views on the role of a data engineer may be a bit simplified because I don’t see all the complexities of their work firsthand. I do hope you will find this information useful nonetheless.

Roles in a Team

A typical data team consists of the following roles: product managers, data analysts, data scientists, data engineers, machine learning engineers, and site reliability engineers / MLOps engineers. All these people work to create a data product.

To explain the core responsibilities of each role, we will use a case scenario. Suppose we work at an online classifieds company. It’s a platform where users can go to sell things they don’t need (like OLX, where I work). If a user has an iPhone they want to sell, they go to this website, create a listing, and sell their phone.

On this platform, sellers sometimes have problems identifying the correct category for the items they are selling. To help them, we want to build a service that suggests the best category. To sell their iPhone, the user creates a listing and the site needs to automatically understand that this iPhone has to go in the “mobile phones” category.

Let’s start with the first role: product manager.

Product Manager

A product manager is someone responsible for developing products. Their goal is to make sure that the team is building the right thing.
They are typically less technical than the rest of the team: they don’t focus on the implementation aspects of a problem, but rather on the problem itself. Product managers need to ensure that the product is actually used by the end-users. This is a common problem: in many companies, engineers create something that doesn’t solve real problems. Therefore, the product manager is somebody who speaks to the team on behalf of the users.

The primary skills a PM needs to have are communication skills. For data scientists, communication is a soft skill, but for a product manager it’s a hard skill: they have to have it to perform their work.

Product managers also do a lot of planning: they need to understand the problem, come up with a solution, and make sure the solution is implemented in a timely manner. To accomplish this, PMs need to know what’s important and plan the work accordingly.

When somebody has a problem, they approach the PM with it. Then the task of the PM is to figure out if users actually need this feature, how important this feature is, and if the team has the capacity to implement it.

Let’s come back to our example. Suppose somebody comes to the PM and says: “We want to build a feature to automatically suggest the category for a listing. Somebody’s selling an iPhone, and we want to create a service that predicts that the item goes in the mobile phones category.”

Product managers need to answer these questions:

• “Is this feature that important to the user?”
• “Is it an important problem to solve in the product at all?”

To answer these questions, PMs ask data analysts to help them figure out what to do next.

Data Analyst

Data analysts know how to analyze the data available in the company. They discover insights in the data and then explain their findings to others. So, analysts need to know:

• What kind of data the company has;
• How to get the data;
• How to interpret the results;
• How to explain their findings to colleagues and management.
Data analysts are also often responsible for defining key metrics and building different dashboards. This includes things like showing the company’s profits, the number of listings, or how many contacts buyers made with sellers. Thus, data analysts should know how to calculate all the important business metrics, and how to present them in a way that is understandable to others.

When it comes to skills, data analysts should know:

• SQL: this is the main tool that they work with;
• Programming languages such as Python or R;
• Tableau or similar tools for building dashboards;
• Basics of statistics;
• How to run experiments;
• A bit of machine learning, such as regression analysis and time series modeling.

For our example, product managers turn to data analysts to help them quantify the extent of the problem. Together with the PM, the data analyst tries to answer questions like:

• “How many users are affected by this problem?”
• “How many users don’t finish creating their listing because of this problem?”
• “How many listings are there on the platform that don’t have the right category selected?”

After the analyst gets the data, analyzes it, and answers these questions, they may conclude: “Yes, this is actually a problem”. Then the PM and the team discuss the report and agree: “Indeed, this problem is actually worth solving”. Now the data team will go ahead and start solving this problem.

After the model for the service is created, it’s necessary to understand if the service is effective: whether this model helps people and solves the problem. For that, data analysts run experiments, usually A/B tests. When running an experiment, we can see if more users successfully finish posting an item for sale or if fewer ads end up in the wrong category.

Data Scientist

The roles of a data scientist and data analyst are pretty similar. In some companies, it’s the same person who does both jobs.
However, data scientists typically focus more on predicting than on explaining. A data analyst fetches the data, looks at it, explains what’s going on to the team, and gives some recommendations on what to do about it. A data scientist, on the other hand, focuses more on creating machine learning services. For example, one of the questions that a data scientist would want to answer is “How can we use this data to build a machine learning model for predicting something?”

In other words, data scientists incorporate the data into the product. Their focus is more on engineering than analysis. Data scientists work more closely with engineers on integrating data solutions into the product.

The skills of data scientists include:

• Machine learning: the main tool for building predictive services;
• Python: the primary programming language;
• SQL: necessary to fetch the data for training their models;
• Flask, Docker, and similar: to create simple web services for serving the models.

For our example, the data scientists are the people who develop the model used for predicting the category. Once they have a model, they can develop a simple web service for hosting this model.

Data Engineers

Data engineers do all the heavy lifting when it comes to data. A lot of work needs to happen before data analysts can go to a database, fetch the data, perform their analysis, and come up with a report. This is precisely the focus of data engineers: they make sure this is possible. Their responsibility is to prepare all the necessary data in a form that is consumable for their colleagues.

To accomplish this, data engineers create a “data lake”. All the data that users generate needs to be captured properly and saved in a separate database. This way, analysts can run their analysis, and data scientists can use this data for training models.
Another thing data engineers often need to do, especially at larger companies, is to ensure that the people who look at the data have the necessary clearance to do so. Some user data is sensitive and people can’t just go looking around at personal information (such as emails or phone numbers) unless they have a really good reason to do so. Therefore, data engineers need to set up a system that doesn’t let people just access all the data at once.

The skills needed for data engineers usually include:

• AWS or Google Cloud: popular cloud providers;
• Kubernetes and Terraform: infrastructure tools;
• Kafka or RabbitMQ: tools for capturing and processing the data;
• Databases: to save the data in such a way that it’s accessible for data analysts;
• Airflow or Luigi: data orchestration tools for building complex data pipelines.

In our example, a data engineer prepares all the required data. First, they make sure the analyst has the data to perform the analysis. Then they also work with the data scientist to prepare the information that we’ll need for training the model. That includes the title of the listing, its description, the category, and so on.

A data engineer isn’t the only type of engineer that a data team has. There are also machine learning engineers.

Machine Learning Engineer

Machine learning engineers take whatever data scientists build and help them scale it up. They also ensure that the service is maintainable and that the team follows the best engineering practices. Their focus is more on engineering than on modeling.

The skills ML engineers have are similar to those of data engineers:

• AWS or Google Cloud;
• Infrastructure tools like Kubernetes and Terraform;
• Python and other programming languages;
• Flask, Docker, and other tools for creating web services.
Additionally, ML engineers work closely with more “traditional” engineers, like backend engineers, frontend engineers, or mobile engineers, to ensure that the services from the data team are included in the final product.

For our example, ML engineers work together with data scientists on productionizing the category suggestion service. They make sure it’s stable once it’s rolled out to all the users. They must also ensure that it’s maintainable and that it’s possible to make changes to the service in the future.

There’s another kind of engineer that can be pretty important in a data team: site reliability engineers.

DevOps / Site Reliability Engineer

The role of SREs is similar to that of the ML engineer, but the focus is more on the availability and reliability of the services. SREs aren’t strictly limited to working with data. Their role is more general: they tend to focus less on business logic and more on infrastructure, which includes things like networking and provisioning infrastructure.

Therefore, SREs look after the servers where the services are running and take care of collecting all the operational metrics like CPU usage, how many requests per second there are, the services’ processes, and so on.

As the name suggests, site reliability engineers have to make sure that everything runs reliably. They set up alerts and are constantly on call to make sure that the services are up and running without any interruptions. If something breaks, SREs quickly diagnose the problem and fix it, or involve an engineer to help find the solution.

The skills needed for site reliability engineers:

• Cloud infrastructure tools;
• Programming languages like Python;
• Unix/Linux;
• Networking;
• Best DevOps practices like automation, CI/CD, and the like.

Of course, ML engineers and data engineers should also know these best practices, but the focus of DevOps engineers/SREs is to establish them and make sure that they are followed.
There is a special type of DevOps engineer, called “MLOps engineer”.

MLOps Engineer

An MLOps engineer is a DevOps engineer who also knows the basics of machine learning. Similar to an SRE, the responsibility of an MLOps engineer is to make sure that the services developed by data scientists, ML engineers, and data engineers are up and running all the time.

MLOps engineers know the lifecycle of a machine learning model: the training phase, serving phase, and so on. Despite having this knowledge, MLOps engineers are still focused more on operational support than on anything else. This means that they need to know and follow all the DevOps practices and make sure that the rest of the team is following them as well. They accomplish this by setting up things like continuous retraining and CI/CD pipelines.

Even though everyone in the team has a different focus, they all work together on achieving the same goal: solving the problems of the users.

Summary

To summarize, the roles in the data team and their responsibilities are:

• Product managers: make sure that the team is building the right thing, act as a gateway to all the requests, and speak on behalf of the users.
• Data analysts: analyze data, define key metrics, and create dashboards.
• Data scientists: build models and incorporate them into the product.
• Data engineers: prepare the data for analysts and data scientists.
• ML engineers: productionize machine learning services and establish the best engineering practices.
• Site reliability engineers: focus on availability and reliability, and enforce the best DevOps practices.

This list is not comprehensive, but it should be a good starting point if you are just getting into the industry, or if you just want to know how the lines between different roles are defined in the industry.
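The A/B test mentioned in the data analyst section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the article does not say which statistical test the analysts use, so a two-proportion z-test is assumed here, and the function name and the user counts are hypothetical.

```python
import math


def two_proportion_z_test(success_a, total_a, success_b, total_b):
    """Compare completion rates of group A (control) and group B (treatment).

    Returns the z statistic and the two-sided p-value.
    """
    rate_a = success_a / total_a
    rate_b = success_b / total_b
    # Pooled rate under the null hypothesis that both groups behave the same.
    pooled = (success_a + success_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical numbers: users who finished creating a listing
# without (A) and with (B) the category suggestion service.
z, p_value = two_proportion_z_test(4200, 10_000, 4400, 10_000)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

A small p-value would suggest that the difference in completed listings is unlikely to be due to chance alone. In practice the analyst would also fix the sample size and the metric before the experiment starts.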


Data Analytics Convergence: Business Intelligence(BI) Meets Machine Learning (ML)

Article | July 29, 2020

Headquartered in London, England, BP (NYSE: BP) is a multinational oil and gas company. Operating since 1909, the organization offers its customers fuel for transportation, energy for heat and light, lubricants to keep engines moving, and petrochemical products.

Business intelligence has always been a key enabler for improving decision-making processes in large enterprises, from the early days of spreadsheet software, to building enterprise data warehouses for housing large sets of enterprise data, to more recent developments in mining those datasets to unearth hidden relationships. One underlying theme throughout this evolution has been the delegation of the crucial task of finding the remarkable relationships between various objects of interest to human beings. What BI technology has been doing, in other words, is making it possible (and often easy too) to find the needle in the proverbial haystack if you somehow know in which section of the barn it is likely to be. It is a validatory as opposed to a predictive technology.

When the data is huge in terms of variety, amount, and dimensionality (a.k.a. Big Data) and/or the relationships between datasets go beyond the first-order linear relationships amenable to human intuition, the above strategy of relying solely on humans to do the essential thinking about the datasets, and utilizing machines only for crucial but dumb data infrastructure tasks, becomes totally inadequate. The remedy follows directly from our characterization of the problem: finding ways to utilize the machines beyond menial tasks and offloading some or most of the cognitive work from humans to the machines.

Does this mean all the technology and associated practices developed over the decades in the BI space are no longer useful in the Big Data age? Not at all.
On the contrary, they are more useful than ever: whereas in the past humans were in the driving seat, controlling the demand for the use of the diligently acquired and curated datasets, we now have machines taking up that important role and hence unleashing manifold ways of using the data and finding obscure, non-intuitive relationships that elude humans. Moreover, machines can bring unprecedented speed and processing scalability to the game, which would be either prohibitively expensive or outright impossible with a human workforce.

Companies have to realize both the enormous potential of new automated, predictive analytics technologies such as machine learning and how to successfully incorporate those advanced technologies into the data analysis and processing fabric of their existing infrastructure. It is this marrying of the relatively old, stable technologies of data mining, data warehousing, enterprise data models, etc. with the new automated predictive technologies that has the huge potential to unleash the benefits so often hyped, by those with vested interests in new tools and applications, as the answer to all data analytics problems.

To see this in the context of predictive analytics, let's consider machine learning (ML) technology. The easiest way to understand machine learning is to look at the simplest ML algorithm: linear regression. ML technology builds on the basic interpolation idea of regression and extends it using sophisticated mathematical techniques that would not necessarily be obvious to casual users. For example, some ML algorithms extend the linear regression approach to model non-linear (i.e. higher-order) relationships between dependent and independent variables in the dataset via clever mathematical transformations (a.k.a. kernel methods) that express those non-linear relationships in a linear form, and hence make them suitable to be run through a linear algorithm.
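The idea of expressing a non-linear relationship in a linear form can be illustrated with a small Python sketch. Note the assumptions: this uses an explicit feature transformation (z = x squared) rather than a true kernel method, which would apply such a mapping implicitly, and the data set and function names are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_squared_error(xs, ys, slope, intercept):
    """Average squared gap between the fitted line and the observed values."""
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Illustrative data with a purely quadratic relationship: y = 2 * x^2 + 1.
xs = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [2.0 * x * x + 1.0 for x in xs]

# A straight line fitted on the raw variable cannot capture the curve.
raw_slope, raw_intercept = fit_line(xs, ys)
raw_error = mean_squared_error(xs, ys, raw_slope, raw_intercept)

# After the transformation z = x^2 the relationship is linear in z,
# so the same linear algorithm fits it well.
zs = [x * x for x in xs]
z_slope, z_intercept = fit_line(zs, ys)
z_error = mean_squared_error(zs, ys, z_slope, z_intercept)

print(f"error on raw x: {raw_error:.2f}, error on transformed x: {z_error:.2f}")
```

The linear fitting routine is unchanged between the two runs; only the representation of the input differs, which is the point the paragraph above makes about transformations.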
Be it a simple linear algorithm or its more sophisticated kernel-method variation, an ML algorithm has no context on the data it processes. This is both a strength and a weakness. It is a strength because the same algorithms can process many different kinds of data, allowing us to leverage all the work that went into developing them across different business contexts. It is a weakness because, since the algorithms lack any contextual understanding of the data, the perennial computer science truth of garbage in, garbage out manifests itself unceremoniously here: ML models have to be fed the "right" kind of data to draw out correct insights that explain the inner relationships in the data being processed.

ML technology provides an impressive set of sophisticated data analysis and modelling algorithms that can find very intricate relationships among the datasets they process. It provides not only advanced data analysis and modeling methods but also the ability to use these methods in automated, and hence massively distributed and scalable, ways. Its Achilles' heel, however, is its heavy dependence on the data it is fed. The best analytic methods are useless for drawing out useful insights if they are applied to the wrong kind of data. More seriously, the use of advanced analytical technology can give users a false sense of confidence in the results those methods produce, making the whole undertaking not just useless but actually dangerous.

We can address this fundamental weakness of ML technology by deploying its advanced, raw algorithmic processing capabilities in conjunction with existing data analytics technology, whereby contextual data relationships and key domain knowledge coming from the existing BI estate (data mining efforts, data warehouses, enterprise data models, business rules, etc.) are used to feed the ML analytics pipeline.
This approach combines the superior algorithmic processing capabilities of the new ML technology with the enterprise knowledge accumulated through BI efforts, and allows companies to build on their existing data analytics investments while transitioning to the incoming advanced technologies. This, I believe, is effectively a win-win situation and will be key to the success of any company involved in data analytics efforts.

