Logical Architectures for Big Data Analytics

If you check the reference architectures for big data analytics proposed by Forrester and Gartner, or ask your colleagues building big data analytics platforms for their companies (typically under the ‘enterprise data lake’ tag), they will all tell you that modern analytics need a plurality of systems: one or several Hadoop clusters, in-memory processing systems, streaming tools, NoSQL databases, analytical appliances and operational data stores, among others (see Figure 1 for an example architecture).

Spotlight

FirstWatch

Real-time & Mobile, FirstWatch helps manage Critical Data, anytime, anywhere! FirstWatch is a web-based, real-time data analysis and Dashboard software system which allows authorized users to drill down into Charts, Graphs and Maps featuring detailed statistical trends, patterns and geographic clusters of incident information, all based on user-defined criteria. FirstWatch is used every day for Situational Awareness, Homeland Security, Public Health Surveillance, as well as for Operational and Performance monitoring / quality improvement for Public Safety (911, EMS, Fire & Law Enforcement) teams.

OTHER ARTICLES

New Spain data center becomes test bed for Microsoft and Telefonica’s expanded partnership

Article | February 27, 2020

Microsoft recently announced that it is leveraging a new global strategic partnership with Telefonica to jointly develop “go-to-market plans for regions the company does business.” Last year, during Mobile World Congress 2019, Microsoft took the veil off its newfound relationship with the international telecommunications giant Telefonica. Highlighted during this year’s announcement was Microsoft’s opening of a new datacenter region in Spain. The new data center comes at a time when the company is looking to help expedite Spain’s digital transformation.

Read More

Data Analytics vs Data Science Comparison

Article | February 27, 2020

The terms data science and data analytics are familiar to anyone who works in the technology field. The two terms seem the same, and most people use them as synonyms. However, many people are not aware that there is actually a difference between data science and data analytics. Individuals whose work revolves around these terms, or around the information and technology industries, should know how to use them in the appropriate contexts. The reason is simple: using these terms correctly has a significant impact on the management and productivity of a business, especially in today’s increasingly data-dependent world.

Read More

6 Best SaaS Marketing Metrics for Business Growth

Article | February 27, 2020

The software-as-a-service industry is growing rapidly and is estimated to reach $219.5 billion by 2027. SaaS marketing differs considerably from marketing in other industries, so tracking the right metrics is necessary. SaaS KPIs, or metrics, measure an enterprise’s performance, growth, and momentum. These SaaS marketing metrics are designed to evaluate the health of a business by tracking sales, marketing, and customer success. Direct access to this data will help you develop your business and show where there is room for improvement.

SaaS KPIs: What Are They and Why Do They Matter?

Marketing metrics for SaaS indicate growth in different ways. SaaS KPIs, just like regular KPIs, help businesses evaluate their business models and strategies. These key metrics give deep insight into which areas perform well and which require reassessment. SaaS marketing metrics are essential for optimizing a company’s exposure: they measure the performance of sales, marketing, and customer retention.

SaaS companies focus on the entire life cycle of the customer, while traditional web-based companies focus on immediate sales. The overall goal of SaaS companies is to build long-lasting customer relationships, since most revenue is generated through recurring payments. SaaS marketing technologies are a marketer’s greatest asset, provided the marketer takes the time and effort to understand and implement them.

Some metrics are essential and others are unimportant, and knowing which ones to pay attention to is a challenge. Once you get these metrics right, they will help you detect your company’s strengths and weaknesses and understand whether your strategies are working. There are more than fifteen metrics you could track, but tracking all of them can make you lose sight of what matters.
In this article, we have identified the critical metrics every SaaS company should track.

Unique Visitors

This metric measures the number of visitors your website or page sees in a specific time period. If someone visits your website four or five times in that period, they are counted as one unique visitor. Recording this metric is crucial because it shows you what type of visitors your site receives and which channels they arrive from. A high number of unique visitors indicates to SaaS marketers that their content resonates with the target customers.

It is vital to note, however, through which channels these unique visitors reach your website. These channels can be organic traffic, social media, or paid ads. At this point, SaaS marketers should identify which channels are working and double down on those. Once you know these channels, you can allocate budgets and optimize them for better performance.

Google Analytics is a popular free tool for tracking unique visitors. It lets you filter by date, compare time periods, and generate a report.

Leads

Leads is a broad term that can be broken down into two sub-categories: Sales Qualified Leads (SQL) and Marketing Qualified Leads (MQL). Defining SQL and MQL is important because the definitions can differ for every business. So, let us break down the two.

MQL

MQLs are leads that have moved past the visitor phase in the customer lifecycle. They have taken steps to move ahead and qualify as potential customers, and they have engaged with your website multiple times. For example, they have visited your website to check out prices or case studies, or have downloaded your whitepapers more than two times.

SQL

SQLs actively engage with your site and are more qualified than MQLs. An SQL is a lead you have deemed an ideal sales candidate. These leads are well past the initial search stage, are evaluating vendors, and are ready for a direct sales pitch.
The most crucial distinction between the two is that your sales team has deemed SQLs sales-worthy. After distinguishing between the two kinds of leads, you need to take the appropriate next steps.

The best way to measure these leads is through closed-loop automation tools like HubSpot, Marketo, or Pardot. These tools help you set up criteria that automatically classify an individual as a lead based on the MQL and SQL actions they take on your website. Next, track the website traffic to ensure that unique visitors turn into potential leads.

Churn

The churn rate, in short, refers to the number of customers lost in a given time frame. It is the number of paying SaaS customers who cancel their recurring services. Since SaaS is a subscription-based service, losing customers directly correlates with losing money. A high churn rate can also indicate that your customers aren’t getting what they want from your service.

Like most of your SaaS KPIs, the churn rate is reported every month. To calculate it, take the total number of customers you lost in the month you’re reporting on. Next, divide that by the number of customers you had at the beginning of the reporting month. Then multiply that number by 100 to get the percentage.

Some churn is natural for any business. However, a high churn rate is an indicator that your business is in trouble, which makes it an essential metric to track for your SaaS company.

Customer Lifetime Value

Customer lifetime value (CLV) measures how valuable a customer is to your business. It is the average amount of money your customers pay during their involvement with your SaaS company. You measure their value based not only on purchases but also on the overall relationship. Keeping an existing client is more important than acquiring a new one, which makes this metric important.

Measuring CLV is a bit more complicated than measuring other metrics.
First, calculate the average customer lifetime by dividing one by the customer churn rate. As an example, let’s say your monthly churn rate is 1%. Your average customer lifetime would be 1/0.01 = 100 months.

Then multiply the average customer lifetime by the average revenue per account (ARPA) over a given time period. If your company, for example, brought in $100,000 in revenue last month from 100 customers, that would be $1,000 in revenue per account.

Finally, this brings us to CLV. Multiply the customer lifetime (100 months) by your ARPA ($1,000). That gives 100 x $1,000, or a CLV of $100,000.

CLV is crucial because it indicates whether there is a proper strategy in place for business growth. It also shows investors the value of your company.

Customer Acquisition Cost

Customer acquisition cost (CAC) tells you how much you spend to acquire a new customer. The two main factors that determine the CAC are lead generation costs and the cost of converting a lead into a client.

The CAC predicts the resources needed to acquire new customers. It is vital to understand this metric if you want to grow your customer base and make a profit. To calculate your CAC for any given period, divide your marketing and sales spend over that period by the number of customers gained during the same time.

It might cost more to acquire a particular customer, but what if that customer ends up spending more than most? That’s where the CLV to CAC ratio comes into play.

CLV:CAC Ratio

CLV and CAC go hand in hand, and comparing the two helps you understand how efficiently your business grows. The CLV:CAC ratio expresses the lifetime value of your customers and the amount you spend to gain new ones in a single metric.

The ultimate goal of your company should be a high CLV:CAC ratio. A commonly cited benchmark is that a healthy business should have a CLV about three times its CAC. Just divide your calculated CLV by your CAC to get the ratio.
Some top-performing companies even have a ratio of 5:1. SaaS companies use this number to measure the health of their marketing programs, so that they can invest in campaigns that work well and divert resources away from those that do not.

Conclusion

Always remember to set healthy marketing KPIs. Reporting on these numbers is never enough on its own. Ensure that everything you do in marketing ties back to the goals you have set for your company. Goal-driven SaaS marketing strategies pay off and empower you and your company to be successful.

Frequently Asked Questions

What are the 5 most important metrics for SaaS companies?

The five most important metrics for SaaS companies are Unique Visitors, Churn, Customer Lifetime Value, Customer Acquisition Cost, and Lead to Customer Conversion Rate.

Why should we measure SaaS marketing metrics?

Measuring marketing metrics is critically important because it helps brands determine whether campaigns are successful and provides insights for adjusting future campaigns. The metrics help marketers understand how their campaigns are driving towards their business goals, and inform decisions for optimizing campaigns and marketing channels.

How to measure the success of your SaaS marketing?

The success of SaaS marketing can be measured by identifying and tracking the metrics that help the company succeed. Some examples of those metrics are Unique Visitors, Churn, Customer Lifetime Value, Customer Acquisition Cost, and Lead to Customer Conversion Rate.
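The churn, CLV, CAC, and CLV:CAC calculations described in this article can be sketched in a few lines of Python. This is a minimal illustration, not a standard library or a vendor formula: the function names are hypothetical, the churn and revenue figures are the article's own example (1% monthly churn, $100,000 revenue from 100 customers), and the acquisition spend ($250,000 for 10 new customers) is an invented figure used only to show the ratio.

```python
def churn_rate_pct(customers_lost: int, customers_at_start: int) -> float:
    """Customers lost in the month as a percentage of customers at its start."""
    if customers_at_start <= 0:
        raise ValueError("customers_at_start must be positive")
    return 100 * customers_lost / customers_at_start


def customer_lifetime_months(monthly_churn_pct: float) -> float:
    """Average customer lifetime: one divided by the monthly churn rate."""
    if monthly_churn_pct <= 0:
        raise ValueError("monthly_churn_pct must be positive")
    return 100 / monthly_churn_pct  # same as 1 / (pct / 100)


def arpa(revenue: float, accounts: int) -> float:
    """Average revenue per account over the reporting period."""
    return revenue / accounts


def clv(lifetime_months: float, arpa_value: float) -> float:
    """Customer lifetime value: lifetime multiplied by ARPA."""
    return lifetime_months * arpa_value


def cac(sales_and_marketing_spend: float, customers_gained: int) -> float:
    """Customer acquisition cost: spend divided by customers gained."""
    return sales_and_marketing_spend / customers_gained


def clv_to_cac_ratio(clv_value: float, cac_value: float) -> float:
    """How many times the lifetime value covers the acquisition cost."""
    return clv_value / cac_value


if __name__ == "__main__":
    churn = churn_rate_pct(customers_lost=1, customers_at_start=100)
    lifetime = customer_lifetime_months(churn)
    revenue_per_account = arpa(revenue=100_000, accounts=100)
    lifetime_value = clv(lifetime, revenue_per_account)
    # Hypothetical acquisition figures, not taken from the article.
    acquisition_cost = cac(sales_and_marketing_spend=250_000, customers_gained=10)
    ratio = clv_to_cac_ratio(lifetime_value, acquisition_cost)
    print(f"churn={churn}% lifetime={lifetime} months CLV=${lifetime_value:,.0f}")
    print(f"CAC=${acquisition_cost:,.0f} CLV:CAC={ratio}:1")
```

With the hypothetical spend above, the ratio comes out above the 3:1 level the article describes as healthy. Changing the spend or the churn rate shows how sensitive the ratio is to each input.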

Read More
BIG DATA MANAGEMENT

Roles in a Data Team

Article | February 27, 2020

In this article, we’ll talk about different roles in a data team and discuss their responsibilities. In particular, we will cover the types of roles in a data team, the responsibilities of each role, and the skills and knowledge each role needs to have.

This is not a comprehensive list, and the majority of what you will read in this article is my opinion, which comes out of my experience working as a data scientist. You can interpret the following information as “the description of data roles from the perspective of a data scientist”. For example, my views on the role of a data engineer may be a bit simplified because I don’t see all the complexities of their work firsthand. I do hope you will find this information useful nonetheless.

Roles in a Team

A typical data team consists of the following roles: product managers, data analysts, data scientists, data engineers, machine learning engineers, and site reliability engineers / MLOps engineers. All these people work to create a data product.

To explain the core responsibilities of each role, we will use a case scenario. Suppose we work at an online classifieds company. It’s a platform where users can go to sell things they don’t need (like OLX, where I work). If a user has an iPhone they want to sell, they go to this website, create a listing and sell their phone.

On this platform, sellers sometimes have problems identifying the correct category for the items they are selling. To help them, we want to build a service that suggests the best category. To sell their iPhone, the user creates a listing and the site needs to automatically understand that this iPhone has to go in the “mobile phones” category.

Let’s start with the first role: product manager.

Product Manager

A product manager is someone responsible for developing products. Their goal is to make sure that the team is building the right thing.
They are typically less technical than the rest of the team: they don’t focus on the implementation aspects of a problem, but rather on the problem itself.

Product managers need to ensure that the product is actually used by the end users. This is a common problem: in many companies, engineers create something that doesn’t solve real problems. The product manager is therefore the person who speaks to the team on behalf of the users.

The primary skills a PM needs are communication skills. For data scientists, communication is a soft skill, but for a product manager it is a hard skill: they must have it to perform their work.

Product managers also do a lot of planning: they need to understand the problem, come up with a solution, and make sure the solution is implemented in a timely manner. To accomplish this, PMs need to know what’s important and plan the work accordingly. When somebody has a problem, they approach the PM with it. The task of the PM is then to figure out whether users actually need this feature, how important it is, and whether the team has the capacity to implement it.

Let’s come back to our example. Suppose somebody comes to the PM and says: “We want to build a feature to automatically suggest the category for a listing. Somebody’s selling an iPhone, and we want to create a service that predicts that the item goes in the mobile phones category.” Product managers need to answer these questions: “Is this feature that important to the user?” “Is it an important problem to solve in the product at all?”

To answer these questions, PMs ask data analysts to help them figure out what to do next.

Data Analyst

Data analysts know how to analyze the data available in the company. They discover insights in the data and then explain their findings to others. So, analysts need to know what kind of data the company has, how to get the data, how to interpret the results, and how to explain their findings to colleagues and management.
Data analysts are also often responsible for defining key metrics and building dashboards. This includes things like showing the company’s profits, the number of listings, or how many contacts buyers made with sellers. Data analysts should therefore know how to calculate all the important business metrics and how to present them in a way that others can understand.

When it comes to skills, data analysts should know: SQL (the main tool they work with); programming languages such as Python or R; Tableau or similar tools for building dashboards; the basics of statistics; how to run experiments; and a bit of machine learning, such as regression analysis and time series modeling.

In our example, product managers turn to data analysts to help them quantify the extent of the problem. Together with the PM, the data analyst tries to answer questions like: “How many users are affected by this problem?” “How many users don’t finish creating their listing because of this problem?” “How many listings are there on the platform that don’t have the right category selected?”

After the analyst gets the data, analyzes it and answers these questions, they may conclude: “Yes, this is actually a problem”. Then the PM and the team discuss the report and agree: “Indeed, this problem is actually worth solving”. Now the data team will go ahead and start solving it.

After the model for the service is created, it’s necessary to understand whether the service is effective: whether the model helps people and solves the problem. For that, data analysts run experiments, usually A/B tests. When running an experiment, we can see whether more users successfully finish posting an item for sale or whether fewer ads end up in the wrong category.

Data Scientist

The roles of a data scientist and a data analyst are pretty similar. In some companies, the same person does both jobs.
However, data scientists typically focus more on predicting than on explaining. A data analyst fetches the data, looks at it, explains what’s going on to the team, and gives some recommendations on what to do about it. A data scientist, on the other hand, focuses more on creating machine learning services. For example, one of the questions a data scientist would want to answer is “How can we use this data to build a machine learning model for predicting something?”

In other words, data scientists incorporate the data into the product. Their focus is more on engineering than on analysis, and they work more closely with engineers on integrating data solutions into the product.

The skills of data scientists include: machine learning (the main tool for building predictive services); Python (the primary programming language); SQL (necessary to fetch the data for training their models); and Flask, Docker, and similar tools (to create simple web services for serving the models).

In our example, the data scientists are the people who develop the model used for predicting the category. Once they have a model, they can develop a simple web service for hosting it.

Data Engineers

Data engineers do all the heavy lifting when it comes to data. A lot of work needs to happen before data analysts can go to a database, fetch the data, perform their analysis, and come up with a report. This is precisely the focus of data engineers: they make sure this is possible.

Their responsibility is to prepare all the necessary data in a form that is consumable by their colleagues. To accomplish this, data engineers create “a data lake”. All the data that users generate needs to be captured properly and saved in a separate database. This way, analysts can run their analysis, and data scientists can use this data for training models.
Another thing data engineers often need to do, especially at larger companies, is to ensure that the people who look at the data have the necessary clearance to do so. Some user data is sensitive, and people can’t just go looking around at personal information (such as emails or phone numbers) unless they have a really good reason. Data engineers therefore need to set up a system that doesn’t let people access all the data at once.

The skills needed for data engineers usually include: AWS or Google Cloud (popular cloud providers); Kubernetes and Terraform (infrastructure tools); Kafka or RabbitMQ (tools for capturing and processing the data); databases (to save the data in a way that is accessible to data analysts); and Airflow or Luigi (data orchestration tools for building complex data pipelines).

In our example, a data engineer prepares all the required data. First, they make sure the analyst has the data to perform the analysis. Then they work with the data scientist to prepare the information needed for training the model. That includes the title of the listing, its description, the category, and so on.

A data engineer isn’t the only type of engineer on a data team. There are also machine learning engineers.

Machine Learning Engineer

Machine learning engineers take whatever data scientists build and help them scale it up. They also ensure that the service is maintainable and that the team follows the best engineering practices. Their focus is more on engineering than on modeling.

The skills ML engineers have are similar to those of data engineers: AWS or Google Cloud; infrastructure tools like Kubernetes and Terraform; Python and other programming languages; and Flask, Docker, and other tools for creating web services.
Additionally, ML engineers work closely with more “traditional” engineers, like backend engineers, frontend engineers, or mobile engineers, to ensure that the services from the data team are included in the final product.

In our example, ML engineers work together with data scientists on productionizing the category suggestion service. They make sure it’s stable once it’s rolled out to all the users. They must also ensure that it’s maintainable and that it’s possible to make changes to the service in the future.

There’s another kind of engineer that can be pretty important in a data team: site reliability engineers.

DevOps / Site Reliability Engineer

The role of SREs is similar to that of ML engineers, but the focus is more on the availability and reliability of the services. SREs aren’t strictly limited to working with data. Their role is more general: they tend to focus less on business logic and more on infrastructure, which includes things like networking and provisioning.

SREs therefore look after the servers where the services are running and take care of collecting operational metrics like CPU usage, requests per second, the state of the services’ processes, and so on. As the name suggests, site reliability engineers have to make sure that everything runs reliably. They set up alerts and are on call to make sure that the services are up and running without interruptions. If something breaks, SREs quickly diagnose the problem and fix it, or involve an engineer to help find the solution.

The skills needed for site reliability engineers are: cloud infrastructure tools; programming languages like Python; Unix/Linux; networking; and DevOps best practices like automation, CI/CD, and the like.

Of course, ML engineers and data engineers should also know these best practices, but the focus of DevOps engineers/SREs is to establish them and make sure that they are followed.
There is a special type of DevOps engineer, called an “MLOps engineer”.

MLOps Engineer

An MLOps engineer is a DevOps engineer who also knows the basics of machine learning. As with an SRE, the responsibility of an MLOps engineer is to make sure that the services developed by data scientists, ML engineers, and data engineers are up and running all the time.

MLOps engineers know the lifecycle of a machine learning model: the training phase, the serving phase, and so on. Despite having this knowledge, MLOps engineers are still focused more on operational support than on anything else. This means that they need to know and follow all the DevOps practices and make sure that the rest of the team follows them as well. They accomplish this by setting up things like continuous retraining and CI/CD pipelines.

Even though everyone in the team has a different focus, they all work together towards the same goal: solving the problems of the users.

Summary

To summarize, the roles in the data team and their responsibilities are:

Product managers: make sure that the team is building the right thing, act as a gateway to all the requests, and speak on behalf of the users.
Data analysts: analyze data, define key metrics, and create dashboards.
Data scientists: build models and incorporate them into the product.
Data engineers: prepare the data for analysts and data scientists.
ML engineers: productionize machine learning services and establish the best engineering practices.
Site reliability engineers: focus on availability and reliability, and enforce the best DevOps practices.

This list is not comprehensive, but it should be a good starting point if you are just getting into the industry, or if you just want to know how the lines between different roles are drawn.
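The A/B test that the data analyst runs to check whether the category suggestion service helps can be sketched as a two-proportion z-test. This is only one common way to evaluate such an experiment, and the article does not say which method the author's team uses. The function name and the counts in the usage example are hypothetical: group A is assumed to be users without suggestions, group B users with suggestions, and a "success" is a user who finishes posting a listing.

```python
import math


def two_proportion_z_test(success_a: int, total_a: int,
                          success_b: int, total_b: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for the difference in success rates B - A."""
    if total_a <= 0 or total_b <= 0:
        raise ValueError("both groups must contain at least one user")
    rate_a = success_a / total_a
    rate_b = success_b / total_b
    # Pooled rate under the null hypothesis that both groups behave the same.
    pooled = (success_a + success_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if std_err == 0:
        raise ValueError("no variation in outcomes; the test is undefined")
    z = (rate_b - rate_a) / std_err
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical counts: finished listings out of users who started one.
    z, p = two_proportion_z_test(success_a=500, total_a=1000,
                                 success_b=560, total_b=1000)
    print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the difference in completion rates is unlikely to be due to chance alone. In practice the analyst would also fix the sample size and the significance threshold before the experiment starts.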

Read More

