Article | February 20, 2020
The world is now heading into the Fourth Industrial Revolution, as Professor Klaus Schwab, Founder and Executive Chairman of the World Economic Forum, described it in 2016. Artificial Intelligence (AI) is a key driver in this revolution and with it, machine learning is critical. But critical to the whole process is the need to process a tremendous amount of data which in turns boosts the demand for computing power exponentially.A study by OpenAI suggested that the computing power required for AI training surged by more than 300,000 times between 2012 and 2018. This represents a doubling of computing power every three months and two weeks; a number that is significantly quicker than Moore’s Law which has traditionally measured the time it takes to double computing power. Conventional methodology is no longer enough for such significant leaps, and we desperately need a different computing architecture to stay ahead in the game.
Article | February 20, 2020
Saurav Singla is a Senior Data Scientist, a Machine Learning Expert, an Author, a Technical Writer, a Data Science Course Creator and Instructor, a Mentor, a Speaker.
While Media 7 has followed Saurav Singla’s story closely, this chat with Saurav was about analytics, his journey as a data scientist, and what he brings to the table with his 15 years of extensive statistical modeling, machine learning, natural language processing, deep learning, and data analytics across Consumer Durable, Retail, Finance, Energy, Human Resource and Healthcare sectors. He has grown multiple businesses in the past and is still a researcher at heart.
In the past, Analytics and Predictive Modeling is predominant in few industries but in current times becoming an eminent part of emerging fields such as health, human resource management, pharma, IoT, and other smart solutions as well.
Saurav had worked in data science since 2003. Over the years, he realized that all the people they had hired — whether they are from business or engineering backgrounds — needed extensive training to be able to perform analytics on real-world business datasets.
He got an opportunity to move to Australia in the year 2003. He joined a retail company Harvey Norman in Australia, working out of their Melbourne office for four years.
After moving back to India, in 2008, he joined one of the verticals of Siemens — one of the few companies in India then using analytics services in-house for eight years.
He is a very passionate believer that the use of data and analytics will dramatically change not only corporations but also our societies. Building and expanding the application of analytics for supply chain, logistics, sales, marketing, finance at Siemens was a very fulfilling and enjoyable experience for him.
Siemens was a tremendously rewarding and enjoyable experience for him. He grew the team from zero to fifteen while he was the data scientist leader. He believes those eight years taught him how to think big, scale organizations using data science.
He has demonstrated success in developing and seamlessly executing plans in complex organizational structures. He has also been recognized for maximizing performance by implementing appropriate project management tools through analysis of details to ensure quality control and understanding of emerging technology.
In the year 2016, he started getting a serious inner push to start thinking about joining a consulting and shifted to a company based out in Delhi NCR.
During his ten-month path with them, he improved the way clients and businesses implement and exploit machine learning in their consumer commitments. As part of that vision, he developed class-defining applications that eliminate tension technologies, processes, and humans. Another main aspect of his plan was to ensure that it was affected in very fast agile cycles. Towards that he was actively innovating on operating and engagement models.
In the year 2017, he moved to London and joined a digital technology company, and assisted in building artificial intelligence and machine learning products for their clients. He aimed to solve problems and transform the costs using technology and machine learning. He was associated with them for 2 years.
At the beginning of the year 2018, he joined Mindrops. He developed advanced machine learning technologies and processes to solve client problems. Mentored the Data Science function and guide them in the development of the solution. He built robust clients Data Science capabilities which can be scalable across multiple business use cases.
Outside work, Saurav associated with Mentoring Club and Revive. He volunteers in his spare time for helping, coaching, and mentoring young people in taking up careers in the data science domain, data practitioners to build high-performing teams and grow the industry. He assists data science enthusiasts to stay motivated and guide them along their career path. He helps fill the knowledge gap and help aspirants understand the core of the industry. He helps aspirants analyze their progress and help them upskill accordingly. He also helps them connect with potential job opportunities with their industry-leading network.
Additionally, in the year 2018, he joined as a mentor in the Transaction Behavioral Intelligence company that accelerates business growth for banks with the use of Artificial Intelligence and Machine Learning enabled products. He is guiding their machine learning engineers with their projects. He is enhancing the capabilities of their AI-driven recommendation engine product.
Saurav is teaching the learners to grasp data science knowledge more engaging way by providing courses on the Udemy marketplace. He has created two courses on Udemy, with over twenty thousand students enrolled in it. He regularly speaks at meetups on data science topics and writes articles on data science topics in major publications such as AI Time Journal, Towards Data Science, Data Science Central, Kdnuggets, Data-Driven Investor, HackerNoon, and Infotech Report. He actively contributes academic research papers in machine learning, deep learning, natural language processing, statistics and artificial intelligence.
His book on Machine Learning for Finance was published by BPB Publications which is Asia's largest publisher of Computer and IT Books. This is possibly one of the biggest milestones of his career.
Saurav turned his passion to make knowledge available for society. Saurav believes sharing knowledge is cool, and he wishes everyone should have that passion for knowledge sharing. That would be his success.
BIG DATA MANAGEMENT
Article | February 20, 2020
The Internet of Things has been the hype in the past few years. It is set to play an important role in industries. Not only businesses but also consumers attempt to follow developments that come with the connected devices. Smart meters, sensors, and manufacturing equipment all can remodel the working system of companies.
Based on the Statista reports, the IoT market value of 248 billion US dollars in 2020 is expected to reach a worth of 1.6 Trillion USD by 2025. The global market is in the support of IoT development and its power to bring economic growth. But, the success of IoT without the integration of data analytics is impossible. This major growth component of IoT is the blend of IoT and Big Data - together known as IoT Data Analytics.
Understanding IoT Data Analytics
IoT Data Analytics is the analysis of large volumes of data that has been gathered from connected devices. As IoT devices generate a lot of data even in the shortest period, it becomes complex to analyze the enormous data volumes. Besides, the IoT data is quite similar to big data but has a major difference in their size and number of sources. To overcome the difficulty in IoT data integration, IoT data analytics is the best solution. With this combination, the process of data analysis becomes cost-effective, easier, and rapid.
Why Data Analytics and IoT Will Be Indispensable?
Data analytics is an important part of the success of IoT investments or applications. IoT along with Data analytics will allow businesses to make efficient use of datasets. How?
Let’s get into it!
Using data analytics in IoT investments businesses will become able to gain insight into customer behavior. It will lead to the crafting offers and services accordingly. As a result, companies will see a hike in their profits and revenue.
The vast amount of data sets that are being used by IoT applications needs to be organized and analyzed to obtain patterns. It can easily be achieved by using IoT analytics software.
In an era full of IoT devices and applications, the competition has also increased. You can gain a competitive advantage by hire developers that can help with the IoT analytics implementations. It will assist businesses in providing better services and stand out from the competition.
Now the next question arises: Where is it being implemented? Companies like Amazon, Microsoft, Siemens, VMware, and Huawei are using IoT data analytics for product usage analysis, sensor data analysis, camera data analysis, improved equipment maintenance, and optimizing operations.
The Rise of IoT Data Analytics
With the help of IoT Data Analytics, companies are ready to achieve more information that can be used to improve their overall performance and revenue. Although it has not reached every corner of the market yet, it is still being used for making the workplace more efficient and safe.
The ability to analyze and predict data in real-time is definitely a game-changer for companies that need all of their equipment to work efficiently all the time. It is continuously growing to provide insights that were never possible before.
BIG DATA MANAGEMENT
Article | February 20, 2020
In this article, we’ll talk about different roles in a data team and discuss their responsibilities.
In particular, we will cover:
The types of roles in a data team;
The responsibilities of each role;
The skills and knowledge each role needs to have.
This is not a comprehensive list and the majority of what you will read in this article is my opinion, which comes out of my experience from working as a data scientist.
You can interpret the following information as “the description of data roles from the perspective of a data scientist”. For example, my views on the role of a data engineer may be a bit simplified because I don’t see all the complexities of their work firsthand. I do hope you will find this information useful nonetheless.
Roles in a Team
A typical data team consists of the following roles:
Machine learning engineers, and
Site reliability engineers / MLOps engineers.
All these people work to create a data product.
To explain the core responsibilities of each role, we will use a case scenario:
Suppose we work at an online classifieds company. It’s a platform where users can go to sell things they don’t need (like OLX, where I work). If a user has an iPhone they want to sell — they go to this website, create a listing and sell their phone.
On this platform, sellers sometimes have problems with identifying the correct category for the items they are selling. To help them, we want to build a service that suggests the best category. To sell their iPhone, the user creates a listing and the site needs to automatically understand that this iPhone has to go in the “mobile phones” category.
Let’s start with the first role: product manager.
A product manager is someone responsible for developing products. Their goal is to make sure that the team is building the right thing. They are typically less technical than the rest of the team: they don’t focus on the implementation aspects of a problem, but rather the problem itself.
Product managers need to ensure that the product is actually used by the end-users. This is a common problem: in many companies, engineers create something that doesn’t solve real problems. Therefore, the product manager is somebody who speaks to the team on behalf of the users.
The primary skills a PM needs to have are communication skills. For data scientists, communication is a soft skill, but for a product manager — it’s a hard skill. They have to have it to perform their work.
Product managers also do a lot of planning: they need to understand the problem, come up with a solution, and make sure the solution is implemented in a timely manner. To accomplish this, PMs need to know what’s important and plan the work accordingly.
When somebody has a problem, they approach the PM with it. Then the task of the PM is to figure out if users actually need this feature, how important this feature is, and if the team has the capacity to implement it.
Let’s come back to our example. Suppose somebody comes to the PM and says:
“We want to build a feature to automatically suggest the category for a listing. Somebody’s selling an iPhone, and we want to create a service that predicts that the item goes in the mobile phones category.”
Product managers need to answer these questions:
“Is this feature that important to the user?”
“Is it an important problem to solve in the product at all?”
To answer these questions, PMs ask data analysts to help them figure out what to do next.
Data analysts know how to analyze the data available in the company. They discover insights in the data and then explain their findings to others.
So, analysts need to know:
What kind of data the company has;
How to get the data;
How to interpret the results;
How to explain their findings to colleagues and management.
Data analysts are also often responsible for defining key metrics and building different dashboards. This includes things like showing the company’s profits, displaying the number of listings, or how many contacts buyers made with sellers. Thus, data analysts should know how to calculate all the important business metrics, and how to present them in a way that is understandable to others.
When it comes to skills, data analysts should know:
SQL — this is the main tool that they work with;
Programming languages such as Python or R;
Tableau or similar tools for building dashboards;
Basics of statistics;
How to run experiments;
A bit of machine learning, such as regression analysis, and time series modeling.
For our example, product managers turn to data analysts to help them quantify the extent of the problem. Together with the PM, the data analyst tries to answer questions like:
“How many users are affected by this problem?”
“How many users don’t finish creating their listing because of this problem?”
“How many listings are there on the platform that don’t have the right category selected?”
After the analyst gets the data, analyzes it and answers these questions, they may conclude: “Yes, this is actually a problem”. Then the PM and the team discuss the repost and agree: “Indeed, this problem is actually worth solving”.
Now the data team will go ahead and start solving this problem.
After the model for the service is created, it’s necessary to understand if the service is effective: whether this model helps people and solves the problem. For that, data analysts usually run experiments — usually, A/B tests.
When running an experiment, we can see if more users successfully finish posting an item for sale or if there are fewer ads that end up in the wrong category.
The roles of a data scientist and data analyst are pretty similar. In some companies, it’s the same person who does both jobs. However, data scientists typically focus more on predicting rather than explaining.
A data analyst fetches the data, looks at it, explains what’s going on to the team, and gives some recommendations on what to do about it. A data scientist, on the other hand, focuses more on creating machine learning services. For example, one of the questions that a data scientist would want to answer is “How can we use this data to build a machine learning model for predicting something?”
In other words, data scientists incorporate the data into the product. Their focus is more on engineering than analysis. Data scientists work more closely with engineers on integrating data solutions into the product.
The skills of data scientists include:
Machine learning — the main tool for building predictive services;
Python — the primary programming language;
SQL — necessary to fetch the data for training their models;
Flask, Docker, and similar — to create simple web services for serving the models.
For our example, the data scientists are the people who develop the model used for predicting the category. Once they have a model, they can develop a simple web service for hosting this model.
Data engineers do all the heavy lifting when it comes to data. A lot of work needs to happen before data analysts can go to a database, fetch the data, perform their analysis, and come up with a report. This is precisely the focus of data engineers — they make sure this is possible. Their responsibility is to prepare all the necessary data in a form that is consumable for their colleagues.
To accomplish this, data engineers create “a data lake”. All the data that users generate needs to be captured properly and saved in a separate database. This way, analysts can run their analysis, and data scientists can use this data for training models.
Another thing data engineers often need to do, especially at larger companies, is to ensure that the people who look at the data have the necessary clearance to do so. Some user data is sensitive and people can’t just go looking around at personal information (such as emails or phone numbers) unless they have a really good reason to do so. Therefore, data engineers need to set up a system that doesn’t let people just access all the data at once.
The skills needed for data engineers usually include:
AWS or Google Cloud — popular cloud providers;
Kubernetes and Terraform — infrastructure tools;
Kafka or RabbitMQ — tools for capturing and processing the data;
Databases — to save the data in such a way that it’s accessible for data analysts;
Airflow or Luigi — data orchestration tools for building complex data pipelines.
In our example, a data engineer prepares all the required data. First, they make sure the analyst has the data to perform the analysis. Then they also work with the data scientist to prepare the information that we’ll need for training the model. That includes the title of the listing, its description, the category, and so on.
A data engineer isn’t the only type of engineer that a data team has. There are also machine learning engineers.
Machine Learning Engineer
Machine learning engineers take whatever data scientists build and help them scale it up. They also ensure that the service is maintainable and that the team follows the best engineering practices. Their focus is more on engineering than on modeling.
The skills ML engineers have are similar to that of data engineers:
AWS or Google Cloud;
Infrastructure tools like Kubernetes and Terraform;
Python and other programming languages;
Flask, Docker, and other tools for creating web services.
Additionally, ML engineers work closely with more “traditional” engineers, like backend engineers, frontend engineers, or mobile engineers, to ensure that the services from the data team are included in the final product.
For our example, ML engineers work together with data scientists on productionizing the category suggestion services. They make sure it’s stable once it’s rolled out to all the users. They must also ensure that it’s maintainable and it’s possible to make changes to the service in the future.
There’s another kind of engineer that can be pretty important in a data team — site reliability engineers.
DevOps / Site Reliability Engineer
The role of SREs is similar to the ML engineer, but the focus is more on the availability and reliability of the services.
SREs aren’t strictly limited to working with data. Their role is more general: they tend to focus less on business logic and more on infrastructure, which includes things like networking and provisioning infrastructure.
Therefore, SREs look after the servers where the services are running and take care of collecting all the operational metrics like CPU usage, how many requests per second there are, the services’ processes, and so on.
As the name suggests, site reliability engineers have to make sure that everything runs reliably. They set up alerts and are constantly on call to make sure that the services are up and running without any interruptions. If something breaks, SREs quickly diagnose the problem and fix it, or involve an engineer to help find the solution.
The skills needed for site reliability engineers:
Cloud infrastructure tools;
Programming languages like Python,
Best DevOps practices like automation, CI/CD, and the like.
Of course, ML engineers and data engineers should also know these best practices, but the focus of DevOps engineers/SREs is to establish them and make sure that they are followed.
There is a special type of DevOps engineer, called “MLOps engineer”.
An MLOps engineer is a DevOps engineer who also knows the basics of machine learning. Similar to an SRE, the responsibility of an MLOps Engineer is to make sure that the services, developed by data scientists, ML engineers, and data engineers, are up and running all the time.
MLOps engineers know the lifecycle of a machine learning model: the training phase, serving phase, and so on.
Despite having this knowledge, MLOps Engineers are still focused more on operational support than on anything else. This means that they need to know and follow all the DevOps practices and make sure that the rest of the team is following them as well. They accomplish this by setting up things like continuous retraining, and CI/CD pipelines.
Even though everyone in the team has a different focus, they all work together on achieving the same goal: solve the problems of the users.
To summarize, the roles in the data team and their responsibilities are:
Product managers — make sure that the team is building the right thing, act as a gateway to all the requests and speak on behalf of the users.
Data analysts — analyze data, define key metrics, and create dashboards.
Data scientists — build models and incorporate them into the product.
Data engineers — prepare the data for analysts and data scientists.
ML engineers — productionize machine learning services and establish the best engineering practices.
Site reliability engineers — focus on availability, reliability, enforce the best DevOps practices.
This list is not comprehensive, but it should be a good starting point if you are just getting into the industry, or if you just want to know how the lines between different roles are defined in the industry.