Article | September 7, 2021
Data has settled into regular business practices. Executives in every industry are looking for ways to optimize processes through the use of data. Doing business without analytics is just shooting yourself in the foot.
Yet, global business efforts to embrace data transformation haven't had resounding success. There are many reasons for the difficulty; however, people and process management have been cited as the common thread.
A combination of people touting data as the “new oil” and everyone scrambling to obtain business intelligence has led to information being considered an end in itself. While the idea of becoming a data-driven organization is extremely beneficial, the execution is often lacking. In some areas of business, action over strategy can bring tremendous results.
However, in data governance such an approach often results in a hectic period of implementations, new processes, and uncoordinated decision-making. What I propose is to proceed with a good strategy and sound data governance principles in mind.
Auditing data for quality
Within a data governance framework, information turns into an asset. Proper data governance is essentially informational accounting. Numerous rules, regulations, and guidelines exist to make sure that governance delivers quality.
While boiling down the process into one concept would be reductionist, by far the most important topic in all information management and governance is data quality. Data quality can be loosely defined as the degree to which data is accurate, complete, timely, consistent, adherent to rules and requirements, and relevant.
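The dimensions in this definition can be turned into simple measurements. The following is a minimal sketch, not a standard audit procedure: the inventory records, the field names, the 30-day freshness window, and the helper functions are all assumptions made up for illustration.

```python
from datetime import date

# Hypothetical inventory records; field names are illustrative assumptions.
records = [
    {"sku": "A-100", "quantity": 12, "updated": date(2021, 9, 1)},
    {"sku": "A-101", "quantity": None, "updated": date(2021, 8, 30)},
    {"sku": "A-102", "quantity": -4, "updated": date(2021, 3, 15)},
    {"sku": "A-100", "quantity": 12, "updated": date(2021, 9, 1)},
]


def completeness(rows, field):
    """Share of rows where the field is present and not None."""
    filled = sum(1 for row in rows if row.get(field) is not None)
    return filled / len(rows)


def rule_adherence(rows, field, rule):
    """Share of non-missing values that satisfy a business rule."""
    values = [row[field] for row in rows if row.get(field) is not None]
    return sum(1 for value in values if rule(value)) / len(values)


def timeliness(rows, field, as_of, max_age_days):
    """Share of rows updated within max_age_days of the as_of date."""
    fresh = sum(1 for row in rows if (as_of - row[field]).days <= max_age_days)
    return fresh / len(rows)


def duplicate_share(rows, key):
    """Share of rows whose key value already appeared in an earlier row."""
    seen, duplicates = set(), 0
    for row in rows:
        if row[key] in seen:
            duplicates += 1
        seen.add(row[key])
    return duplicates / len(rows)


report = {
    "completeness": completeness(records, "quantity"),
    "rule_adherence": rule_adherence(records, "quantity", lambda q: q >= 0),
    "timeliness": timeliness(records, "updated", date(2021, 9, 7), 30),
    "duplicate_share": duplicate_share(records, "sku"),
}

for name, score in report.items():
    print(f"{name}: {score:.2f}")
```

Scores like these only point to where quality is lacking. They say nothing about why a quantity was left empty or entered as a negative number, which is the behavioral or process question an audit still has to answer.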
Generally, knowledge workers (i.e. those who are heavily involved in data) have an intuitive grasp of when data quality is lacking. However, pinpointing the problem should be the goal. The problem can be resolved only if the root cause of the issue, which is generally behavioral or process-based rather than technical, is discovered.
Lack of consistent data quality assurance leads to the same result with varying degrees of terribleness: decision making based on inaccurate information. For example, mismanaging company inventory is most often due to lack of data quality. Absence of data governance is all cost and no benefit. In the coming years, the threat of a lack of quality assurance will only increase as more businesses try to take advantage of data of any kind.
Luckily, data governance is becoming a more well-known phenomenon. According to a survey we conducted with Censuswide, nearly 50% of companies in the financial sector have made data quality assurance part of their overall data strategy for the coming year.
Data governance prerequisites
Information management used to be thought of as an enterprise-level practice. While that still rings true in many cases today, overall data load within companies has significantly risen in the past few years. With the proliferation of data-as-a-service companies and overall improvement in information acquisition, medium-size enterprises can now derive beneficial results from implementing data governance if they are within a data-heavy field.
However, data governance programs will differ according to several factors. Each of these will influence the complexity of the strategy:
Business model - the type of organization, its hierarchy, industry, and daily activities.
Content - the volume, type (e.g. internal and external data, general information, documents, etc.) and location of content being governed.
Federation - the extent and intensity of governance.
Smaller businesses will barely have to think about the business model as they will usually have only one. Multinational corporations, on the other hand, might have several branches and arms of action, necessitating different data governance strategies for each.
However, the hardest prerequisite for data governance is proving its efficacy beforehand. Since the process itself deals with abstract concepts (e.g. data as an asset, procedural efficiency), often only platitudes of “improved performance” and “reduced operating costs” will be available as arguments. Regardless of the distinct data governance strategy implemented, the effects become visible much later down the line. Even then, for people who have an aversion to data, the effects might be nearly invisible.
Therefore, while improved business performance and efficiency are direct results of proper data governance, making the case for implementing such a strategy is easiest through risk reduction. Proper management of data results in easier compliance with laws and regulations, reduced data breach risk, and better decision making due to more streamlined access to information.
“Why even bother?”
Data governance is difficult, messy, and, sometimes, brutal. After all, most bad data is created out of human behavior, not technical error. That means telling people they’re doing something wrong (through habit or semi-intentional action). Proving someone wrong, at times repeatedly, is bound to ruffle some feathers.
Going to a social war for data might seem like overkill. However, proper data governance prevents numerous invisible costs and opens up avenues for growth. Without it, there’s an increased likelihood of:
Costs associated with data. Lack of consistent quality control can lead to the derivation of unrealistic conclusions. Noticing these has costs as retracing steps and fixing the root cause takes a considerable amount of time. Not noticing these can cause invisible financial sinks.
Costs associated with opportunity. All data can deliver insight. However, messy, inaccurate, or low-quality data has its potential significantly reduced. Some insights may simply be invisible if a business can’t keep up with quality.
As data governance is associated with an improvement in nearly all aspects of the organization, its importance cannot be overstated. However, getting everyone on board and keeping them there throughout the implementation will be painful. Delivering carefully crafted cost-benefit and risk analyses of such a project will be the initial step in nearly all cases.
Luckily, an end goal of all data governance programs is to disappear. As long as the required practices and behaviors remain, data quality can be maintained. Eventually, no one will even notice they’re doing something they may have considered “out of the ordinary” previously.
Article | September 7, 2021
Since the internet became popular, the way we purchase things has evolved from a simple process to a more complicated process. Unlike traditional shopping, it is not possible to experience the products first-hand when purchasing online. Not only this, but there are more options or variants in a single product than ever before, which makes it more challenging to decide.
To avoid making a bad investment, the consumer has to rely heavily on the customer reviews posted by people who are using the product. However, sorting through relevant reviews for different products across multiple eCommerce platforms and then comparing them can be too much work. To provide a solution to this problem, Amazon has come up with sentiment analysis using product review data. Amazon performs sentiment analysis on product review data with Artificial Intelligence technology to identify the most suitable products for the customer.
A consumer wants to search for only relevant and useful reviews when deciding on a product. A rating system is an excellent way to gauge the quality and efficiency of a product. However, it still cannot provide complete information about the product, as ratings can be biased. Detailed textual reviews are necessary to improve the consumer experience and to help consumers make informed choices. Consumer experience is a vital tool for understanding the customer's behavior and increasing sales.
Amazon has come up with a unique way to make things easier for its customers. It does not promote products based on what other customers with a similar search history looked at. Instead, it recommends products that are similar to the product a user is searching for. This way, it guides the customer using the correlation between the products.
To understand this concept better, we must understand how Amazon's recommendation algorithm has evolved over time.
The history of Amazon's recommendation algorithm
Before Amazon started sentiment analysis of customer product reviews using machine learning, it already used collaborative filtering to make recommendations. Collaborative filtering is the most widely used way to recommend products online. Earlier, people used user-based collaborative filtering, which was not suitable as many factors went unaccounted for.
Researchers at Amazon came up with a better way to recommend products, one that depends on the correlation between products instead of similarities between customers. In user-based collaborative filtering, a customer is shown recommendations based on the purchase history of people with a similar search history. In item-to-item collaborative filtering, people are shown recommendations of products related to their recent purchases. For example, if a person bought a mobile phone, they will be shown that phone's accessories.
Amazon's Personalization team found that using purchase history at a product level can provide better recommendations. This way of filtering also offered a computational advantage. User-based collaborative filtering requires analyzing several users that have a similar shopping history. This process is time-consuming as there are several demographic factors to consider, such as location, gender, age, etc. Also, a customer's shopping history can change in a day. To keep the data relevant, you would have to update the index storing the shopping history daily.
However, item-to-item collaborative filtering is easy to maintain as only a tiny subset of the website's customers purchase a specific product. Computing a list of individuals who bought a particular item is much easier than analyzing all the site's customers for similar shopping history. Still, there is a proper science behind calculating the relatedness of a product. You cannot merely count the number of times a person bought two items together, as that would not make accurate recommendations.
Amazon research uses a relatedness metric to come up with recommendations. If a person purchased item X, item Y is considered related only if purchasers of item X are more likely to buy item Y than the average customer is. Only then is item Y considered an accurate recommendation.
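To make the difference between raw counting and a relatedness metric concrete, here is a minimal sketch. It is not Amazon's actual algorithm: it reads "more likely" as "more likely than customers overall" and scores that with a simple ratio, which is an assumption. The purchase log and the function names are likewise invented for illustration.

```python
from collections import defaultdict

# Hypothetical purchase log as (customer, item) pairs.
purchases = [
    ("ann", "phone"), ("ann", "phone_case"),
    ("bob", "phone"), ("bob", "phone_case"), ("bob", "batteries"),
    ("cat", "phone"), ("cat", "batteries"),
    ("dan", "batteries"),
    ("eve", "batteries"), ("eve", "novel"),
    ("fay", "novel"), ("fay", "batteries"),
]


def build_item_index(pairs):
    """Map each item to the set of customers who bought it."""
    buyers = defaultdict(set)
    for customer, item in pairs:
        buyers[item].add(customer)
    return dict(buyers)


def relatedness(buyers, n_customers, x, y):
    """Ratio of P(buys y | bought x) to P(buys y) across all customers.

    A value above 1 means buyers of x are more likely than customers
    overall to buy y (assumed reading of the metric).
    """
    p_y_given_x = len(buyers[x] & buyers[y]) / len(buyers[x])
    p_y = len(buyers[y]) / n_customers
    return p_y_given_x / p_y


def recommend(buyers, n_customers, x):
    """Items that are over-represented among buyers of x, best first."""
    scores = {
        y: relatedness(buyers, n_customers, x, y) for y in buyers if y != x
    }
    related = [y for y, score in scores.items() if score > 1.0]
    return sorted(related, key=lambda y: scores[y], reverse=True)


buyers = build_item_index(purchases)
n_customers = len({customer for customer, _ in purchases})

# Raw co-purchase counts cannot tell the two candidates apart.
case_count = len(buyers["phone"] & buyers["phone_case"])
battery_count = len(buyers["phone"] & buyers["batteries"])
print(case_count, battery_count)

print(relatedness(buyers, n_customers, "phone", "phone_case"))
print(relatedness(buyers, n_customers, "phone", "batteries"))
print(recommend(buyers, n_customers, "phone"))
```

In this toy data, phone cases and batteries are each bought alongside a phone twice, so counting alone ranks them equally. Batteries are bought by almost everyone, though, so phone buyers are no more likely than other customers to buy them, and only the phone case passes the threshold. Note also that the index only needs the buyers of each item, not a comparison of every customer against every other.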
In order to provide a good recommendation to a customer, you must show products that have a higher chance of being relevant. There are countless products on Amazon's marketplace, and the customer will not go through several of them to figure out the best one. Eventually, the customer will become frustrated with thousands of options and choose to try a different platform. So Amazon has to develop a unique and efficient way to recommend the products that work better than its competition.
User-based collaborative filtering was working fine until the competition increased. As the number of product listings in the marketplace has grown, you cannot merely rely on previously working algorithms. There are more filters and factors to consider than there were before. Item-to-item collaborative filtering is much more efficient as it automatically narrows the candidates to products that are likely to be purchased. This limits the factors that require analysis to provide useful recommendations.
Amazon has grown into the biggest marketplace in the industry as customers trust and rely on its service. They frequently make changes to fit the recent trends and provide the best customer experience possible.
Article | September 7, 2021
Nowadays, everyone with some technical expertise and a data science bootcamp under their belt calls themselves a data scientist. Also, most managers don't know enough about the field to distinguish an actual data scientist from a make-believe one: someone who calls themselves a data science professional today but may work as a cab driver next year. As data science is a very responsible field dealing with complex problems that require serious attention and work, the data scientist role has never been more significant. So, perhaps instead of arguing about which programming language or which all-in-one solution is the best one, we should focus on something more fundamental. More specifically, the thinking process of a data scientist.
The challenges of the Data Science professional
Any data science professional, regardless of his specialization, faces certain challenges in his day-to-day work. The most important of these involves decisions regarding how he goes about his work. He may have planned to use a particular model for his predictions, but that model may not yield adequate performance (e.g., not high enough accuracy or too high computational cost, among other issues). What should he do then? Also, it could be that the data doesn't have a strong enough signal, and last time I checked, there wasn't a fool-proof method in any data science programming library that provided a clear-cut view on this matter. These are calls that the data scientist has to make, shouldering all the responsibility that goes with them.
Why Data Science automation often fails
Then there is the matter of automation of data science tasks. Although the idea sounds promising, it's probably the most challenging task in a data science pipeline. It's not unfeasible, but it takes a lot of work and a lot of expertise that's usually impossible to find in a single data scientist. Often, you need to combine the work of data engineers, software developers, data scientists, and even data modelers. Since most organizations don't have all that expertise or don't know how to manage it effectively, automation doesn't happen as they envision, resulting in a large part of the data science pipeline needing to be done manually.
The Data Science mindset overall
The data science mindset is the thinking process of the data scientist, the operating system of her mind. Without it, she can't do her work properly, in the large variety of circumstances she may find herself in. It's her mindset that organizes her know-how and helps her find solutions to the complex problems she encounters, whether it is wrangling data, building and testing a model or deploying the model on the cloud. This mindset is her strategy potential, the think tank within, which enables her to make the tough calls she often needs to make for the data science projects to move forward.
Specific aspects of the Data Science mindset
Of course, the data science mindset is more than a general thing. It involves specific components, such as specialized know-how, tools that are compatible with each other and relevant to the task at hand, a deep understanding of the methodologies used in data science work, problem-solving skills, and most importantly, communication abilities. The latter involves both the data scientist expressing himself clearly and also him understanding what the stakeholders need and expect of him. Naturally, the data science mindset also includes organizational skills (project management), the ability to work well with other professionals (even those not directly related to data science), and the ability to come up with creative approaches to the problem at hand.
The Data Science process
The data science process/pipeline is a distillation of data science work in a comprehensible manner. It's particularly useful for understanding the various stages of a data science project and for planning accordingly. You can view one version of it in Fig. 1 below. If the data science mindset is one's ability to navigate the data science landscape, the data science process is a map of that landscape. It's not 100% accurate but good enough to help you gain perspective if you feel overwhelmed or need to get a better grip on the bigger picture.
Learning more about the topic
Naturally, it's impossible to exhaust this topic in a single article (or even a series of articles). The material I've gathered on it can fill a book! If you are interested in such a book, feel free to check out the one I put together a few years back; it's called Data Science Mindset, Methodologies, and Misconceptions and it's geared towards data scientists, data science learners, and people involved in data science work in some way (e.g. project leaders or data analysts). Check it out when you have a moment. Cheers!
Article | September 7, 2021
Today, the world is all about Industry 4.0 and the technologies it has brought in. From Artificial Intelligence (AI) to Big Data Analytics, these technologies are transforming one industry or another in some way. AI-powered cognitive computing is one such technology that provides high-scale automation with ubiquitous connectivity. More so, it is redefining how IoT technology operates.
The need for cognitive computing in the IoT emerges from the significance of information in present-day business. In the smart IoT settings of the future, everybody from new AI services companies to enterprises will use information to make decisions based on facts instead of instincts.
Cognitive computing uses information and reacts to changes within it to decide on better options. It is based on explicit learning from past experiences, in contrast to a rule-based decision system.