The Top 5 Data Preparation Challenges to Get Big Data Pipelines to Run in Production

RAMESH MENON | March 30, 2019

Many organizations have been looking to big data to drive game-changing business insights and operational agility, but big data has turned out to be so complex and costly to configure, deploy, and manage that most data projects never make it into production. The agility businesses require today means being able to constantly add new analytics use cases, yet a large share of the available resources is consumed by the sheer ongoing maintenance of pipelines that have already been built. So what can organizations do to overcome the problem? After data is ingested into a data lake, data engineers need to transform it in preparation for downstream use by business analysts and data scientists. Data preparation challenges tend to be a collection of smaller issues that add up over time, creating an ongoing maintenance and management burden.
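To make the transformation step concrete, here is a minimal Python sketch of the kind of cleanup a data engineer might apply to raw records after ingestion. The schema (`customer_id`, `signup_date`, `amount`) and the helper names are hypothetical, chosen only for illustration; production pipelines typically run equivalent logic in a distributed processing engine rather than in plain Python.

```python
from datetime import date, datetime
from typing import Iterable, List, Optional

# Hypothetical schema for illustration: every cleaned record needs these fields.
REQUIRED_FIELDS = ("customer_id", "signup_date", "amount")


def is_blank(value: object) -> bool:
    """True for missing values and whitespace-only strings (but not for 0)."""
    return value is None or str(value).strip() == ""


def parse_date(value: str) -> Optional[date]:
    """Parse an ISO-8601 (YYYY-MM-DD) date; return None if it is malformed."""
    try:
        return datetime.strptime(value.strip(), "%Y-%m-%d").date()
    except ValueError:
        return None


def prepare_records(raw: Iterable[dict]) -> List[dict]:
    """Clean raw ingested records for downstream analysts and data scientists.

    Drops rows with missing required fields, normalizes the key, casts
    types, rejects unparseable values, and keeps the first row per key.
    """
    seen_ids = set()
    cleaned = []
    for row in raw:
        if any(is_blank(row.get(field)) for field in REQUIRED_FIELDS):
            continue  # incomplete row
        customer_id = str(row["customer_id"]).strip().upper()
        signup = parse_date(str(row["signup_date"]))
        try:
            # Strip thousands separators such as "1,200.50" before casting.
            amount = round(float(str(row["amount"]).replace(",", "")), 2)
        except ValueError:
            continue  # non-numeric amount
        if signup is None or customer_id in seen_ids:
            continue  # malformed date or duplicate key
        seen_ids.add(customer_id)
        cleaned.append({"customer_id": customer_id, "signup_date": signup, "amount": amount})
    return cleaned


if __name__ == "__main__":
    raw_rows = [
        {"customer_id": " c001 ", "signup_date": "2019-03-01", "amount": "1,200.50"},
        {"customer_id": "C001", "signup_date": "2019-03-02", "amount": "10"},  # duplicate key
        {"customer_id": "C002", "signup_date": "not-a-date", "amount": "5"},   # bad date
        {"customer_id": "C003", "signup_date": "2019-03-05", "amount": ""},    # missing amount
    ]
    print(prepare_records(raw_rows))  # only the first C001 row survives
```

Each rule in the sketch is a small decision (which duplicate wins, which date format is accepted, how separators are handled); multiplied across many sources and use cases, rules like these are the kind of detail that accumulates into ongoing maintenance work.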

Spotlight

Io-Tahoe

Io-Tahoe is a smart data discovery and AI-driven catalog product that enables enterprises to accelerate to next-generation data management practices, radically improving data governance and regulatory compliance while driving significant advancements in business analytics and technological transformation.

OTHER ARTICLES

Predictive Analytics: Enabling Businesses to Achieve Accurate Data Prediction Using AI

Article | July 13, 2021

We are living in the age of Big Data, and data has become the heart of, and the most valuable asset for, businesses across industry verticals. In today's hyper-competitive market, data is a major contributor to business intelligence and brand equity, so effective data management is key to accelerating business success. For data management to be effective, organizations must ensure that the data they use is accurate and reliable. With the advent of AI, businesses can now use machine learning to predict outcomes from historical data; this is called predictive analytics. With predictive analytics, organizations can predict anything from customer churn to equipment maintenance needs. Moreover, when the historical data behind them is of high quality, the resulting predictions can be highly accurate. Let us take a look at how AI enables accurate data prediction and helps businesses equip themselves for the digital future.
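A deliberately minimal Python sketch of the core idea: learn a relationship from historical observations, then apply it to a new case. It uses ordinary least squares on a single feature as a stand-in for the richer machine learning models the teaser alludes to; the function names and the usage/ticket figures are hypothetical.

```python
from statistics import fmean
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Fit y = slope * x + intercept to historical data by ordinary least squares."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    mean_x, mean_y = fmean(xs), fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model: Tuple[float, float], x: float) -> float:
    """Apply the fitted line to a new input."""
    slope, intercept = model
    return slope * x + intercept


if __name__ == "__main__":
    # Hypothetical history: machine usage (thousands of hours) vs. maintenance tickets.
    usage = [1.0, 2.0, 3.0, 4.0]
    tickets = [3.0, 5.0, 7.0, 9.0]
    model = fit_line(usage, tickets)
    print(model)                # → (2.0, 1.0)
    print(predict(model, 5.0))  # → 11.0
```

The sketch also shows why data quality matters: the fitted line can be no better than the historical pairs it is fitted to, so errors in the inputs flow straight into the forecast.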

Machine Learning and AI Are Supercharging Modern Technology

Article | April 6, 2020

Today, when we look around, we see how technology has revolutionized our world. It has created amazing tools and resources, putting useful intelligence at our fingertips, and it has made our lives easier, faster, more digital, and more fun. When we talk about technology today, machine learning and artificial intelligence are among the most popular buzzwords. Machine learning has proven to be one of the game-changing technological advancements of the past decade. In an increasingly competitive corporate world, machine learning is enabling companies to fast-track digital transformation and move into an age of automation. Some might even argue that AI/ML is required to stay relevant in some verticals, such as digital payments and fraud detection in banking, or product recommendations. To understand what machine learning is, it is important to know the concepts of artificial intelligence (AI). AI is defined as a program that exhibits cognitive ability similar to that of a human being. Making computers think like humans and solve problems the way we do is one of the main tenets of artificial intelligence.

Natural Language Desiderata: Understanding, explaining and interpreting a model.

Article | May 3, 2021

Clear conceptualization, taxonomies, categories, criteria, and properties are non-negotiable when solving complex, real-life, contextualized problems: they are a “must” for unveiling the hidden potential of NLP to affect the transparency of a model. It is common knowledge that many authors and researchers in natural language processing (NLP) and machine learning (ML) tend to use explainability and interpretability interchangeably, which from the start constitutes a fallacy. The two do not mean the same thing, whichever perspective one takes in defining them. Formal definitions of what explanation, explainable, and explainability mean can be traced to social science, psychology, hermeneutics, philosophy, physics, and biology. In The Nature of Explanation, Craik (1967:7) states that “explanations are not purely subjective things; they win general approval or have to be withdrawn in the face of evidence or criticism.” Moreover, the power of explanation means the power of insight and anticipation, and the question of why one explanation is satisfactory presupposes a prior question: why any explanation at all should be satisfactory, or, in machine learning terminology, how a model performs in different contextual situations. Beyond their utilitarian value (the impulse to resolve a problem whether or not there is, in the end, a practical application, a resolution that will be verified or disproved over time), explanations should be “meaningful”. We come across explanations every day; perhaps the most common are reason-giving ones. Before advancing into the realm of ExNLP, it is crucial to conceptualize what constitutes an explanation. Miller (2017) described explanations as “social interactions between the explainer and explainee”; therefore, the social context has a significant impact on the actual content of an explanation. In general terms, explanations seek to answer why-questions. There is a need for justification.
According to Bengtsson (2003), “we will accept an explanation when we feel satisfied that the explanans reaches what we already hold to be true of the explanandum”, where the explanandum is a statement that describes the phenomenon to be explained (a description, not the phenomenon itself) and the explanans is at least two sets of statements used to elucidate the phenomenon. In discourse theory (my approach), it is important first and foremost to highlight that there is a correlation between understanding and explanation. Both are articulated, although they belong to different paradigmatic fields. This dichotomous pair is perceived as a duality, which represents an irreducible form of intelligibility. When there are observable external facts subject to empirical validation, systematicity, and subordination to hypothetical procedures, then we can say that we explain. An explanation is inscribed in the analytical domain, the realm of rules, laws, and structures. When we explain, we display propositions and meaning. But we do not explain in a vacuum. The contextual situation permeates the content of an explanation; in other words, explanation is an epistemic activity: it can only relate things described or conceptualized in a certain way. Explanations are answers to questions of the form “why fact?”, a point most authors agree on. Understanding can mean a number of things in different contexts. According to Ricoeur, “understanding precedes, accompanies and swathes an explanation, and an explanation analytically develops understanding.” Following this line of thought, when we understand, we grasp or perceive the chain of partial senses as a whole in a single act of synthesis. Understanding, which originally belonged to the field of the so-called human sciences, refers to a circular process directed at the intentional unit of discourse, whereas explanation is oriented toward the analytical structure of a discourse.
Now, to ground any discussion of what interpretation is, it is crucial to highlight that the concept of interpretation opposes the concept of explanation. They cannot be used interchangeably. Considered as a unit, they compose what is called une combinaison éprouvée (a contrasted dichotomy). Moreover, in dissecting both definitions, we will see that the agent that performs the explanation differs from the one that produces the interpretation. At present, defining (and evaluating) what constitutes a quality interpretation remains a challenge. Linguistically speaking, “interpretation” is the complete process that encompasses understanding and explanation. It is true that there is more than one way to interpret an explanation (and, in turn, an explanation of a prediction), but it is also true that there is a limited number of possible explanations, if not a unique one, since they are contextualized. It is also true that an interpretation must be not only plausible but more plausible than competing interpretations. Of course, there are certain criteria for resolving this conflict, and proving that an interpretation is more plausible, based on an explanation or on knowledge, may be related to the logic of validation rather than to the logic of subjective probability.
Narrowing it down
How are these concepts transferred from theory to praxis? What is the importance of the "interpretability" of an explainable model? What do we call a "good" explainable model? What constitutes a "good explanation"? These are some of the many questions that researchers in both academia and industry are still trying to answer. In machine learning, current approaches conceptualize interpretation in a rather ad hoc manner, motivated by practical use cases and applications. Some suggest model interpretability as a remedy, but only a few are able to articulate precisely what interpretability means or why it is important.
Moreover, most of the research community and industry use this term as a synonym for explainability, which it certainly is not: the two terms do not overlap. Needless to say, in most cases technical descriptions of interpretable models are diverse and occasionally discordant. A model is more interpretable than another model if its decisions are easier for a human to comprehend than the other model's decisions (Molnar, 2021). For a model to be interpretable (interpretability being a quality of the model), the information conferred by an interpretation may be useful. Thus, one purpose of interpretations may be to convey useful information of any kind. In Molnar’s words, “the higher the interpretability of a machine learning model, the easier it is for someone to comprehend why certain decisions or predictions have been made.” I will make an observation here and refine this to “the higher the interpretability of an explainable machine learning model”. Luo et al. (2021) define interpretability as “the ability [of a model] to explain or to present [its predictions] in understandable terms to a human.” Notice that this definition includes “understanding”, giving the idea of completeness. Thus, the triadic closure explanation-understanding-interpretation is fulfilled, in which the explainer and the interpretant (the agents) belong to different instances, and interpretation allows the extraction and formation of additional knowledge captured by the explainable model. Now, are models inherently interpretable? It is more a matter of selecting the method of achieving interpretability: (a) interpreting existing models via post-hoc techniques, or (b) designing inherently interpretable models, which claim to provide more faithful interpretations than post-hoc interpretation of black-box models.
The difference also lies in agency, as noted before, and in whether interpretation affects the explanation process itself, that is, the model’s inner workings, or merely adds natural language explanations of learned representations or models.
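As a rough illustration of option (b), here is a Python sketch of an inherently interpretable model: a linear scorer whose prediction decomposes exactly into per-feature contributions, so the explanation comes from the model's own structure rather than from a post-hoc technique applied to a black box. The class, feature names, and weights are hypothetical, and a linear scorer is only one simple example of an inherently interpretable model.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class LinearScorer:
    """Inherently interpretable model: score = bias + sum(weight * feature value)."""

    weights: Dict[str, float]
    bias: float = 0.0

    def score(self, features: Dict[str, float]) -> float:
        return self.bias + sum(w * features[name] for name, w in self.weights.items())

    def explain(self, features: Dict[str, float]) -> List[Tuple[str, float]]:
        """Per-feature contributions, largest absolute effect first.

        The model is additive, so bias + the sum of these contributions equals
        score(features) (up to floating-point rounding): the explanation is read
        directly off the model rather than approximated post hoc.
        """
        contributions = [(name, w * features[name]) for name, w in self.weights.items()]
        return sorted(contributions, key=lambda item: abs(item[1]), reverse=True)


if __name__ == "__main__":
    # Hypothetical feature names and weights, for illustration only.
    model = LinearScorer(
        weights={"late_payments": 0.8, "tenure_years": -0.3, "support_tickets": 0.5},
        bias=0.1,
    )
    customer = {"late_payments": 2.0, "tenure_years": 4.0, "support_tickets": 1.0}
    print(round(model.score(customer), 2))  # → 1.0
    # Prints late_payments (+1.60), tenure_years (-1.20), support_tickets (+0.50).
    for name, contribution in model.explain(customer):
        print(f"{name}: {contribution:+.2f}")
```

A post-hoc method applied to a black-box model would instead have to estimate contributions like these from the outside, which is why inherently interpretable designs are claimed to yield more faithful interpretations.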

HOW THE CORONAVIRUS (COVID-19) MIGHT BE STOPPED BY DATA SCIENCE

Article | March 16, 2020

We know that data and analytics play a role in everyday products, from recommendations of music we might like to hear to automated re-routing by our GPS system. But how might the power of analytics be brought to bear on a disease that is currently threatening the health and economic welfare of people across the globe? If we rewind the clock to the 1850s, two significant examples of early pioneers in data science making incredible impacts on the world can provide some insight into what we might see happen next.
