Article | July 19, 2021
In an era of big data, data health has become a pressing issue when more and more data is being stored and processed. Therefore, preserving the integrity of the collected data is becoming increasingly necessary. Understanding the fundamentals of data integrity and how it works is the first step in safeguarding the data.
Data integrity is essential for the smooth running of a company. If a company’s data is altered, deleted, or changed, and if there is no way of knowing how it can have significant impact on any data-driven business decisions.
Data integrity is the reliability and trustworthiness of data throughout its lifecycle. It is the overall accuracy, completeness, and consistency of data. It can be indicated by lack of alteration between two updates of a data record, which means data is unchanged or intact. Data integrity refers to the safety of data regarding regulatory compliance- like GDPR compliance- and security. A collection of processes, rules, and standards implemented during the design phase maintains the safety and security of data.
The information stored in the database will remain secure, complete, and reliable no matter how long it’s been stored; that’s when you know that the integrity of data is safe. A data integrity framework also ensures that no outside forces are harming this data.
This term of data integrity may refer to either the state or a process. As a state, the data integrity framework defines a data set that is valid and accurate. Whereas as a process, it describes measures used to ensure validity and accuracy of data set or all data contained in a database or a construct.
Data integrity can be enforced at both physical and logical levels. Let us understand the fundamentals of data integrity in detail:
Types of Data Integrity
There are two types of data integrity: physical and logical. They are collections of processes and methods that enforce data integrity in both hierarchical and relational databases.
Physical integrity protects the wholeness and accuracy of that data as it’s stored and retrieved. It refers to the process of storage and collection of data most accurately while maintaining the accuracy and reliability of data. The physical level of data integrity includes protecting data against different external forces like power cuts, data breaches, unexpected catastrophes, human-caused damages, and more.
Logical integrity keeps the data unchanged as it’s used in different ways in a relational database. Logical integrity checks data accuracy in a particular context. The logical integrity is compromised when errors from a human operator happen while entering data manually into the database. Other causes for compromised integrity of data include bugs, malware, and transferring data from one site within the database to another in the absence of some fields.
There are four types of logical integrity:
A database has columns, rows, and tables. These elements need to be as numerous as required for the data to be accurate, but no more than necessary. Entity integrity relies on the primary key, the unique values that identify pieces of data, making sure the data is listed just once and not more to avoid a null field in the table. The feature of relational systems that store data in tables can be linked and utilized in different ways.
Referential integrity means a series of processes that ensure storage and uniform use of data. The database structure has rules embedded into them about the usage of foreign keys and ensures only proper changes, additions, or deletions of data occur. These rules can include limitations eliminating duplicate data entry, accurate data guarantee, and disallowance of data entry that doesn’t apply. Foreign keys relate data that can be shared or null. For example, let’s take a data integrity example, employees that share the same work or work in the same department.
Domain Integrity can be defined as a collection of processes ensuring the accuracy of each piece of data in a domain. A domain is a set of acceptable values a column is allowed to contain. It includes constraints that limit the format, type, and amount of data entered. In domain integrity, all values and categories are set. All categories and values in a database are set, including the nulls.
This type of logical integrity involves the user's constraints and rules to fit their specific requirements. The data isn’t always secure with entity, referential, or domain integrity. For example, if an employer creates a column to input corrective actions of the employees, this data would fall under user-defined integrity.
Difference between Data Integrity and Data Security
Often, the terms data security and data integrity get muddled and are used interchangeably. As a result, the term is incorrectly substituted for data integrity, but each term has a significant meaning.
Data integrity and data security play an essential role in the success of each other. Data security means protecting data against unauthorized access or breach and is necessary to ensure data integrity.
Data integrity is the result of successful data security. However, the term only refers to the validity and accuracy of data rather than the actual act of protecting data. Data security is one of the many ways to maintain data integrity. Data security focuses on reducing the risk of leaking intellectual property, business documents, healthcare data, emails, trade secrets, and more. Some facets of data security tactics include permissions management, data classification, identity, access management, threat detection, and security analytics.
For modern enterprises, data integrity is necessary for accurate and efficient business processes and to make well-intentioned decisions. Data integrity is critical yet manageable for organizations today by backup and replication processes, database integrity constraints, validation processes, and other system protocols through varied data protection methods.
Threats to Data Integrity
Data integrity can be compromised by human error or any malicious acts. Accidental data alteration during the transfer from one device to another can be compromised. There is an assortment of factors that can affect the integrity of the data stored in databases. Following are a few of the examples:
Data integrity is put in jeopardy when individuals enter information incorrectly, duplicate, or delete data, don’t follow the correct protocols, or make mistakes in implementing procedures to protect data.
A transfer error occurs when data is incorrectly transferred from one location in a database to another. This error also happens when a piece of data is present in the destination table but not in the source table in a relational database.
Bugs and Viruses
Data can be stolen, altered, or deleted by spyware, malware, or any viruses.
Hardware gets compromised when a computer crashes, a server gets down, or problems with any computer malfunctions. Data can be rendered incorrectly or incompletely, limit, or eliminate data access when hardware gets compromised.
Preserving Data Integrity
Companies make decisions based on data. If that data is compromised or incorrect, it could harm that company to a great extent. They routinely make data-driven business decisions, and without data integrity, those decisions can have a significant impact on the company’s goals.
The threats mentioned above highlight a part of data security that can help preserve data integrity. Minimize the risk to your organization by using the following checklist:
Require an input validation when your data set is supplied by a known or an unknown source (an end-user, another application, a malicious user, or any number of other sources). The data should be validated and verified to ensure the correct input.
Verifying data processes haven’t been corrupted is highly critical. Identify key specifications and attributes that are necessary for your organization before you validate the data.
Eliminate Duplicate Data
Sensitive data from a secure database can easily be found on a document, spreadsheet, email, or shared folders where employees can see it without proper access. Therefore, it is sensible to clean up stray data and remove duplicates.
Data backups are a critical process in addition to removing duplicates and ensuring data security. Permanent loss of data can be avoided by backing up all necessary information, and it goes a long way. Back up the data as much as possible as it is critical as organizations may get attacked by ransomware.
Another vital data security practice is access control. Individuals in an organization with any wrong intent can harm the data. Implement a model where users who need access can get access is also a successful form of access control. Sensitive servers should be isolated and bolted to the floor, with individuals with an access key are allowed to use them.
Keep an Audit Trail
In case of a data breach, an audit trail will help you track down your source. In addition, it serves as breadcrumbs to locate and pinpoint the individual and origin of the breach.
Data collection was difficult not too long ago. It is no longer an issue these days. With the amount of data being collected these days, we must maintain the integrity of the data. Organizations can thus make data-driven decisions confidently and take the company ahead in a proper direction.
Frequently Asked Questions
What are integrity rules?
Precise data integrity rules are short statements about constraints that need to be applied or actions that need to be taken on the data when entering the data resource or while in the data resource. For example, precise data integrity rules do not state or enforce accuracy, precision, scale, or resolution.
What is a data integrity example?
Data integrity is the overall accuracy, completeness, and consistency of data. A few examples where data integrity is compromised are:
• When a user tries to enter a date outside an acceptable range
• When a user tries to enter a phone number in the wrong format
• When a bug in an application attempts to delete the wrong record
What are the principles of data integrity?
The principles of data integrity are attributable, legible, contemporaneous, original, and accurate. These simple principles need to be part of a data life cycle, GDP, and data integrity initiatives.
"name": "What are integrity rules?",
"text": "Precise data integrity rules are short statements about constraints that need to be applied or actions that need to be taken on the data when entering the data resource or while in the data resource. For example, precise data integrity rules do not state or enforce accuracy, precision, scale, or resolution."
"name": "What is a data integrity example?",
"text": "Data integrity is the overall accuracy, completeness, and consistency of data. A few examples where data integrity is compromised are:
When a user tries to enter a date outside an acceptable range
When a user tries to enter a phone number in the wrong format
When a bug in an application attempts to delete the wrong record"
"name": "What are the principles of data integrity?",
"text": "The principles of data integrity are attributable, legible, contemporaneous, original, and accurate. These simple principles need to be part of a data life cycle, GDP, and data integrity initiatives."
Article | July 19, 2021
Despite the huge contributions of deep learning to the field of artificial intelligence, there’s something very wrong with it: It requires huge amounts of data. This is one thing that both the pioneers and critics of deep learning agree on. In fact, deep learning didn’t emerge as the leading AI technique until a few years ago because of the limited availability of useful data and the shortage of computing power to process that data.Reducing the data-dependency of deep learning is currently among the top priorities of AI researchers.
Article | July 19, 2021
Clear conceptualization, taxonomies, categories, criteria, properties when solving complex real-life contextualized problems is non-negotiable, a “must” to unveil the hidden potential of NPL impacting on the transparency of a model.
It is common knowledge that many authors and researchers in the field of natural language processing (NLP) and machine learning (ML) are prone to use explainability and interpretability interchangeably, which from the start constitutes a fallacy. They do not mean the same, even when looking for a definition from different perspectives.
A formal definition of what explanation, explainable, explainability mean can be traced to social science, psychology, hermeneutics, philosophy, physics and biology. In The Nature of Explanation, Craik (1967:7) states that “explanations are not purely subjective things; they win general approval or have to be withdrawn in the face of evidence or criticism.” Moreover, the power of explanation means the power of insight and anticipation and why one explanation is satisfactory involves a prior question why any explanation at all should be satisfactory or in machine learning terminology how a model is performant in different contextual situations. Besides its utilitarian value, that impulse to resolve a problem whether or not (in the end) there is a practical application and which will be verified or disapproved in the course of time, explanations should be “meaningful”.
We come across explanations every day. Perhaps the most common are reason-giving ones. Before advancing in the realm of ExNLP, it is crucial to conceptualize what constitutes an explanation. Miller (2017) considered explanations as “social interactions between the explainer and explainee”, therefore the social context has a significant impact in the actual content of an explanation. Explanations in general terms, seek to answer the why type of question. There is a need for justification. According to Bengtsson (2003) “we will accept an explanation when we feel satisfied that the explanans reaches what we already hold to be true of the explanandum”, (being the explanandum a statement that describes the phenomenon to be explained (it is a description, not the phenomenon itself) and the explanan at least two sets of statements, used for the purpose of elucidating the phenomenon).
In discourse theory (my approach), it is important to highlight that there is a correlation between understanding and explanation, first and foremost. Both are articulated although they belong to different paradigmatic fields. This dichotomous pair is perceived as a duality, which represents an irreducible form of intelligibility.
When there are observable external facts subject to empirical validation, systematicity, subordination to hypothetic procedures then we can say that we explain. An explanation is inscribed in the analytical domain, the realm of rules, laws and structures. When we explain we display propositions and meaning. But we do not explain in a vacuum. The contextual situation permeates the content of an explanation, in other words, explanation is an epistemic activity: it can only relate things described or conceptualized in a certain way. Explanations are answers to questions in the form: why fact, which most authors agree upon.
Understanding can mean a number of things in different contexts. According to Ricoeur “understanding precedes, accompanies and swathes an explanation, and an explanation analytically develops understanding.” Following this line of thought, when we understand we grasp or perceive the chain of partial senses as a whole in a single act of synthesis. Originally, belonging to the field of the so-called human science, then, understanding refers to a circular process and it is directed to the intentional unit of discourse whereas an explanation is oriented to the analytical structure of a discourse.
Now, to ground any discussion on what interpretation is, it is crucial to highlight that the concept of interpretation opposes the concept of explanation. They cannot be used interchangeably. If considered as a unit, they composed what is called une combinaison éprouvé (a contrasted dichotomy). Besides, in dissecting both definitions we will see that the agent that performs the explanation differs from the one that produce the interpretation.
At present there is a challenge of defining—and evaluating—what constitutes a quality interpretation. Linguistically speaking, “interpretation” is the complete process that encompasses understanding and explanation. It is true that there is more than one way to interprete an explanation (and then, an explanation of a prediction) but it is also true that there is a limited number of possible explanations if not a unique one since they are contextualized. And it is also true that an interpretation must not only be plausible, but more plausible than another interpretation. Of course there are certain criteria to solve this conflict. And to prove that an interpretation is more plausible based on an explanation or the knowledge could be related to the logic of validation rather than to the logic of subjective probability.
Narrowing it down
How are these concepts transferred from theory to praxis? What is the importance of the "interpretability" of an explainable model? What do we call a "good" explainable model? What constitutes a "good explanation"? These are some of the many questions that researchers from both academia and industry are still trying to answer.
In the realm on machine learning current approaches conceptualize interpretation in a rather ad-hoc manner, motivated by practical use cases and applications. Some suggest model interpretability as a remedy, but only a few are able to articulate precisely what interpretability means or why it is important. Hence more, most in the research community and industry use this term as synonym of explainability, which is certainly not. They are not overlapping terms. Needless to say, in most cases technical descriptions of interpretable models are diverse and occasionally discordant.
A model is better interpretable than another model if its decisions are easier for a human to comprehend than decisions from the other model (Molnar, 2021). For a model to be interpretable (being interpretable the quality of the model), the information conferred by an interpretation may be useful. Thus, one purpose of interpretations may be to convey useful information of any kind. In Molnar’s words the higher the interpretability of a machine learning model, the easier it is for someone to comprehend why certain decisions or predictions have been made.” I will make an observation here and add “the higher the interpretability of an explainable machine learning model”. Luo et. al. (2021) defines “interpretability as ‘the ability [of a model] to explain or to present [its predictions] in understandable terms to a human.” Notice that in this definition the author includes “understanding” as part of the definition, giving the idea of completeness. Thus, the triadic closure explanation-understanding-interpretation is fulfilled, in which the explainer and interpretant (the agents) belong to different instances and where interpretation allows the extraction and formation of additional knowledge captured by the explainable model.
Now are the models inherently interpretable? Well, it is more a matter of selecting the methods of achieving interpretability: by (a) interpreting existing models via post-hoc techniques, or (b) designing inherently interpretable models, which claim to provide more faithful interpretations than post-hoc interpretation of blackbox models. The difference also lies in the agency –like I said before– , and how in one case interpretation may affect the explanation process, that is model’s inner working or just include natural language explanations of learned representations or models.
Article | July 19, 2021
Stephen Hawking, one of the finest minds to have ever lived, once famously said, “AI is likely to be either the best or the worst thing to happen to humanity.” This is of course true, with valid arguments both for and against the proliferation of AI.
As a practitioner, I have witnessed the AI revolution at close quarters as it unfolded at breathtaking pace over the last two decades. My personal view is that there is no clear black and white in this debate. The pros and cons are very contextual – who is developing it, for what application, in what timeframe, towards what end?
It always helps to understand both sides of the debate. So let’s try to take a closer look at what the naysayers say. The most common apprehensions can be clubbed into three main categories:
A. Large-scale Unemployment: This is the most widely acknowledged of all the risks of AI. Technology and machines replacing humans for doing certain types of work isn’t new. We all know about entire professions dwindling, and even disappearing, due to technology. Industrial Revolution too had led to large scale job losses, although many believe that these were eventually compensated for by means of creating new avenues, lowering prices, increasing wages etc.
However, a growing number of economists no longer subscribe to the belief that over a longer term, technology has positive ramifications on overall employment. In fact, multiple studies have predicted large scale job losses due to technological advancements. A 2016 UN report concluded that 75% of jobs in the developing world are expected to be replaced by machines!
Unemployment, particularly at a large scale, is a very perilous thing, often resulting in widespread civil unrest. AI’s potential impact in this area therefore calls for very careful political, sociological and economic thinking, to counter it effectively.
B. Singularity: The concept of Singularity is one of those things that one would have imagined seeing only in the pages of a futuristic Sci-Fi novel. However, in theory, today it is a real possibility. In a nutshell, Singularity refers to that point in human civilization when Artificial Intelligence reaches a tipping point beyond which it evolves into a superintelligence that surpasses human cognitive powers, thereby potentially posing a threat to human existence as we know it today.
While the idea around this explosion of machine intelligence is a very pertinent and widely discussed topic, unlike the case of technology driven unemployment, the concept remains primarily theoretical. There is as yet no consensus amongst experts on whether this tipping point can ever really be reached in reality.
C. Machine Consciousness: Unlike the previous two points, which can be regarded as risks associated with the evolution of AI, the aspect of machine consciousness perhaps is best described as an ethical conundrum. The idea deals with the possibility of implanting human-like consciousness into machines, taking them beyond the realm of ‘thinking’ to that of ‘feeling, emotions and beliefs’.
It’s a complex topic and requires delving into an amalgamation of philosophy, cognitive science and neuroscience. ‘Consciousness’ itself can be interpreted in multiple ways, bringing together a plethora of attributes like self-awareness, cause-effect in mental states, memory, experiences etc. To bring machines to a state of human-like consciousness would entail replicating all the activities that happen at a neural level in a human brain – by no means a meagre task.
If and when this were to be achieved, it would require a paradigm shift in the functioning of the world. Human society, as we know it, will need a major redefinition to incorporate machines with consciousness co-existing with humans. It sounds far-fetched today, but questions such as this need pondering right now, so as to be able to influence the direction in which we move when it comes to AI and machine consciousness, while things are still in the ‘design’ phase so to speak.
While all of the above are pertinent questions, I believe they don’t necessarily outweigh the advantages of AI. Of course, there is a need to address them systematically, control the path of AI development and minimize adverse impact. In my opinion, the greatest and most imminent risk is actually a fourth item, not often taken into consideration, when discussing the pitfalls of AI.
D. Oligarchy: Or to put it differently, the question of control. Due to the very nature of AI – it requires immense investments in technology and science – there are realistically only a handful of organizations (private or government) that can make the leap into taking AI into the mainstream, in a scalable manner, and across a vast array of applications. There is going to be very little room for small upstarts, however smart they might be, to compete at scale against these.
Given the massive aspects of our lives that will likely be steered by AI enabled machines, those who control that ‘intelligence’ will hold immense power over the rest of us. That all familiar phrase ‘with great power, comes great responsibility’ will take a whole new meaning – the organizations and/or individuals that are at the forefront of the generally available AI applications would likely have more power than the most despotic autocrats in history. This is a true and real hazard, aspects of which are already becoming areas of concern in the form of discussions around things like privacy.
In conclusion, AI, like all major transformative events in human history, is certain to have wide reaching ramifications. But with careful forethought these can be addressed. In the short to medium term, the advantages of AI in enhancing our lives, will likely outweigh these risks. Any major conception that touches human lives in a broad manner, if not handled properly, can pose immense danger. The best analogy I can think of is religion – when not channelled appropriately, it probably poses a greater threat than any technological advancement ever could.