Q&A: Data Dictionaries and the Big Data Lifeline

DANA LIBERTY | March 4, 2019

Data dictionaries. Sounds like a blast from the past, right? Wrong. This simple, long-standing tool is more relevant today than it has ever been. I work with data people every day, so I know what it takes for our analysts and data engineers to go through the data and make it easier to analyze (by a commonly cited estimate, about 80% of their time is spent preparing and managing data for analysis). And as more data is collected and stored, data dictionaries will become a much-needed lifeline in this ever-growing sea of data.

Spotlight

Divergence Academy

Divergence.Academy is creating adaptive learning solutions to empower individuals to pursue the work they love, built on the most relevant skills of the 21st century. Approved by the Texas Workforce Commission, we are a NEW style trade school supporting the growth of the "Silicon" economy in the state of Texas. We offer a 12-week Immersive Program that provides 600 hours of pragmatic, hands-on learning experience in the fields of Data Science, Data Engineering and Internet of Things. We also have part-time programs, both classroom and remote, that offer flexibility for the working data professional.

OTHER ARTICLES

Predictive analytics vs AI: Why the difference matters

Article | February 10, 2020

There are few movie scenes I can recall from my childhood, but I vividly remember seeing Stanley Kubrick's 1968 sci-fi movie 2001: A Space Odyssey in 1970 with my older cousin. What stays with me to this day is the scene where astronaut Dave asks HAL, the homicidal computer based on artificial intelligence (AI), to open the pod bay doors. HAL's eerie reply: "I'm sorry, Dave. I'm afraid I can't do that." In that moment, the concept of man vs. machine was created, predicated on the idea that machines created by man and using AI could (eventually) defy orders, position themselves in the vanguard, and overthrow humankind. Fast forward to today. Within the information governance space, two terms have been used quite frequently in recent years: analytics and AI. Often they are used interchangeably, as if they were practically synonymous.


3 steps to build a data fabric to integrate all your data tools

Article | May 17, 2021

One approach for better data utilization is the data fabric, a data management approach that arranges data in a single "fabric" that spans multiple systems and endpoints. The goal of the fabric is to link all data so it can easily be accessed. "DataOps and data fabric are two different but related things," said Ed Thompson, CTO at Matillion, which provides a cloud data integration platform. "DataOps is about taking practices which are common in modern software development and applying them to data projects. Data fabric is about the type of data landscape that you create and how the tools that you use work together."


7 Data Storage Trends You Cannot Miss in a Data Center

Article | July 23, 2020

Contents:
1 Introduction
2 Top Data Storage Trends That Simplify Data Management
2.1 AI Storage Continues to be The Chief
2.2 Price Markdown in Flash Storage
2.3 Hybrid Multi Cloud for The Win
2.4 Increased Significance of Software-Defined Storage
2.5 Non-Volatile Memory Express (NVMe) Beats Data Center Fabrics
2.6 Acceleration of Storage Class Memory
2.7 Hyperconverged Storage – A Push to Edge Computing
3 The Future of Data Storage

1. Introduction

There’s more to data than just storing it. Organizations are responsible not only for handling vast amounts of data but also for safeguarding it. One of the main ways enterprises keep up with continuous data growth is by investing in data storage hardware and applications. A recent Statista study found that worldwide spending on data storage units is expected to exceed 78 billion U.S. dollars by 2021. These figures suggest that data will keep growing at a faster rate, and companies have no choice but to gear up for a data boom if they want to stay relevant.

In data management and storage, information technology has advanced considerably with concepts like machine learning. While such approaches are exciting, the real question is whether organizations are ready and equipped to handle them. The answer might be no. But can companies make changes and still thrive? Most definitely, yes! To make this concrete, here is a list of trends that companies should adopt to make data storage easier and more secure.

2. Top data storage trends that simplify data management

Data corruption is a major issue for most companies, and the complications that follow from it are even harder to resolve.
To fix this and other data storage problems, companies have turned to trends that are resilient and flexible. These trends could make history in the world of technology, so it pays to learn about them now and adopt them later.

2.1 AI storage continues to be the chief

The pace at which AI hit the IT world has not slowed, even after all these years. Among all the concepts that have been and are still being introduced, artificial intelligence is the applied science that has produced the most innovation. Through subsets like machine learning and deep learning, AI is now making enterprise data storage easier. It helps companies accumulate multiple layers of data in assorted formats, and it automates IT storage tasks including data migration, archiving, and protection. With AI, companies will be able to control data storage across multiple locations and storage platforms.

2.2 Price markdown in Flash storage

According to a report by Markets and Markets, the overall all-flash array market was valued at USD 5.9 billion in 2018 and is expected to reach USD 17.8 billion by 2023, at a CAGR of 24.53% during this period. This growth indicates that demand for all-flash storage will only broaden. Most companies have stayed away from flash storage, mainly because of the price. But as flexible approaches to data storage are adopted, flash storage is being offered at much lower prices. The drop in cost will finally enable businesses of all sizes to invest in this high-performance solution.

2.3 Hybrid multi cloud for the win

With data growing every minute, a “cloud” strategy alone will not be enough.
In this wave of data storage services, hybrid multi-cloud is one concept that helps manage off-premises data. With it, IT teams can collect, segregate, and store on-premises and off-premises data in a far more sophisticated manner. It enables central management while reducing the effort of data storage by automating policy-based data placement across a mix of clouds and storage types.

2.4 Increased significance of software-defined storage

The more data there is, the less companies want to rely on hardware devices: this is the growing attitude of most companies, and the fear behind it could well become a reality. One addition companies can make to their cybersecurity strategy is therefore adopting software-defined storage. This approach decouples storage management from the underlying physical storage hardware. It is built around policy-based management of resources, automated provisioning, and automated reassignment of storage capacity. Because of this automation, scaling storage up and down is also faster. Some of the biggest advantages of this trend are the governance, data protection, and security it provides across the whole environment.

2.5 Non-Volatile Memory Express (NVMe) beats data center fabrics

NVMe, as ornate as the name sounds, is a recently introduced concept aimed at making data storage simpler. Non-Volatile Memory Express is a protocol that gives access to high-speed storage media, and it has shown strong results in the short time since its inception. NVMe not only increases the performance of existing applications but also enables new applications for real-time workload processing. This combination of high performance and low latency is the highlight of the concept. All in all, the trend seems to have a lot of potential that is yet to be explored.
2.6 Acceleration of storage class memory

Storage class memory is a perfect combination of flash storage and NVMe, because it fills the gap between server storage and external storage. Data protection is one of the major concerns of enterprises, and this upcoming trend not only protects data but also continually stores and improves it for easier segregation. A clear advantage that storage class memory has over flash and NVMe storage is that it provides memory-like, byte-addressable access to data, thus reducing the piling up of irrelevant data. Another benefit is deeper integration of data to ensure high performance and top-level data security.

2.7 Hyperconverged storage – a push to edge computing

The increased demand for hyperconverged storage is a result of the growth of hybrid cloud and software-defined infrastructure. Beyond these technologies, its suitability for retail settings and remote offices adds to its existing set of features. Its capability of capturing data at a distance also improves cost-effectiveness and reduces the need to store everything on a public cloud. Used to its full potential, hyperconverged storage can simplify IT operations and data storage for enterprises of all sizes.

3. The future of data storage

According to Internet World Stats, more than 4.5 billion internet users around the world relentlessly create an astronomical amount of data. This pushes companies to find methods or applications that help them keep this data safe from harmful ransomware attacks and still use it productively. One of the main changes to expect in the future of data storage is that companies will have to adapt to rapid change and mould their processes to enable quick and seamless storage of data.
Another change is that IT managers and other responsible staff will have to stay up to date and proactive at all times, so that they know which data storage technologies have been newly introduced and how they can be used to the company’s advantage. Here’s the thing: among all the research that enterprises are conducting, not every data storage technology will become a hit or meet the requirement of high-speed storage. But given the effort researchers are putting in, we don’t think they are going to stop any time soon, and neither is the growth of data!
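The all-flash array figures quoted in section 2.2 can be sanity-checked with the standard compound annual growth rate formula. The sketch below is illustrative only: the function names `cagr` and `project` are made up for this example, and it assumes the quoted period covers five full years (2018 to 2023), which the report summary does not spell out.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate as a fraction (0.25 means 25% per year)."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


def project(start_value: float, rate: float, years: int) -> float:
    """Value reached after compounding `rate` once per year for `years` years."""
    return start_value * (1 + rate) ** years


# Figures from section 2.2, in USD billions.
# Assumption: the growth period is five full years (2018 -> 2023).
years = 2023 - 2018
implied_rate = cagr(5.9, 17.8, years)
projected_2023 = project(5.9, 0.2453, years)

print(f"CAGR implied by the two endpoints: {implied_rate:.2%}")
print(f"5.9 compounded at 24.53% for {years} years: {projected_2023:.1f}")
```

Under that assumption the endpoints imply a rate a little under 25% per year, close to the quoted 24.53%. The small gap is most likely down to rounding of the endpoint values or a slightly different period convention in the original report.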


How can machine learning detect money laundering?

Article | December 16, 2020

In this article, we will explore different techniques to detect money laundering activities. Despite the many potential applications within the financial services sector, adoption of Artificial Intelligence and Machine Learning (ML) specifically within Anti-Money Laundering (AML) has been relatively slow.

What are money laundering and anti-money laundering?

Money laundering is where someone unlawfully obtains money and moves it to cover up their crimes. Anti-money laundering can be characterized as any activity that prevents or aims to prevent money laundering from occurring. The UN estimates that money-laundering transactions in one year amount to 2–5% of worldwide GDP, or $800 billion to $3 trillion in USD. In 2019, regulators and government agencies levied fines of more than $8.14 billion. Even with these striking numbers, estimates are that only about 1% of illicit global financial flows are ever seized by the authorities. AML activities in banks consume an excessive amount of manpower, resources, and cash to manage the process and comply with regulations.

What are the punishments for money laundering?

In 2019, Celent estimated that spending reached $8.3 billion and $23.4 billion for technology and operations, respectively. This investment is directed toward ensuring anti-money laundering compliance. As we have seen in many cases, reputational costs can also carry a hefty price. In 2012, HSBC was found to have laundered an estimated £5.57 billion over at least seven years.

What is the current situation of the banks applying ML to stop money laundering?

Given the abundance of new tools banks have available, the potential headline risk, the amount of capital involved, and the enormous costs in the form of fines and penalties, adoption should not be this slow.
A strong push by countries to curb illicit cash movement has resulted in only a tiny fraction of money laundering being detected, a success rate of about 2% on average. The Dutch banks ABN Amro, Rabobank, ING, Triodos Bank, and Volksbank announced in September 2019 that they would work toward joint transaction monitoring to stand up against money laundering.

A typical challenge in transaction monitoring is the generation of a vast number of alerts, which in turn requires operations teams to triage and process them. ML models can identify suspicious behavior, and they can also classify alerts into classes such as critical, high, medium, or low risk. Critical or high alerts can be routed to senior analysts at high priority so that the issue is investigated quickly.

The main problem today is the immense number of false positives: estimates put the share of alerts that are false positives at between 95 and 99%, which puts extraordinary pressure on banks. Investigating false positives is time-consuming and costs money. A recent study found that banks were spending close to €3.01 billion every year investigating false positives. Institutions are looking for more efficient ways to deal with crime and, in this context, machine learning can prove to be a significant tool. For financial operations to become efficient, the sheer volume and speed of financial transactions require an effective monitoring framework that can process transactions rapidly, ideally in real time.

What are the types of machine learning algorithms which can identify money laundering transactions?

For supervised machine learning, it is essential to have historical data with events accurately labeled and input variables properly captured. If biases or errors are left in the data without being dealt with, they will be passed on to the model, resulting in inaccurate models.
Unsupervised machine learning is the better choice when historical data with accurately labeled events is not available. It finds unknown patterns and outcomes, and it recognizes suspicious activity without prior knowledge of exactly what a money-laundering scheme looks like.

What are the different techniques to detect money laundering?

K-means Sequence Miner algorithm: Banking transactions are fed in, frequent pattern mining algorithms are run over them to identify money laundering, and the transactions and suspicious activities are then clustered and shown on a chart.
Time Series Euclidean distance: A sequence matching algorithm detects money laundering through sequential detection of suspicious transactions. This method uses two references to recognize suspicious transactions: the history of each individual account and transaction data from other accounts.
Bayesian networks: This builds a model of the user’s previous activities, which then serves as a baseline for measuring future customer activities. In the event that the exchange or user financial transactions have.
Cluster-based local outlier factor algorithm: Money laundering is detected using a combination of clustering techniques and outlier detection.

Conclusion

For banks, now is the ideal time to deploy ML models into their ecosystem. At the same time, growing knowledge and the rising number of ML implementations have prompted a discussion about the feasibility of these solutions and the degree to which ML should be trusted and potentially replace human analysis and decision-making. To realize the promise of ML, banks need to keep expanding their awareness of its strengths, risks, and limitations and, most critically, to create an ethical framework within which the production and use of ML can be controlled, and the feasibility and effect of these emerging models can be proven and eventually trusted.
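As a rough illustration of the time-series Euclidean distance idea and the alert classes mentioned earlier, the sketch below compares an account's most recent activity window against two references: the account's own past windows and windows from other accounts. Everything specific here is an assumption made for the example: the function names, the sample amounts, the way the two distances are combined (a plain average), and the tier thresholds are all hypothetical and are not taken from the article or from any production AML system.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences of amounts."""
    if len(a) != len(b):
        raise ValueError("windows must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def suspicion_score(recent, own_history, peer_windows):
    """Average of the distances to the closest own-history window and the
    closest peer window. A high score means the recent activity resembles
    neither the account's own past nor comparable accounts."""
    own_distance = min(euclidean(recent, w) for w in own_history)
    peer_distance = min(euclidean(recent, w) for w in peer_windows)
    return (own_distance + peer_distance) / 2


def alert_tier(score):
    """Map a score to an alert class. Thresholds are hypothetical."""
    if score >= 10_000:
        return "critical"
    if score >= 5_000:
        return "high"
    if score >= 1_000:
        return "medium"
    return "low"


# Hypothetical weekly transaction totals, four weeks per window.
own_history = [[1000, 1200, 900, 1100], [1100, 1000, 1000, 1200]]
peer_windows = [[900, 1000, 1100, 1000], [1300, 1200, 1000, 1100]]

usual_window = [1050, 1100, 950, 1150]
burst_window = [1000, 9500, 9800, 1100]  # sudden two-week spike

for name, window in [("usual", usual_window), ("burst", burst_window)]:
    score = suspicion_score(window, own_history, peer_windows)
    print(f"{name}: score={score:.0f}, tier={alert_tier(score)}")
```

In practice the thresholds would need to be tuned against reviewed alerts, since the false-positive rates discussed above depend heavily on where those cut-offs sit.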

