Article | May 17, 2021
One approach for better data utilization is the data fabric, a data management approach that arranges data in a single "fabric" that spans multiple systems and endpoints. The goal of the fabric is to link all data so it can easily be accessed.
"DataOps and data fabric are two different but related things," said Ed Thompson, CTO at Matillion, which provides a cloud data integration platform. "DataOps is about taking practices which are common in modern software development and applying them to data projects. Data fabric is about the type of data landscape that you create and how the tools that you use work together."
Article | June 10, 2021
We discursive creatures are construed within a meaningful, bounded communicative environment, namely context(s) and not in a vacuum.
Context(s) co-occur in different scenarios, that is, in mundane talk as well as in academic discourse where the goal of natural language communication is mutual intelligibility, hence the negotiation of meaning. Discursive research focuses on the context-sensitive use of the linguistic code and its social practice in particular settings, such as medical talk, courtroom interactions, financial/economic and political discourse which may restrict its validity when ascribing to a theoretical framework and its propositions regarding its application. This is also reflected in the case of artificial intelligence approaches to context(s) such as the development of context-sensitive parsers, context-sensitive translation machines and context-sensitive information systems where the validity of an argument and its propositions is at stake.
Context is at the heart of pragmatics or even better said context is the anchor of any pragmatic theory: sociopragmatics, discourse analysis and ethnomethodological conversation analysis. Academic disciplines, such as linguistics, philosophy, anthropology, psychology and literary theory have also studied various aspects of the context phenomena. Yet, the concept of context has remained fuzzy or is generally undefined. It seems that the denotation of the word [context] has become murkier as its uses have been extended in many directions.
Context or/ and contexts? Now in order to be “felicitous” integrated into the pragmatic construct, the definition of context needs some delimitations. Depending on the frame of research, context is delimitated to the global surroundings of the phenomenon to be investigated, for instance if its surrounding is of extra-linguistic nature it is called the socio-cultural context, if it comprises features of a speech situation, it is called the linguistic context and if it refers to the cognitive material, that is a mental representation, it is called the cognitive context. Context is a transcendental notion which plays a key role in interpretation.
Language is no longer considered as decontextualized sentences. Instead language is seen as embedded in larger activities, through which they become meaningful. In a dynamic outlook on communication, the acts of speaking (which generates a form discourse, for instance, conversational discourse, lecture or speech) and interpreting build contexts and at the same time constrain the building of such contexts. In Heritage’s terminology, “the production of talk is doubly contextual” (Heritage 1984: 242). An utterance relies upon the existing context for its production and interpretation, and it is, in its own right, an event that shapes a new context for the action that will follow. A linguistic context can be decontextualized at a local level, and it can be recontextualized at a global level. There is intra-discursive recontextualization anchored to local decontextualization, and there is interdiscursive recontextualization anchored to global recontextualization. “A given context not only 'legislates' the interpretation of indexical elements; indexical elements can also mold the background of the context” (Ochs, 1990). In the case of recontextualization, in a particular scenario, it is valid to ask what do you mean or how do you mean. Making a reference to context and a reference to meaning helps to clarify when there is a controversy about the communicative status and at the same time provides a frame for the recontextualization.
A linguistic context is intrinsically linked to a social context and a subcategory of the latter, the socio-cultural context. The social context can be considered as unmarked, hence a default context, whereas a socio-cultural context can be conceived as a marked type of context in which specific variables are interpreted in a particular mode. Culture provides us, the participants, with a filter mechanism which allows us to interpret a social context in accordance with particular socio-cultural context constraints and requirements. Besides, socially constitutive qualities of context are unavoidable since each interaction updates the existing context and prepares new ground for subsequent interaction.
Now, how these aforementioned conceptualizations and views are reflected in NLP? Most of the research work has focused in the linguistic context, that is, in the word level surroundings and the lexical meaning. An approach to producing sense embeddings for the lexical meanings within a lexical knowledge base which lie in a space that is comparable to that of contextualized word vectors.
Contextualized word embeddings have been used effectively across several tasks in Natural Language Processing, as they have proved to carry useful semantic information. The task of associating a word in context with the most suitable meaning from a predefined sense inventory is better known as Word Sense Disambiguation (Navigli, 2009). Linguistically speaking, “context encompasses the total linguistic and non-linguistic background of a text” (Crystal, 1991). Notice that the nature of context(s) is clearly crucial when reconstructing the meaning of a text. Therefore, “meaning-in-context should be regarded as a probabilistic weighting, of the list of potential meanings available to the user of the language.” The so-called disambiguating role of context should be taken with a pinch of salt.
The main reason for language models such as BERT (Devlin et al., 2019), RoBERTA (Liu et al., 2019) and SBERT (Reimers, 2019) proved to be beneficial in most NLP task is that contextualized embeddings of words encode the semantics defined by their input context. In the same vein, a novel method for contextualized sense representations has recently been employed: SensEmBERT (Scarlini et al., 2020) which computes sense representations that can be applied directly to disambiguation.
Still, there is a long way to go regarding context(s) research. The linguistic context is just one of the necessary conditions for sentence embeddedness in “a” context. For interpretation to take place, well-formed sentences and well-formed constructions, that is, linguistic strings which must be grammatical but may be constrained by cognitive sentence-processability and pragmatic relevance, particular linguistic-context and social-context configurations, which make their production and interpretation meaningful, will be needed.
Article | May 19, 2021
There are some fundamental differences between Business Analytics and Data Analytics, though both hold their own importance. For example, to discover patterns and observations that are ultimately used to make informed organizational decisions, Data Analytics includes analyzing datasets. On the other hand, to make realistic, data-driven business decisions, Business Analytics focuses on evaluating different kinds of information and making improvements based on those decisions. In this blog, we discuss in more detail their individual benefits and areas of expertise. Data Analytics vs. Business Analytics attracts a lot of interest from budding analysts; we will take multiple factors into account and help explain the difference between data analyst and business analyst.
Article | October 27, 2020
Data Platforms and frameworks have been constantly evolving. At some point of time; we are excited by Hadoop (well for almost 10 years); followed by Snowflake or as I say Snowflake Blizzard (who managed to launch biggest IPO win historically) and the Google (Google solves problems and serves use cases in a way that few companies can match).
The end of the data warehouse
Once upon a time, life was simple; or at least, the basic approach to Business Intelligence was fairly easy to describe… A process of collecting information from systems, building a repository of consistent data, and bolting on one or more reporting and visualisation tools which presented information to users. Data used to be managed in expensive, slow, inaccessible SQL data warehouses. SQL systems were notorious for their lack of scalability. Their demise is coming from a few technological advances. One of these is the ubiquitous, and growing, Hadoop.
On April 1, 2006, Apache Hadoop was unleashed upon Silicon Valley. Inspired by Google, Hadoop’s primary purpose was to improve the flexibility and scalability of data processing by splitting the process into smaller functions that run on commodity hardware.
Hadoop’s intent was to replace enterprise data warehouses based on SQL. Unfortunately, a technology used by Google may not be the best solution for everyone else. It’s not that others are incompetent: Google solves problems and serves use cases in a way that few companies can match. Google has been running massive-scale applications such as its eponymous search engine, YouTube and the Ads platform. The technologies and infrastructure that make the geographically distributed offerings perform at scale are what make various components of Google Cloud Platform enterprise ready and well-featured. Google has shown leadership in developing innovations that have been made available to the open-source community and are being used extensively by other public cloud vendors and Gartner clients. Examples of these include the Kubernetes container management framework, TensorFlow machine learning platform and the Apache Beam data processing programming model. GCP also uses open-source offerings in its cloud while treating third-party data and analytics providers as first-class citizens on its cloud and providing unified billing for its customers. The examples of the latter include DataStax, Redis Labs, InfluxData, MongoDB, Elastic, Neo4j and Confluent.
Silicon Valley tried to make Hadoop work. The technology was extremely complicated and nearly impossible to use efficiently. Hadoop’s lack of speed was compounded by its focus on unstructured data — you had to be a “flip-flop wearing” data scientist to truly make use of it.
Unstructured datasets are very difficult to query and analyze without deep knowledge of computer science. At one point, Gartner estimated that 70% of Hadoop deployments would not achieve the goal of cost savings and revenue growth, mainly due to insufficient skills and technical integration difficulties. And seventy percent seems like an understatement.
Data storage through the years: from GFS to Snowflake or Snowflake blizzard
Developing in parallel with Hadoop’s journey was that of Marcin Zukowski — co-founder and CEO of Vectorwise. Marcin took the data warehouse in another direction, to the world of advanced vector processing. Despite being almost unheard of among the general public, Snowflake was actually founded back in 2012. Firstly, Snowflake is not a consumer tech firm like Netflix or Uber. It's business-to-business only, which may explain its high valuation – enterprise companies are often seen as a more "stable" investment. In short, Snowflake helps businesses manage data that's stored on the cloud. The firm's motto is "mobilising the world's data", because it allows big companies to make better use of their vast data stores.
Marcin and his teammates rethought the data warehouse by leveraging the elasticity of the public cloud in an unexpected way: separating storage and compute. Their message was this: don’t pay for a data warehouse you don’t need. Only pay for the storage you need, and add capacity as you go. This is considered one of Snowflake’s key innovations: separating storage (where the data is held) from computing (the act of querying). By offering this service before Google, Amazon, and Microsoft had equivalent products of their own, Snowflake was able to attract customers, and build market share in the data warehousing space.
Naming the company after a discredited database concept was very brave. For those of us not in the details of the Snowflake schema, it is a logical arrangement of tables in a multidimensional database such that the entity-relationship diagram resembles a snowflake shape. … When it is completely normalized along all the dimension tables, the resultant structure resembles a snowflake with the fact table in the middle. Needless to say, the “snowflake” schema is as far from Hadoop’s design philosophy as technically possible.
While Silicon Valley was headed toward a dead end, Snowflake captured an entire cloud data market.