Business Intelligence on Hadoop with AtScale and BlueData

| May 11, 2017

The BlueData EPIC software platform integrates with AtScale to accelerate the deployment of Business Intelligence (BI) on Hadoop, running on virtualized on-premises infrastructure with seamless self-service provisioning of virtual Hadoop clusters and the AtScale application within minutes.

Spotlight

Data Science Portugal

Data Science Portugal (DSPT) is an informal community of data science enthusiasts created to share knowledge and experience in data science, machine learning, artificial intelligence, big data and related fields. Since September 2016, the community has been building synergies among people passionate about data, bridging cross-domain fields and tackling a range of data science problems.

OTHER ARTICLES

Forward-thinking Business And The Implications Of Big Data

Article | March 23, 2020

Big data is a modern phenomenon transforming businesses of today. Organisations hold vast swathes of data, from historic and current orders to detailed insights about supply chain operations. This information, combined with external data such as market intelligence and even weather patterns, can give businesses a foundation for their planning and decision-making. Business intelligence and analytical solutions pull valuable insights from huge datasets. From workforce optimisation to cost management, access to big data, and to the tools that manage and evaluate it, allows firms to streamline key parts of their operations. Adopters of modern solutions are seeing substantial improvements across their companies.


Time Machine Big Data of the Past for the Future of Europe

Article | February 24, 2020

Emerging technology has the power to transform history and cultural heritage into a living resource. The Time Machine project will digitise archives from museums and libraries, using Artificial Intelligence and Big Data mining, to offer richer interpretations of our past. An inclusive European identity benefits from a deep engagement with the region’s past. The Time Machine project set out to offer this by exploiting already freely accessible Big Data sources. EU support for a preparatory action enabled the development of a decade-long roadmap for the large-scale digitisation of kilometres of archives, from large museum and library collections, into a distributed information system. Artificial Intelligence (AI) will play a key role at each step, from digitisation planning to document interpretation and fact-checking. Once embedded, this infrastructure could create new business and employment opportunities across a range of sectors including ICT, the creative industries and tourism.


AI and Predictive Analytics: Myth, Math, or Magic

Article | February 10, 2020

We are a species invested in predicting the future as if our lives depended on it. Indeed, good predictions of where wolves might lurk were once a matter of survival. Even as civilization made us physically safer, prediction has remained a mainstay of culture, from the haruspices of ancient Rome inspecting animal entrails to business analysts dissecting a wealth of transactions to foretell future sales. With these caveats in mind, I predict that in 2020 (and the decade ahead) we will struggle if we unquestioningly adopt artificial intelligence (AI) in predictive analytics, founded on an unjustified overconfidence in the almost mythical power of AI's mathematical foundations. This is another form of the disease of technochauvinism I discussed in a previous article.


Evolution of capabilities of Data Platforms & data ecosystem

Article | October 27, 2020

Data platforms and frameworks have been evolving constantly. At one point we were excited by Hadoop (for almost 10 years, in fact); then came Snowflake, or as I call it the Snowflake Blizzard (which went on to stage the biggest software IPO in history); and then Google, which solves problems and serves use cases in a way that few companies can match.

The end of the data warehouse

Once upon a time, life was simple; or at least, the basic approach to Business Intelligence was fairly easy to describe: collect information from source systems, build a repository of consistent data, and bolt on one or more reporting and visualisation tools that present that information to users. Data used to be managed in expensive, slow, inaccessible SQL data warehouses, and those systems were notorious for their lack of scalability. Their demise has been driven by a few technological advances, one of which is the ubiquitous, and still growing, Hadoop.

On April 1, 2006, Apache Hadoop was unleashed upon Silicon Valley. Inspired by Google, Hadoop's primary purpose was to improve the flexibility and scalability of data processing by splitting the work into smaller functions that run on commodity hardware. Its intent was to replace enterprise data warehouses based on SQL. Unfortunately, a technology used by Google may not be the best solution for everyone else. It is not that others are incompetent: Google solves problems and serves use cases in a way that few companies can match. Google has been running massive-scale applications such as its eponymous search engine, YouTube and the Ads platform, and the technologies and infrastructure that make these geographically distributed offerings perform at scale are what make various components of Google Cloud Platform enterprise ready and well featured. Google has shown leadership in developing innovations that have been made available to the open-source community and are used extensively by other public cloud vendors and Gartner clients; examples include the Kubernetes container management framework, the TensorFlow machine learning platform and the Apache Beam data processing programming model. GCP also uses open-source offerings in its cloud while treating third-party data and analytics providers, such as DataStax, Redis Labs, InfluxData, MongoDB, Elastic, Neo4j and Confluent, as first-class citizens and providing unified billing for its customers.

Silicon Valley tried to make Hadoop work, but the technology was extremely complicated and nearly impossible to use efficiently. Hadoop's lack of speed was compounded by its focus on unstructured data: you had to be a "flip-flop wearing" data scientist to truly make use of it, because unstructured datasets are very difficult to query and analyze without deep knowledge of computer science. At one point, Gartner estimated that 70% of Hadoop deployments would not achieve their goals of cost savings and revenue growth, mainly due to insufficient skills and technical integration difficulties. And seventy percent seems like an understatement.
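To make the "smaller functions on commodity hardware" idea concrete, the sketch below shows the MapReduce pattern that Hadoop popularised, written as a plain-Python word count: map tasks emit key-value pairs, the framework shuffles them by key, and reduce tasks aggregate each group. This is an illustration only, not Hadoop code; real jobs are written against Hadoop's Java MapReduce API (or Hadoop Streaming) and run across a cluster of machines rather than in one process, and the function names here are my own.

```python
from collections import defaultdict

def map_phase(document):
    # Emit (word, 1) pairs; each mapper would handle one split of the input.
    for word in document.split():
        yield word.lower(), 1

def shuffle(mapped_pairs):
    # Group values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Each reducer aggregates the values for the keys assigned to it.
    return {word: sum(counts) for word, counts in groups.items()}

# Two "input splits" standing in for blocks of a large file on many machines.
splits = ["big data on commodity hardware", "data processing with hadoop"]
mapped = [pair for doc in splits for pair in map_phase(doc)]
print(reduce_phase(shuffle(mapped)))
```

The point of the pattern is that map and reduce tasks are independent, so the same program scales out simply by running more of them on more cheap machines.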
Data storage through the years: from GFS to Snowflake, or the Snowflake Blizzard

Developing in parallel with Hadoop's journey was that of Marcin Zukowski, co-founder and CEO of Vectorwise, who took the data warehouse in another direction: the world of advanced vector processing. Despite being almost unheard of among the general public, Snowflake was actually founded back in 2012. Snowflake is not a consumer tech firm like Netflix or Uber; it is business-to-business only, which may explain its high valuation, since enterprise companies are often seen as a more "stable" investment. In short, Snowflake helps businesses manage data stored in the cloud. The firm's motto is "mobilising the world's data", because it allows big companies to make better use of their vast data stores.

Marcin and his teammates rethought the data warehouse by leveraging the elasticity of the public cloud in an unexpected way: separating storage and compute. Their message was this: don't pay for a data warehouse you don't need. Pay only for the storage you need, and add compute capacity as you go. This is considered one of Snowflake's key innovations: separating storage (where the data is held) from compute (the act of querying). By offering this service before Google, Amazon and Microsoft had equivalent products of their own, Snowflake was able to attract customers and build market share in the data warehousing space.

Naming the company after a discredited database concept was very brave. For those of us not versed in the details, a snowflake schema is a logical arrangement of tables in a multidimensional database such that the entity-relationship diagram resembles a snowflake: when the dimension tables are completely normalized, the resulting structure looks like a snowflake with the fact table in the middle. Needless to say, the snowflake schema is about as far from Hadoop's design philosophy as technically possible. While Silicon Valley was headed toward a dead end, Snowflake captured an entire cloud data market.
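For readers who want to see what that "discredited" concept actually looks like, here is a minimal sketch of a snowflake schema, using SQLite purely for illustration: a fact table in the middle, with dimensions normalized out into further tables. The table and column names (fact_sales, dim_product, dim_category, dim_date) are hypothetical and not taken from Snowflake or any other product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized dimensions: product rolls up to category (the "snowflaking").
CREATE TABLE dim_category (category_id INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE dim_product  (product_id INTEGER PRIMARY KEY, product_name TEXT,
                           category_id INTEGER REFERENCES dim_category(category_id));
CREATE TABLE dim_date     (date_id INTEGER PRIMARY KEY, day TEXT);

-- The fact table sits in the middle and references every dimension.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    date_id    INTEGER REFERENCES dim_date(date_id),
    amount     REAL
);

INSERT INTO dim_category VALUES (1, 'Storage'), (2, 'Compute');
INSERT INTO dim_product  VALUES (10, 'Disk', 1), (20, 'VM hours', 2);
INSERT INTO dim_date     VALUES (100, '2020-10-27');
INSERT INTO fact_sales   VALUES (10, 100, 5.0), (20, 100, 12.5);
""")

# Analytical queries join outward from the fact table through the
# normalized dimension tables.
rows = conn.execute("""
    SELECT c.category_name, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product  p ON f.product_id  = p.product_id
    JOIN dim_category c ON p.category_id = c.category_id
    GROUP BY c.category_name
""").fetchall()
print(rows)  # e.g. [('Compute', 12.5), ('Storage', 5.0)]
```

The heavily joined, fully normalized layout is exactly the kind of structured, relational design that Hadoop's schema-on-read, unstructured approach moved away from, which is what makes the company name such a pointed choice.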

