Deploy Hadoop Multiple Ways

| February 23, 2016

article image
The current Hadoop market is dominated by two players being Cloudera and Hortonworks. Both are built on top of open source Hadoop and are very similar in their packaging except with a few differences in applications (Impala, Ambari, Ranger, Sentry etc etc) from a software perspective and their support structures. Standing on the sidelines reminds me of watching a similar game played out over a decade ago in the Linux space when you had Redhat, Suse and others all competing in the same space.

Spotlight

Parakeeto

Parakeeto is an agency profitability platform which combines software and consulting to help agencies measure their most important metrics and make the right decisions to improve their scalability and profitability.

OTHER ARTICLES

Will We Be Able to Use AI to Prevent Further Pandemics?

Article | March 9, 2021

For many, 2021 has brought hope that they can cautiously start to prepare for a world after Covid. That includes living with the possibility of future pandemics, and starting to reflect on what has been learned from such a brutal shared experience. One of the areas that has come into its own during Covid has been artificial intelligence (AI), a technology that helped bring the pandemic under control, and allow life to continue through lockdowns and other disruptions. Plenty has been written about how AI has supported many aspects of life at work and home during Covid, from videoconferencing to online food ordering. But the role of AI in preventing Covid causing even more havoc is not necessarily as widely known. Perhaps even more importantly, little has been said about the role AI is likely to play in preparing for, responding to and even preventing future pandemics. From what we saw in 2020, AI will help prevent global outbreaks of new diseases in three ways: prediction, diagnosis and treatment. Prediction Predicting pandemics is all about tracking data that could be possible early signs that a new disease is spreading in a disturbing way. The kind of data we’re talking about includes public health information about symptoms presenting to hospitals and doctors around the world. There is already plenty of this captured in healthcare systems globally, and is consolidated into datasets such as the Johns Hopkins reports that many of us are familiar with from news briefings. Firms like Bluedot and Metabiota are part of a growing number of organisations which use AI to track both publicly available and private data and make relevant predictions about public health threats. Both of these received attention in 2020 by reporting the appearance of Covid before it had been officially acknowledged. Boston Children’s Hospital is an example of a healthcare institution doing something similar with their Healthmap resource. In addition to conventional healthcare data, AI is uniquely able to make use of informal data sources such as social media, news aggregators and discussion forums. This is because of AI techniques such as natural language processing and sentiment analysis. Firms such as Stratifyd use AI to do this in other business settings such as marketing, but also talk publicly about the use of their platform to predict and prevent pandemics. This is an example of so-called augmented intelligence, where AI is used to guide people to noteworthy data patterns, but stops short of deciding what it means, leaving that to human judgement. Another important part of preventing a pandemic is keeping track of the transmission of disease through populations and geographies. A significant issue in 2020 was difficulty tracing people who had come into contact with infection. There was some success using mobile phones for this, and AI was critical in generating useful knowledge from mobile phone data. The emphasis of Covid tracing apps in 2020 was keeping track of how the disease had already spread, but future developments are likely to be about predicting future spread patterns from such data. Prediction is a strength of AI, and the principles used to great effect in weather forecasting are similar to those used to model likely pandemic spread. Diagnosis To prevent future pandemics, it won’t be enough to predict when a disease is spreading rapidly. To make the most of this knowledge, it’s necessary to diagnose and treat cases. One of the greatest early challenges with Covid was the lack of speedy, reliable tests. For future pandemics, AI is likely to be used to create such tests more quickly than was the case in 2020. Creating a useful test involves modelling a disease’s response to different testing reagents, finding right balance between speed, convenience and accuracy. AI modelling simulates in a computer how individual cells respond to different stimuli, and could be used to perform virtual testing of many different types of test to accelerate how quickly the most promising ones reach laboratory and field trials. In 2020 there were also several novel uses of AI to diagnose Covid, but there were few national and global mechanisms to deploy these at scale. One example was the use of AI imaging, diagnosing Covid by analysing chest x-rays for features specific to Covid. This would have been especially valuable in places that didn’t have access to lab testing equipment. Another example was using AI to analyse the sound of coughs to identify unique characteristics of a Covid cough. AI research to systematically investigate innovative diagnosis techniques such as these should result in better planning for alternatives to laboratory testing. Faster and wider rollout of this kind of diagnosis would help control spread of a future disease during the critical period waiting for other tests to be developed or shared. This would be another contribution of AI to preventing a localised outbreak becoming a pandemic. Treatment Historically, vaccination has proven to be an effective tool for dealing with pandemics, and was the long term solution to Covid for most countries. AI was used to accelerate development of Covid vaccines, helping cut the development time from years or decades to months. In principle, the use of AI was similar to that described above for developing diagnostic tests. Different drug development teams used AI in different ways, but they all relied on mathematical modelling of how the Covid virus would respond to many forms of treatment at a microscopic level. Much of the vaccine research and modelling focused on the “spike” proteins that allow Covid to attack human cells and enter the body. These are also found in other viruses, and were already the subject of research before the 2020 pandemic. That research allowed scientists to quickly develop AI models to represent the spikes, and simulate the effects of different possible treatments. This was crucial in trialling thousands of possible treatments in computer models, pinpointing the most likely successes for further investigation. This kind of mathematical simulation using AI continued during drug development, and moved substantial amounts of work from the laboratory to the computer. This modelling also allowed the impact of Covid mutations on vaccines to be assessed quickly. It is why scientists were reasonably confident of developing variants of vaccines for new Covid mutations in days and weeks rather than months. As a result of the global effort to develop Covid vaccines, the body of data and knowledge about virus behaviour has grown substantially. This means it should be possible to understand new pathogens even more rapidly than Covid, potentially in hours or days rather than weeks. AI has also helped create new ways of approaching vaccine development, for example the use of pre-prepared generic vaccines designed to treat viruses from the same family as Covid. Modifying one of these to the specific features of a new virus is much faster than starting from scratch, and AI may even have already simulated exactly such a variation. AI has been involved in many parts of the fight against Covid, and we now have a much better idea than in 2020 of how to predict, diagnose and treat pandemics, especially similar viruses to Covid. So we can be cautiously optimistic that vaccine development for any future Covid-like viruses will be possible before it becomes a pandemic. Perhaps a trickier question is how well we will be able to respond if the next pandemic is from a virus that is nothing like Covid. Was Rahman is an expert in the ethics of artificial intelligence, the CEO of AI Prescience and the author of AI and Machine Learning. See more at www.wasrahman.com

Read More

Advanced Data and Analytics Can Add Value in Private Equity Industry!

Article | January 6, 2021

As the organizations go digital the amount of data generated whether in-house or from outside is humongous. In fact, this data keeps increasing with every tick of the clock. There is no doubt about the fact that most of this data can be junk, however, at the same time this is also the data set from where an organization can get a whole lot of insight about itself. It is a given that organizations that don’t use this generated data to build value to their organization are prone to speed up their obsolescence or might be at the edge of losing the competitive edge in the market. Interestingly it is not just the larger firms that can harness this data and analytics to improve their overall performance while achieving operational excellence. Even the small size private equity firms can also leverage this data to create value and develop competitive edge. Thus private equity firms can achieve a high return on an initial investment that is low. Private Equity industry is skeptical about using data and analytics citing the reason that it is meant for larger firms or the firms that have deep pockets, which can afford the revamping cost or can replace their technology infrastructure. While there are few private equity investment professionals who may want to use this advanced data and analytics but are not able to do so for the lack of required knowledge. US Private Equity Firms are trying to understand the importance of advanced data and analytics and are thus seeking professionals with the expertise in dealing with data and advanced analytics. For private equity firms it is imperative to comprehend that data and analytics’ ability is to select the various use cases, which will offer the huge promise for creating value. Top Private Equity firms all over the world can utilize those use cases and create quick wins, which will in turn build momentum for wider transformation of businesses. Pinpointing the right use cases needs strategic thinking by private equity investment professionals, as they work on filling the relevant gaps or even address vulnerabilities. Private Equity professionals most of the time are also found thinking operationally to recognize where can they find the available data. Top private equity firms in the US have to realize that the insights which Big data and advanced analytics offer can result in an incredible opportunity for the growth of private equity industry. As Private Equity firms realize the potential and the power of big data and analytics they will understand the invaluableness of the insights offered by big data and analytics. Private Equity firms can use the analytics insights to study any target organization including its competitive position in the market and plan their next move that may include aggressive bidding for organizations that have shown promise for growth or leaving the organization that is stuffed with loads of underlying issues. But for all these and also to build careers in private equity it is important to have reputed qualification as well. A qualified private equity investment professional will be able to devise information-backed strategies in no time at all. In addition, with Big Data and analytics in place, private equity firms can let go of numerous tasks that are done manually and let the technology do the dirty work. There have been various studies that show how big data and analytics can help a private Equity firm.

Read More

3 steps to build a data fabric to integrate all your data tools

Article | May 17, 2021

One approach for better data utilization is the data fabric, a data management approach that arranges data in a single "fabric" that spans multiple systems and endpoints. The goal of the fabric is to link all data so it can easily be accessed. "DataOps and data fabric are two different but related things," said Ed Thompson, CTO at Matillion, which provides a cloud data integration platform. "DataOps is about taking practices which are common in modern software development and applying them to data projects. Data fabric is about the type of data landscape that you create and how the tools that you use work together."

Read More

How Should Data Science Teams Deal with Operational Tasks?

Article | April 16, 2021

Introduction There are many articles explaining advanced methods on AI, Machine Learning or Reinforcement Learning. Yet, when it comes to real life, data scientists often have to deal with smaller, operational tasks, that are not necessarily at the edge of science, such as building simple SQL queries to generate lists of email addresses to target for CRM campaigns. In theory, these tasks should be assigned to someone more suited, such as Business Analysts or Data Analysts, but it is not always the case that the company has people dedicated specifically to those tasks, especially if it’s a smaller structure. In some cases, these activities might consume so much of our time that we don’t have much left for the stuff that matters, and might end up doing a less than optimal work in both. That said, how should we deal with those tasks? In one hand, not only we usually don’t like doing operational tasks, but they are also a bad use of an expensive professional. On the other hand, someone has to do them, and not everyone has the necessary SQL knowledge for it. Let’s see some ways in which you can deal with them in order to optimize your team’s time. Reduce The first and most obvious way of doing less operational tasks is by simply refusing to do them. I know it sounds harsh, and it might be impractical depending on your company and its hierarchy, but it’s worth trying it in some cases. By “refusing”, I mean questioning if that task is really necessary, and trying to find best ways of doing it. Let’s say that every month you have to prepare 3 different reports, for different areas, that contain similar information. You have managed to automate the SQL queries, but you still have to double check the results and eventually add/remove some information upon the user’s request or change something in the charts layout. In this example, you could see if all of the 3 different reports are necessary, or if you could adapt them so they become one report that you send to the 3 different users. Anyways, think of ways through which you can reduce the necessary time for those tasks or, ideally, stop performing them at all. Empower Sometimes it can pay to take the time to empower your users to perform some of those tasks themselves. If there is a specific team that demands most of the operational tasks, try encouraging them to use no-code tools, putting it in a way that they fell they will be more autonomous. You can either use already existing solutions or develop them in-house (this could be a great learning opportunity to develop your data scientists’ app-building skills). Automate If you notice it’s a task that you can’t get rid of and can’t delegate, then try to automate it as much as possible. For reports, try to migrate them to a data visualization tool such as Tableau or Google Data Studio and synchronize them with your database. If it’s related to ad hoc requests, try to make your SQL queries as flexible as possible, with variable dates and names, so that you don’t have to re-write them every time. Organize Especially when you are a manager, you have to prioritize, so you and your team don’t get drowned in the endless operational tasks. In order to do this, set aside one or two days in your week which you will assign to that kind of work, and don’t look at it in the remaining 3–4 days. To achieve this, you will have to adapt your workload by following the previous steps and also manage expectations by taking this smaller amount of work hours when setting deadlines. This also means explaining the paradigm shift to your internal clients, so they can adapt to these new deadlines. This step might require some internal politics, negotiating with your superiors and with other departments. Conclusion Once you have mapped all your operational activities, you start by eliminating as much as possible from your pipeline, first by getting rid of unnecessary activities for good, then by delegating them to the teams that request them. Then, whatever is left for you to do, you automate and organize, to make sure you are making time for the relevant work your team has to do. This way you make sure expensive employees’ time is being well spent, maximizing company’s profit.

Read More

Spotlight

Parakeeto

Parakeeto is an agency profitability platform which combines software and consulting to help agencies measure their most important metrics and make the right decisions to improve their scalability and profitability.

Events