Create Apache Spark applications with new drag-and-drop UI

September 9, 2019

StreamSets, Inc., a provider of a DataOps platform for modern data integration, has released StreamSets Transformer, a simple-to-use, drag-and-drop UI tool for creating native Apache Spark applications. Designed for a wide range of users, including those without specialized skills, StreamSets Transformer enables the creation of pipelines for ETL, stream processing, and machine-learning operations. Data engineers, scientists, architects, and operators gain deep visibility into the execution of Apache Spark, while the tool broadens Spark usage across the business.

Spotlight

Databeacon Inc

Cognos Incorporated (Cognos) provides business intelligence (BI) and performance management (PM) software solutions. The Company’s solutions help organizations plan, understand, and manage financial and operational performance. Cognos’ integrated solutions consist of its BI components, PM solutions, and analytical applications. These components are supported by software services for administration, deployment, integration, and extraction, transformation, and loading (ETL)…

OTHER ARTICLES

The case for hybrid artificial intelligence

Article | March 4, 2020

Deep learning, the main innovation behind the renewed interest in artificial intelligence in recent years, has helped solve many critical problems in computer vision, natural language processing, and speech recognition. However, as deep learning matures and moves from the peak of the hype cycle to its trough of disillusionment, it is becoming clear that it is missing some fundamental components.


Deep Dive: Digital-First Banks Harness the Power of Data Analytics

Article | March 4, 2020

Data analytics has many purposes in the banking industry, ranging from improving cybersecurity to reducing customer churn. Every interaction, from ATM withdrawals to loan applications, provides financial institutions (FIs) with valuable data about customers’ financial lifestyles. Banks can even harness external regulatory, trading, and social media engagement data, all of which can be processed and analyzed to benefit their operations. Financial data is useful in helping banks develop wide-reaching marketing campaigns, but social data is critical to developing offers for specific customers. Santa Rosa, California-based Redwood Credit Union, for example, found that social data was particularly important when offering auto loans. It initially extended preapproval for such loans every two years based solely on members’ credit scores and vehicle purchase histories, but it soon discovered a much more reliable indicator and updated its preapproval frequency accordingly.


How Should Data Science Teams Deal with Operational Tasks?

Article | March 4, 2020

Introduction

There are many articles explaining advanced methods in AI, machine learning, and reinforcement learning. In real life, however, data scientists often have to handle smaller operational tasks that are far from the cutting edge of science, such as writing simple SQL queries to generate lists of email addresses to target in CRM campaigns. In theory, these tasks should be assigned to someone better suited, such as a business analyst or data analyst, but not every company has people dedicated to them, especially smaller organizations. In some cases, these activities consume so much of our time that little is left for the work that matters, and we end up doing suboptimal work on both.

So how should we deal with these tasks? On the one hand, not only do we usually dislike operational tasks, they are also a poor use of an expensive professional. On the other hand, someone has to do them, and not everyone has the necessary SQL knowledge. Let’s look at some ways to handle them so that your team’s time is well spent.

Reduce

The first and most obvious way to do fewer operational tasks is simply to refuse them. I know it sounds harsh, and it may be impractical depending on your company and its hierarchy, but it’s worth trying in some cases. By “refusing”, I mean questioning whether the task is really necessary and looking for better ways to do it. Say that every month you prepare three different reports for different areas, all containing similar information. You have managed to automate the SQL queries, but you still have to double-check the results, occasionally add or remove information at a user’s request, or change something in the chart layout.
In this example, you could check whether all three reports are necessary, or whether you could adapt them into a single report that you send to all three users. Either way, look for ways to reduce the time these tasks take or, ideally, to stop performing them altogether.

Empower

Sometimes it pays to take the time to empower your users to perform some of these tasks themselves. If one team generates most of the operational requests, encourage them to use no-code tools, framing it so they see that they will be more autonomous. You can use existing solutions or develop them in-house (which could be a great opportunity for your data scientists to develop their app-building skills).

Automate

If a task can’t be eliminated or delegated, automate it as much as possible. For reports, try migrating them to a data visualization tool such as Tableau or Google Data Studio and syncing them with your database. For ad hoc requests, make your SQL queries as flexible as possible, with variable dates and names, so that you don’t have to rewrite them every time.

Organize

Especially as a manager, you have to prioritize so that you and your team don’t drown in endless operational tasks. To do this, set aside one or two days a week for that kind of work, and don’t touch it on the remaining three or four days. This requires adapting your workload by following the previous steps, and managing expectations by accounting for the reduced hours when setting deadlines. It also means explaining this paradigm shift to your internal clients so they can adapt to the new deadlines. This step may require some internal politics and negotiation with your superiors and with other departments.
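The advice to keep SQL queries flexible, with variable dates and names, can be illustrated with a parameterized query. The following is a minimal sketch, assuming Python’s built-in `sqlite3` module and a hypothetical `customers` table with `email`, `signup_date`, and `segment` columns; the table, its columns, and the `crm_targets` function are illustrative and not taken from the original article. The query text stays fixed, and each new request only changes the bound parameters.

```python
import sqlite3
from datetime import date

# Hypothetical reusable query: the date range and campaign segment are
# bound as named parameters instead of being hard-coded into the SQL text.
CRM_TARGETS_SQL = """
    SELECT email
    FROM customers
    WHERE signup_date BETWEEN :start AND :end
      AND segment = :segment
    ORDER BY email
"""


def crm_targets(conn: sqlite3.Connection, start: date, end: date, segment: str) -> list[str]:
    """Return target email addresses for one campaign request (hypothetical helper)."""
    # Dates are stored as ISO-8601 text, so string comparison orders them correctly.
    params = {"start": start.isoformat(), "end": end.isoformat(), "segment": segment}
    return [row[0] for row in conn.execute(CRM_TARGETS_SQL, params)]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (email TEXT, signup_date TEXT, segment TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [
            ("ana@example.com", "2020-01-15", "premium"),
            ("bo@example.com", "2020-02-03", "basic"),
            ("cy@example.com", "2020-02-20", "premium"),
        ],
    )
    # Same query, new request: only the arguments change.
    print(crm_targets(conn, date(2020, 2, 1), date(2020, 2, 29), "premium"))
```

With the sample rows above, the call prints `['cy@example.com']`. Binding values through the driver, rather than pasting them into the SQL string, also protects against SQL injection when the values come from a user’s request.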
Conclusion

Once you have mapped all your operational activities, start by eliminating as many as possible from your pipeline: first by dropping unnecessary activities for good, then by delegating others to the teams that request them. Whatever is left, automate and organize, so that you make time for the relevant work your team has to do. This way, you make sure expensive employees’ time is well spent, maximizing the company’s profit.


A Tale of Two Data-Centric Services

Article | March 4, 2020

The acronym DMaaS can refer to two related but separate things: data center management-as-a-service, referred to here by its other acronym, DCMaaS, and data management-as-a-service. The former addresses infrastructure-level questions such as the optimization of data flows in a cloud service; the latter refers to master data management and data preparation as applied to federated cloud services. DCMaaS has been under development for some years; DMaaS is slightly younger and is a product of the growing interest in machine learning and big data analytics, along with increasing concern over privacy, security, and compliance in cloud environments. DMaaS responds to a growing concern over data quality in machine learning, driven by the large amount of data needed for training and the inherent dangers posed by divergent data structures across multiple sources. To use the rapidly growing array of cloud data, including public cloud information and internal corporate information from hybrid clouds, you must aggregate data in a normalized way so that it is available for model training and processing with ML algorithms. As data volumes and diversity increase, this becomes increasingly difficult.
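To make the idea of aggregating divergent sources “in a normalized way” concrete, here is a minimal sketch in Python that maps records from two hypothetical sources with different structures (different field names, amount units, and timestamp formats) onto one shared schema before combining them. The `Record` class, both source formats, and all field names are assumptions for illustration only, not part of any DMaaS product.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Record:
    """Hypothetical normalized schema shared by all sources."""
    customer_id: str
    amount_usd: float
    timestamp: datetime  # always timezone-aware, in UTC


def from_crm(row: dict) -> Record:
    # Hypothetical source A: integer IDs, amounts in cents, epoch-second timestamps.
    return Record(
        customer_id=str(row["cust"]),
        amount_usd=row["amount_cents"] / 100,
        timestamp=datetime.fromtimestamp(row["ts"], tz=timezone.utc),
    )


def from_billing(row: dict) -> Record:
    # Hypothetical source B: padded string IDs, amounts in dollars as text,
    # ISO-8601 timestamps with a UTC offset.
    return Record(
        customer_id=row["customerId"].strip(),
        amount_usd=float(row["total"]),
        timestamp=datetime.fromisoformat(row["created"]).astimezone(timezone.utc),
    )


def aggregate(crm_rows: list[dict], billing_rows: list[dict]) -> list[Record]:
    """Map every source onto the same schema, then combine and sort by time."""
    records = [from_crm(r) for r in crm_rows] + [from_billing(r) for r in billing_rows]
    return sorted(records, key=lambda r: r.timestamp)
```

For example, `aggregate([{"cust": 42, "amount_cents": 1999, "ts": 1583280000}], [{"customerId": " 42 ", "total": "5.00", "created": "2020-03-04T10:00:00+01:00"}])` returns two `Record`s for customer `"42"` with UTC timestamps, ordered by time. Once every source shares one schema, downstream training code only has to understand a single format.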
