BIG DATA MANAGEMENT
Datafold | June 23, 2022
Datafold, a data reliability company, today announced data-diff, a new open source cross-database diffing package. This new product is an open source extension to Datafold’s original Data Diff tool for comparing data sets. Open source data-diff validates the consistency of data across databases using high-performance algorithms.
In the modern data stack, companies extract data from sources, load that data into a warehouse, and transform that data so that it can be used for analysis, activation, or data science use cases. Datafold has been focused on automated testing during the transformation step with Data Diff, ensuring that any change made to a data model does not break a dashboard or cause a predictive algorithm to have the wrong data. With the launch of open source data-diff, Datafold can now help with the extract and load part of the process. Open source data-diff verifies that the data loaded into the warehouse matches the source from which it was extracted. All parts of the data stack need testing for data engineers to create reliable data products, and Datafold now gives them coverage throughout the extract, load, transform (ELT) process.
“data-diff fulfills a need that wasn’t previously being met. Every data-savvy business today replicates data between databases in some way, for example, to integrate all available data in a warehouse or data lake to leverage it for analytics and machine learning. Replicating data at scale is a complex and often error-prone process, and although multiple vendors and open source tools provide replication solutions, there was no tooling to validate the correctness of such replication. As a result, engineering teams resorted to manual one-off checks and tedious investigations of discrepancies, and data consumers couldn’t fully trust the data replicated from other systems.”
Gleb Mezhanskiy, Datafold founder and CEO
Mezhanskiy continued, “data-diff solves this problem elegantly by providing an easy way to validate consistency of data sets across databases at scale. It relies on state-of-the-art algorithms to achieve incredible speed: e.g., comparing one-billion-row data sets across different databases takes less than five minutes on a regular laptop. And, as an open source tool, it can be easily embedded into existing workflows and systems.”
Answering an Important Need
Today’s organizations are using data replication to consolidate information from multiple sources into data warehouses or data lakes for analytics. They’re integrating operational systems with real-time data pipelines, consolidating data for search, and migrating data from legacy systems to modern databases.
Thanks to amazing tools like Fivetran, Airbyte and Stitch, it’s easier than ever to sync data across multiple systems and applications. Most data synchronization scenarios call for 100% guaranteed data integrity, yet the practical reality is that in any interconnected system, records are sometimes lost due to dropped packets, general replication issues, or configuration errors. To ensure data integrity, it’s necessary to perform validation checks using a data diff tool.
Datafold’s approach constitutes a significant step forward for developers and data analysts who wish to compare multiple databases rapidly and efficiently, without building a makeshift diff tool themselves. Currently, data engineers use multiple comparison methods, ranging from simple row counts to comprehensive row-level analysis. The former is fast but not comprehensive, whereas the latter approach is slow but guarantees complete validation. Open source data-diff is fast and provides complete validation.
Open Source data-diff for Building and Managing Data Quality
Available today, data-diff uses checksums to verify 100% consistency between two different data sources quickly and efficiently. This method allows for a row-level comparison of 100 million records to be done in just a few seconds, without sacrificing the granularity of the resulting comparison.
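The checksum idea described above can be illustrated with a minimal sketch. It is not data-diff's actual implementation or API: the function names (`segment_checksum`, `diff_segment`), the use of in-memory dictionaries in place of database tables, and the choice to split mismatching key ranges in half are all assumptions made for illustration. The sketch hashes each segment of rows on both sides, skips segments whose checksums agree, and only compares individual rows once a mismatching segment is small.

```python
import hashlib


def segment_checksum(rows):
    """Combine per-row MD5 hashes into one order-independent checksum."""
    total = 0
    for key, value in rows:
        digest = hashlib.md5(f"{key}|{value}".encode()).hexdigest()
        total = (total + int(digest, 16)) % (1 << 128)
    return total


def diff_segment(source, target, lo, hi, min_segment=4):
    """Yield (sign, key, value) for rows that differ within keys [lo, hi).

    `source` and `target` are plain dicts standing in for two tables
    (hypothetical stand-ins; a real tool would run queries instead).
    """
    src = [(k, v) for k, v in source.items() if lo <= k < hi]
    dst = [(k, v) for k, v in target.items() if lo <= k < hi]

    # Matching row counts and checksums: treat the whole segment as equal.
    if len(src) == len(dst) and segment_checksum(src) == segment_checksum(dst):
        return

    # Small mismatching segment: fall back to a row-by-row comparison.
    if hi - lo <= min_segment:
        for row in sorted(set(src) - set(dst)):
            yield ("-", *row)
        for row in sorted(set(dst) - set(src)):
            yield ("+", *row)
        return

    # Large mismatching segment: split the key range and recurse.
    mid = (lo + hi) // 2
    yield from diff_segment(source, target, lo, mid, min_segment)
    yield from diff_segment(source, target, mid, hi, min_segment)


# Example: 100 rows, one value changed and one row missing in the target.
source = {i: f"row-{i}" for i in range(1, 101)}
target = dict(source)
target[42] = "row-42-changed"
del target[77]

changes = list(diff_segment(source, target, 1, 101))
print(changes)
# [('-', 42, 'row-42'), ('+', 42, 'row-42-changed'), ('-', 77, 'row-77')]
```

Under these assumptions, most of the key range is dismissed by a single checksum comparison per segment, which is why this style of check can be much cheaper than comparing every row while still pinpointing the rows that differ.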
Datafold has released data-diff under the MIT license. Currently, the software includes connectors for Postgres, MySQL, Snowflake, BigQuery, Redshift, Presto and Oracle. Datafold plans to invite contributors to build connectors for additional data sources and for specific business applications.
Datafold is a data reliability platform that helps data teams deliver reliable data products faster. It has a unique ability to identify, prioritize and investigate data quality issues proactively before they affect production. Founded in 2020 by veteran data engineers, Datafold has raised $22 million from investors including NEA, Amplify Partners, and Y Combinator. Customers include Thumbtack, Patreon, Truebill, Faire, and Dutchie.
Synopsys | June 03, 2022
Driving greater design productivity by harnessing previously untapped design insights with machine learning technology, Synopsys, Inc. (Nasdaq: SNPS) today announced a critical expansion of its EDA data analytics portfolio with the introduction of Synopsys DesignDash design optimization solution. As a complementary product to Synopsys' market-leading Digital Design Family and Synopsys DSO.ai™, the award-winning AI-driven design-space-optimization solution, Synopsys DesignDash is a comprehensive data-visibility and machine intelligence-guided design optimization solution that enables unmatched productivity in advanced SoC design. The Synopsys DesignDash solution delivers a real-time, unified, 360-degree view of all design activities for faster decision making, a deeper understanding of run-to-run, design-to-design and project-to-project trends, and enhanced collaboration in the SoC development process.
"As a leading supplier of SoCs that are powering and transforming numerous high-impact industries, we pride ourselves on being able to push the limits of achievable device performance while also accelerating our customers' time-to-market," said Hiroshi Ikeda, director, Methodology Development Office, Global Development Group at Socionext. "We're very excited by the Synopsys DesignDash analytics solution as a systematic way to capture, consume and evaluate our vast design activity in a scalable way, enabling us to share and transfer expert knowledge across our worldwide design teams to enhance productivity and efficiency."
Unlocking the Potential Within Vast Volumes of Digital Design Data
The digital design flow holds a wealth of information from myriad sources that, properly utilized, could help teams optimize increasingly complex designs faster. According to Gartner® Inc., "By 2023, overall analytics adoption will increase from 35% to 50%, driven by vertical- and domain-specific augmented analytics solutions."
The introduction of Synopsys DesignDash is the latest step in a multi-year, multi-disciplinary development effort to address the need for exponential gains in design productivity in the face of massive growth in system complexity, shrinking market windows and an increasingly challenging resource landscape.
The cloud-optimized Synopsys DesignDash design optimization solution greatly enhances design productivity by:
Providing extensive real-time design status through powerful visualizations and interactive dashboards.
Deploying deep analytics and machine learning to extract and reveal actionable understanding from vast volumes of structured and unstructured EDA metrics and tool-flow data.
Quickly classifying design trends, identifying design limitations, providing guided root-cause analysis and delivering flow consumable, prescriptive resolutions.
With deeper design insights, designers can achieve more effective debug and optimization workflows, realize improved quality of results (QoR) and significantly extend overall design- and project-flow efficiency and effectiveness. This extensive insight and real-time visibility additionally deliver comprehensive resource monitoring and tracking that spans all design activities, enabling more data-driven management and risk mitigation throughout the design process. Synopsys DesignDash is natively integrated with the Synopsys Digital Design family of products for seamless data capture, resulting in insights that further accelerate the path towards design closure. The solution complements the Synopsys SiliconDash product, part of the Synopsys Silicon Lifecycle Management Family, forming a pre-silicon to post-silicon data continuum, maximizing opportunities for valuable data analysis across the complete design-to-silicon lifecycle.
"SoC complexity across all application niches continues to rise as more functionality and performance is required. Through the data analytics and machine learning capabilities of the Synopsys DesignDash technology, engineering teams now have an efficient way to share and utilize valuable insights that would otherwise take hours of manual work to compile or, in some cases, not be accessible."
Karl Freund, founder and principal analyst at Cambrian-AI Research
"The semiconductor industry needs a dramatic improvement in design process productivity. Improving the quality and speed of engineering decisions with a comprehensive EDA data analytics platform is a critical step in this direction," said Sanjay Bali, vice president of Marketing and Strategy for the Silicon Realization Group at Synopsys. "Synopsys DesignDash unlocks the potential of the significant and growing volumes of EDA metrics and design-flow data, heralding a new era in smarter IC design by deploying an expanse of advanced data analytics and targeted machine learning to effectively guide design teams to achieve or exceed their product goals and schedules."
Synopsys, Inc. is the Silicon to Software™ partner for innovative companies developing the electronic products and software applications we rely on every day. As an S&P 500 company, Synopsys has a long history of being a global leader in electronic design automation (EDA) and semiconductor IP and offers the industry's broadest portfolio of application security testing tools and services. Whether you're a system-on-chip (SoC) designer creating advanced semiconductors, or a software developer writing more secure, high-quality code, Synopsys has the solutions needed to deliver innovative products.
Factored | March 10, 2022
Factored, a leader in data-centric AI helping tech unicorns and other high-profile tech companies select, upskill and build high-caliber data engineering, machine learning and data analytics teams, announced today that it has partnered with Databricks, the data and AI company, to drive business value for clients by unifying all data and artificial intelligence processes and workflows in a single platform. Thanks to Databricks' technology, including Delta Lake, Structured Streaming and the integration with MLflow, Factored engineers and analysts are providing innovative businesses with easier access to critical data driving key business decisions and strategy.
As a result of the partnership, Factored engineers and analysts can integrate all data-related processes in one platform to carry out tasks such as request parallelization, distance calculation and model interpretability. The partnership enhances cross-functional collaboration, visibility and efficiency in decision making and solution implementation for Factored's clients.
Databricks' Lakehouse Platform helps organizations accelerate innovation by unifying data teams with an open, scalable platform for all of their data-driven use cases. From streaming analytics and AI to BI, Databricks provides a modern lakehouse architecture that unifies data engineering, data science, machine learning and analytics within a single collaborative platform.
"We're delighted to be recognized as a Databricks Consulting Partner and to continue helping businesses make sense of their data using the industry's most cutting-edge tools. At Factored, we're dedicated to implementing the most effective data and AI solutions for our clients and Databricks' Lakehouse Platform plays a significant role in helping us achieve this."
Israel Niezen, Factored CEO
Since its founding in 2019, Factored has seen fast-paced growth and today is one of the biggest data science companies in Latin America.
Factored helps leading tech companies select, upskill and build world-class data science, machine learning and AI engineering teams much faster and more cost-effectively. Factored engineers have been personally vetted, educated and mentored by some of the most talented and recognized AI educators and engineers from Silicon Valley, Stanford University and deeplearning.ai.
Subex | November 09, 2020
Subex, a pioneer in the space of Digital Trust, today announced its consolidated financial results for the quarter ended September 30, 2020.
Performance Highlights for the quarter ended September 30, 2020:
Revenue for the quarter at INR 933 million as against INR 857 million in Q2FY20
EBITDA for the quarter at INR 254 million as against INR 207 million in Q2FY20
EBITDA excluding Forex gains/losses for the quarter at INR 298 million as against INR 178 million in Q2FY20
Profit after Tax (PAT) for the quarter at INR 123 million as against INR 63 million in Q2FY20
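The year-over-year changes implied by the figures above can be worked out with a short calculation. This is a minimal sketch: the helper name `pct_change` and the dictionary layout are illustrative choices, and the only inputs are the INR-million figures listed in the highlights.

```python
def pct_change(current, prior):
    """Percentage change from the prior-year quarter to the current quarter."""
    return (current - prior) / prior * 100


# (current quarter, Q2FY20) in INR million, taken from the highlights above.
figures = {
    "Revenue": (933, 857),
    "EBITDA": (254, 207),
    "EBITDA excluding Forex gains/losses": (298, 178),
    "Profit after Tax (PAT)": (123, 63),
}

for name, (current, prior) in figures.items():
    print(f"{name}: {pct_change(current, prior):+.1f}%")
# Revenue: +8.9%
# EBITDA: +22.7%
# EBITDA excluding Forex gains/losses: +67.4%
# Profit after Tax (PAT): +95.2%
```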
Vinod Kumar, Managing Director & CEO, Subex, said, “During a period where the industry as a whole is continuing to come to grips with the ‘new normal’, we are very pleased to report both revenue and profit growth for the second quarter of FY21. Market traction, along with our singular focus on Digital Trust, is slowly building the growth momentum.
Despite the challenges of remote working, we are performing well on delivery and operations. With the recent appointment of our new CTO, we have augmented the management bandwidth required to drive an exciting product roadmap intended to expand our digital trust portfolio. We will also be strengthening our engagement with global strategic partners around our new solutions and technology areas such as blockchain and AI/ML to drive wider market adoption.”
Commenting on the results, Anil Singhvi, Chairman of the Board, said, “Despite business challenges globally due to the pandemic, Subex has done very well. Continued good performance on all fronts, a cleaned-up balance sheet and improved free cash flows are enabling us to invest in newer businesses and service our reduced capital base well.”
Highlights of the Quarter
Launched Partner Ecosystem Management which will allow CSPs to accelerate their digital services portfolio expansion
Listed as a Sample Provider for Augmented Analytics in Gartner’s Emerging Technologies and Trends Impact Radar for Artificial Intelligence in Telecom report
Selected by a Tier-I Middle East operator for ROC Revenue Assurance and ROC Fraud Management
Secured a new deal with a regulatory body in Africa to validate the revenues reported by operators in the region and the associated license fees
Update on Capital Reduction
Completed the capital reduction process
Trading of new Subex shares commenced w.e.f. 5th November, 2020
Subex is a pioneer in enabling Digital Trust for businesses across the globe.
Founded in 1994, Subex has spent over 25 years in helping global Communications Service Providers maximize their revenues and profitability. With a legacy of having served the market through its world-class solutions for business optimization and analytics, Subex is now leading the way by enabling all-round Digital Trust in the business ecosystems of its customers. Focusing on privacy, security, risk mitigation, predictability, and confidence in data, Subex helps businesses embrace the disruptive changes in the business landscape and succeed with Digital Trust.
Subex leverages its award-winning product portfolio in areas such as Revenue Assurance, Fraud Management, Network Analytics, and Partner Management, and complements them through its digital solutions such as IoT Security and Insights. Subex also offers scalable Managed Services and Business Consulting services. Subex has more than 300 installations across 90+ countries.