Analytics software provider Starburst on Tuesday said it was adding data discoverability features to Starburst Galaxy, a managed Trino SQL query engine service.
Trino, formerly Presto SQL, is an open source, distributed SQL query engine for big data that allows users to query data from multiple data sources, including NoSQL databases, within a single query.
The updates, which were announced at AWS re:Invent 2022, will help enterprises streamline the traditional Extract, Transform, Load (ETL) process for curation of data products, thereby accelerating data querying, access and analytics, the company said.
Delivered as a managed service through Starburst Galaxy, the new discoverability features address the problem of data lakes turning into data swamps: repositories cluttered with disorganized data that is hard to access and hard to use for actionable insights.
The new features are designed to reduce time to discovery from hours to seconds and to lay the foundation for self-service data product curation, regardless of technical expertise.
The proliferation of IoT devices has been a big contributor to the disorganized data problem facing enterprises across the globe. There are expected to be 55.7 billion connected IoT devices by 2025, generating almost 80 zettabytes of data (a zettabyte is the equivalent of a billion terabytes, or a trillion gigabytes), according to IDC.
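For readers who want to check the scale, the unit arithmetic can be sketched in a few lines of Python. This is only an illustration, assuming decimal (SI) prefixes; the helper name `zettabytes_to` is made up for this example and does not come from IDC or Starburst.

```python
# Decimal (SI) storage units, expressed in bytes (an assumption of this sketch).
GIGABYTE = 10**9
TERABYTE = 10**12
ZETTABYTE = 10**21


def zettabytes_to(amount_zb: int, unit_bytes: int) -> int:
    """Convert a whole number of zettabytes into a smaller unit."""
    return amount_zb * ZETTABYTE // unit_bytes


print(zettabytes_to(1, TERABYTE))   # 1000000000 (a billion terabytes)
print(zettabytes_to(1, GIGABYTE))   # 1000000000000 (a trillion gigabytes)
print(zettabytes_to(80, TERABYTE))  # 80000000000 (80 billion terabytes)
```

On those assumptions, the roughly 80 zettabytes in the forecast works out to about 80 billion terabytes.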
“Enabling organizations to more efficiently discover the right data sets, Starburst Galaxy is helping reduce costs while getting more value out of their data,” said Justin Borgman, chairman and CEO at Starburst, in a press release.
New capabilities to aid self-service data product creation
The new capabilities, which the company expects will help enterprises create self-service data products, include data discovery, schema discovery and granular access control.
As the name suggests, the data discovery feature helps in finding the sought-after data set and its location. The feature enables metadata to be automatically populated with query history and context, providing key insights into how data is being used, the company said.
Schema discovery, on the other hand, is targeted at eliminating the “transform” aspect from the ETL process. The feature will allow enterprises to discover existing data sets across sources along with new data sets regardless of their location, Starburst said.
This means that data engineers loading data don’t need to consider the schemas beforehand, the company added.
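The general idea of discovering a schema from the data itself, rather than declaring it up front, can be illustrated with a small Python sketch. It is not Starburst's implementation: the function `infer_schema`, the sample records and the rule of falling back to a string column on conflicting types are all assumptions made for this example.

```python
from typing import Any


def infer_schema(records: list[dict[str, Any]]) -> dict[str, str]:
    """Infer a column -> type-name mapping by scanning every record."""
    schema: dict[str, str] = {}
    for record in records:
        for column, value in record.items():
            type_name = type(value).__name__
            seen = schema.get(column)
            if seen is None:
                # First time this column appears: record its type.
                schema[column] = type_name
            elif seen != type_name:
                # Conflicting types fall back to a generic string column.
                schema[column] = "str"
    return schema


# Hypothetical device readings; the second record adds a new column.
rows = [
    {"device_id": 7, "temp_c": 21.5},
    {"device_id": 8, "temp_c": 19.0, "site": "lab"},
]
print(infer_schema(rows))
# {'device_id': 'int', 'temp_c': 'float', 'site': 'str'}
```

In this toy version, a column that shows up only in later records is simply added to the inferred schema, which is the property that lets the loading step skip an upfront schema definition.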
The new Granular Access Control (GAC) feature is designed to let enterprise data administrators see and understand who has access to what data and how that data is being used, Starburst said. It also allows administrators to change permissions through policy as code, which helps ensure security and reduce risk within a continuous deployment pipeline, the company added.
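Policy as code means that permissions live in version-controlled definitions that a pipeline can test before deployment. The Python sketch below shows the concept only; the `Grant` class, the role and table names and the `can_read` check are hypothetical and do not reflect Starburst Galaxy's actual policy format or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    """One policy entry: a role may read these columns of a table."""
    role: str
    table: str
    columns: frozenset[str]


# Hypothetical policy, kept in source control alongside its tests.
POLICY = [
    Grant("analyst", "sales.orders", frozenset({"order_id", "total"})),
    Grant("admin", "sales.orders",
          frozenset({"order_id", "total", "customer_email"})),
]


def allowed_columns(role: str, table: str, policy=POLICY) -> frozenset[str]:
    """Union of all columns granted to a role on a table."""
    allowed: frozenset[str] = frozenset()
    for grant in policy:
        if grant.role == role and grant.table == table:
            allowed |= grant.columns
    return allowed


def can_read(role: str, table: str, columns: list[str]) -> bool:
    """True only if every requested column is covered by a grant."""
    return set(columns) <= allowed_columns(role, table)


print(can_read("analyst", "sales.orders", ["order_id", "total"]))  # True
print(can_read("analyst", "sales.orders", ["customer_email"]))     # False
```

Because the policy is ordinary data, a permission change becomes a reviewed code change, and checks like the two above can run automatically in a continuous deployment pipeline.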
Starburst Galaxy is currently available on AWS, Microsoft Azure and GCP across various pricing plans.
The updates to Starburst Galaxy come within a year of the company adding data products to its enterprise data and analytics platform, dubbed Starburst Enterprise. In September, the company added new capabilities to the platform to ease cross-cloud analytics.
Also at re:Invent 2022, the company announced support for implementing data mesh architecture on AWS Lake Formation, a service for creating data lakes, via Starburst Enterprise.
The data mesh concept embraces decentralized management and governance of heterogeneous, distributed data. The goal of data mesh architecture is to allow management and analysis of data regardless of where it resides, whether in on-premises, public cloud or multicloud environments, or in SQL or NoSQL databases.
As part of the support, Starburst will help its joint customers with AWS to implement the technical aspects of data mesh, the company said, adding that this will help enterprises generate maximum value out of hybrid deployments.
Copyright © 2022 IDG Communications, Inc.