Snowflake Adds Python Support to Take On Teradata, Google BigQuery, and Amazon Redshift

Cloud-based data warehouse company Snowflake at its annual Snowflake Summit on Tuesday showed off a new set of tools and integrations to take on rival analytics and database companies such as Teradata, and services such as Google BigQuery and Amazon Redshift.

The new features, which include data access tools and support for Python on the company’s Snowpark application development system, are aimed at data scientists, data engineers and developers, with the goal of accelerating application development, especially for machine learning programs.

Snowpark, launched a year ago, is a dataframe-style development environment designed to let developers deploy their preferred tools in a serverless manner on Snowflake’s virtual warehouse compute engine. Python support is in public preview.

“Python is probably the most requested feature we hear from our customers,” said Christian Kleinerman, senior vice president of product at Snowflake.

The demand for Python makes sense, as it’s a language of choice for data science, analysts say.

“Snowflake is catching up on this front, as rivals such as Teradata, Google BigQuery and Vertica already have Python support,” said Doug Henschen, principal analyst at Constellation Research.

Snowflake also said it’s adding Streamlit integration for app development and iteration. Streamlit, an open source Python application framework that helps machine learning and data science teams visualize, modify and share data, was acquired by Snowflake in March.

The integration will allow users to remain within the Snowflake environment not only to access, secure and govern data, but also to develop data science applications that model and analyze it, said Tony Baer, principal analyst at dbInsight.

Snowflake launches Python-related integrations

Other Python-related tools and integrations unveiled at the summit include Snowflake Worksheets for Python, Large Memory Warehouses, and SQL Machine Learning.

Snowflake Worksheets for Python, which is in private preview, is designed to let companies develop pipelines, machine learning models and applications through the company’s web interface, dubbed Snowsight, the company said, adding that it has capabilities such as code completion and custom logic generation.

To help data scientists and development teams perform memory-intensive operations such as feature engineering and model training on large datasets, the company said it is working on a feature called Large Memory Warehouses.

Currently in the development phase, Large Memory Warehouses provides support for Python libraries through integration with the Anaconda data science platform, Snowflake said.

“Several rivals already support large memory warehouses as well as Python functions and language support, so this is Snowflake keeping up with market demands,” Henschen said.

Snowflake also offers SQL Machine Learning, starting with time series data, in private preview. The service helps companies integrate machine learning-based predictions and analytics into business intelligence applications and dashboards, the company said.

Many analytical database vendors, according to Henschen, have embedded machine learning models for in-database execution.

“The rationale behind Snowflake starting with the analysis of time series data is [that it is] among the most popular machine learning analytics, because it is about predicting future values based on previously observed values,” Henschen said, adding that time series analysis has many use cases in the financial sector.
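
To make the idea of predicting future values from earlier ones concrete, here is a minimal Python sketch of a moving-average forecast. It is not Snowflake’s SQL Machine Learning service, whose interface was not described at the summit; the function name `moving_average_forecast`, the window size and the sample figures are all hypothetical and chosen only for illustration.

```python
from statistics import mean


def moving_average_forecast(history, window=3, steps=2):
    """Forecast `steps` future values, each as the mean of the last `window` values."""
    if window < 1 or len(history) < window:
        raise ValueError("need at least `window` observations")
    values = list(history)  # copy so the caller's data is not modified
    forecasts = []
    for _ in range(steps):
        next_value = mean(values[-window:])  # predict from the most recent values
        forecasts.append(next_value)
        values.append(next_value)  # feed the prediction back in for the next step
    return forecasts


# Hypothetical daily sales figures, invented for illustration only.
daily_sales = [100, 110, 120, 130]
print(moving_average_forecast(daily_sales, window=2, steps=2))
```

Production forecasting models also account for trend and seasonality, but the principle is the same: each forecast is computed from values that came before it.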

Snowflake updates enable better access to data

On the reasoning that faster data access could lead to faster application development, Snowflake also introduced new features on Tuesday, including support for streaming data, Apache Iceberg tables in Snowflake, and external tables for on-premises storage.

Support for streaming data, which is in private preview, helps break down the boundaries between streaming and batch processing pipelines with Snowpipe, the company’s continuous data ingestion service.

The rationale for launching the feature, according to Henschen, is the high interest in low-latency options, including near-real-time and real-time streaming, and most vendors in this market have already ticked the streaming box.

“The feature gives engineering teams an integrated way to analyze the stream alongside historical data, so data engineers don’t have to tinker with something themselves. It’s a time saver,” Henschen said.

In order to meet the demand for more open source table formats, the company said it is developing Apache Iceberg Tables to work in its environment.

“Apache Iceberg is a very popular open source table format and it is rapidly gaining traction for data analytics platforms. Table formats such as Iceberg provide enhancements that help with consistency and scalable performance. Iceberg was also recently adopted by Google for its BigLake offering,” Henschen said.

Meanwhile, in an effort to keep its on-premises customers engaged while driving adoption of its cloud data platform, Snowflake introduced external tables for on-premises storage. Currently in private preview, the tool allows users to access their data in on-premises storage systems from companies such as Dell Technologies and Pure Storage, the company said.

“Snowflake had a ‘cloud only’ policy for some time, so they clearly had large, important customers who wanted a way to bring on-premises data into analytics without moving everything into Snowflake,” Henschen said.

Additionally, Henschen said competitors including Teradata, Vertica, and Yellowbrick offer on-premises deployment as well as hybrid and multicloud deployment.

Copyright © 2022 IDG Communications, Inc.
