First Presto Summit in India, Bangalore, September 2019

Qubole organized the first ever Presto Summit in India on September 5, 2019. Bangalore, as the technology and startup hub of India, was the perfect venue for India’s first Presto Summit. Presto has seen a lot of interest and adoption in the South Asia and Asia Pacific region, as was evident from the turnout at the last two Presto Meetups organized by Qubole over the past year. The conference venue, Courtyard by Marriott, sits on Outer Ring Road (ORR), a 17 km stretch that hosts 10% of Bangalore’s working population (around 1 million people). That made it an ideal destination for Presto enthusiasts, several of whom work in its immediate vicinity.

With 150 attendees from more than 75 companies, the Presto community in India was super excited and eager to meet and interact with Presto co-creators Martin Traverso, Dain Sundstrom and David Phillips, who flew to Bangalore for the event.

Unnest Operator Performance Enhancement with Dictionary Blocks

Queries with a CROSS JOIN UNNEST clause are expected to see a significant performance improvement starting with version 316.
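To give a feel for the idea, here is a minimal Python model of dictionary blocks in CROSS JOIN UNNEST. It is a sketch under assumptions, not Presto's Java implementation: `DictionaryBlock` and `cross_join_unnest` are hypothetical names, and we assume the saving comes from replicating the non-unnested columns by reference (a list of input row ids) instead of copying their values once per unnested element.

```python
from dataclasses import dataclass
from typing import Any, List, Sequence, Tuple


@dataclass
class DictionaryBlock:
    """Simplified dictionary-encoded column: each position stores an index into `dictionary`."""
    dictionary: Sequence[Any]  # the original, unreplicated column values
    ids: List[int]             # one index into `dictionary` per output row

    def get(self, position: int) -> Any:
        return self.dictionary[self.ids[position]]

    def __len__(self) -> int:
        return len(self.ids)


def cross_join_unnest(replicated: Sequence[Any],
                      arrays: Sequence[Sequence[Any]]) -> Tuple[DictionaryBlock, List[Any]]:
    """Unnest `arrays`, replicating the `replicated` column by reference instead of by copy."""
    ids: List[int] = []
    unnested: List[Any] = []
    for row, array in enumerate(arrays):
        for element in array:
            ids.append(row)  # point back at the input row rather than copying its value
            unnested.append(element)
    # Rows with an empty array produce no output, matching CROSS JOIN semantics.
    return DictionaryBlock(replicated, ids), unnested
```

The replicated column is never copied; only the small `ids` list grows with the number of output rows.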

A Report of First Ever Presto Conference Tokyo

Nowadays, Presto is attracting attention from many kinds of companies all around the world, and Japan is no exception. Many companies there use Presto as their primary data processing engine.

To help community members in Japan keep in touch with each other, we recently held the first ever Presto conference in Tokyo, welcoming Presto creators Dain Sundstrom, Martin Traverso and David Phillips. The conference was hosted at the Tokyo office of Arm Treasure Data. This article summarizes the conference and aims to convey the excitement in the room.

Introduction to Presto Cost-Based Optimizer

The Cost-Based Optimizer (CBO) in Presto achieves stunning results in industry-standard benchmarks (and not only in benchmarks)! The CBO makes decisions based on several factors, including the shape of the query, filters and table statistics. I would like to tell you more about what the table statistics are in Presto and what information can be derived from them.
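As an illustration of what can be derived from such statistics, the sketch below estimates how many rows a filter keeps. It assumes per-column statistics such as a NULL fraction, a distinct-value count and low/high values, and it uses textbook estimates (uniform frequency for equality, uniform spread for ranges); Presto's actual estimation rules may differ. `ColumnStats` and the estimator functions are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ColumnStats:
    nulls_fraction: float   # fraction of rows where the column is NULL
    distinct_values: float  # number of distinct non-NULL values
    low: float              # smallest non-NULL value
    high: float             # largest non-NULL value


def estimate_equality_rows(row_count: float, stats: ColumnStats) -> float:
    """Rows matching `col = <constant>`, assuming every distinct value is equally frequent."""
    non_null = row_count * (1.0 - stats.nulls_fraction)
    return non_null / stats.distinct_values


def estimate_less_than_rows(row_count: float, stats: ColumnStats, value: float) -> float:
    """Rows matching `col < value`, assuming values are spread uniformly over [low, high]."""
    non_null = row_count * (1.0 - stats.nulls_fraction)
    if value <= stats.low:
        return 0.0
    if value >= stats.high:
        return non_null
    return non_null * (value - stats.low) / (stats.high - stats.low)
```

For a 1,000-row table where 10% of the values are NULL, with 50 distinct values between 0 and 100, `col = 42` is estimated at 18 rows and `col < 25` at 225 rows. Estimates like these are what let the optimizer compare join orders without running the query.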

Dynamic filtering for highly-selective join optimization

By using dynamic filtering via run-time predicate pushdown, we can significantly optimize highly selective inner joins.
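The mechanism can be sketched with an in-memory hash join in Python. This is a simplified model under stated assumptions: `scan` and `hash_join` are hypothetical helpers, and the dynamic filter is simply the set of build-side join keys applied inside the probe-side scan. In a real engine the collected predicate would be pushed further down, for example into the connector, so that non-matching data is never read at all.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Set, Tuple

Row = Dict[str, object]


def scan(table: Iterable[Row], key: str, dynamic_filter: Optional[Set[object]] = None) -> Iterator[Row]:
    """Table scan that applies the run-time predicate `key IN dynamic_filter`, if one is given."""
    for row in table:
        if dynamic_filter is None or row[key] in dynamic_filter:
            yield row


def hash_join(probe: Iterable[Row], build: Iterable[Row], probe_key: str, build_key: str,
              use_dynamic_filter: bool = True) -> Tuple[List[Row], int]:
    """Inner hash join; returns the joined rows and how many probe rows reached the join."""
    # Build phase: hash the (small, highly selective) build side first.
    hash_table: Dict[object, List[Row]] = {}
    for row in build:
        hash_table.setdefault(row[build_key], []).append(row)

    # Run-time predicate: only probe keys present on the build side can ever match.
    dynamic_filter = set(hash_table) if use_dynamic_filter else None

    joined: List[Row] = []
    probed = 0
    for row in scan(probe, probe_key, dynamic_filter):
        probed += 1
        for match in hash_table.get(row[probe_key], []):
            joined.append({**row, **match})
    return joined, probed
```

With 1,000 sales rows spread over 100 item ids and a build side containing a single item, both variants return the same 10 rows, but with the dynamic filter only 10 probe rows reach the join instead of 1,000.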

Release 315

This version adds support for the FETCH FIRST ... WITH TIES syntax, locality awareness in the default scheduler for better workload balancing, the new format() function, and improved support for ORC bloom filters. Additionally, connectors can now provide view definitions, which opens up several new use cases.
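The semantics of WITH TIES can be illustrated with a short Python emulation (semantics only, not how Presto executes the query). For example, `ORDER BY score DESC FETCH FIRST 2 ROWS WITH TIES` returns the top two rows plus any further rows whose score equals that of the second row. `fetch_first_with_ties` is a hypothetical helper.

```python
from typing import Any, Callable, List, Sequence, TypeVar

T = TypeVar("T")


def fetch_first_with_ties(rows: Sequence[T], n: int, sort_key: Callable[[T], Any]) -> List[T]:
    """Emulate `ORDER BY <sort_key> FETCH FIRST n ROWS WITH TIES`."""
    if n <= 0:
        return []
    ordered = sorted(rows, key=sort_key)
    result = ordered[:n]
    if not result:
        return result
    last_key = sort_key(result[-1])
    # Keep going past row n for as long as rows tie with the n-th row on the sort key.
    for row in ordered[n:]:
        if sort_key(row) != last_key:
            break
        result.append(row)
    return result
```

With scores 90, 85, 85 and 70, asking for the first 2 rows with ties returns three rows, because the third row ties with the second on the sort key.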

Release 314

This version adds support for reading ZSTD- and LZ4-compressed Parquet data and writing ZSTD-compressed ORC data, improves compatibility with the Hive 2.3+ metastore, supports mixed-case field names in Elasticsearch, adds a JSON output format for the CLI, and improves the rendering of the plan structure in EXPLAIN output.

Apache Phoenix Connector

Presto 312 introduces a new Apache Phoenix Connector, which allows Presto to query data stored in HBase using Apache Phoenix. This unlocks new capabilities that previously weren’t possible with Phoenix alone, such as federation (querying multiple Phoenix clusters) and joining Phoenix data with data from other Presto data sources.

Removing redundant ORDER BY

Optimizers are all about doing work in the most cost-effective manner and avoiding unnecessary work. Some SQL constructs, such as ORDER BY, do not affect query results in many situations, and can negatively affect performance unless the optimizer is smart enough to remove them.
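One way to picture such a rule is a recursive pass over a toy plan tree that drops every Sort whose ordering no consumer can observe. This Python sketch is an assumption-laden simplification, not Presto's optimizer: the node kinds, `remove_redundant_sorts` and `show` are hypothetical, and only Project and Filter are treated as passing their input's ordering through.

```python
from dataclasses import dataclass, field
from typing import List

# Simplifying assumption: these node kinds pass their input's ordering through unchanged;
# every other kind (aggregation, join, ...) is treated as not exposing input order.
ORDER_PRESERVING = {"Project", "Filter"}


@dataclass
class Node:
    kind: str  # hypothetical kinds: "Output", "Sort", "Limit", "Aggregate", "Scan", ...
    children: List["Node"] = field(default_factory=list)


def remove_redundant_sorts(node: Node, order_matters: bool = True) -> Node:
    """Drop Sort nodes whose ordering cannot be observed by any consumer."""
    if node.kind == "Sort" and not order_matters:
        # Nothing downstream depends on this ordering: replace the Sort with its input.
        return remove_redundant_sorts(node.children[0], order_matters=False)
    if node.kind in ("Output", "Limit"):
        child_order_matters = True  # the client sees the order; a Limit keeps "the first N" rows
    elif node.kind in ORDER_PRESERVING:
        child_order_matters = order_matters
    else:
        child_order_matters = False  # includes a kept Sort, which re-orders its input anyway
    node.children = [remove_redundant_sorts(c, child_order_matters) for c in node.children]
    return node


def show(node: Node) -> str:
    """Render a plan as nested text, e.g. Output(Sort(Scan))."""
    if not node.children:
        return node.kind
    return f"{node.kind}({', '.join(show(c) for c in node.children)})"
```

For `SELECT count(*) FROM (SELECT * FROM t ORDER BY x)`, the plan `Output(Aggregate(Sort(Scan)))` becomes `Output(Aggregate(Scan))`, while a top-level ORDER BY, or an ORDER BY feeding a LIMIT, is kept.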

Release 313

This version fixes incorrect results for queries involving GROUPING SETS and LIMIT, fixes selecting the UUID type from the CLI and JDBC driver, and adds support for compression and encryption when using Spill to Disk.

Using Precomputed Hash in SemiJoin Operations

Queries involving IN and NOT IN over a subquery are much faster in Presto 312.
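The idea of reusing a precomputed hash can be sketched in Python: the probe rows carry the hash of their key as an extra column, and the semijoin's hash-set lookup uses it instead of hashing again. This is a hypothetical model, not Presto's code. `PrecomputedHashSet` and `semi_join` are illustrative names, the precomputed hash is assumed to come from the same hash function the set uses, and SQL NULL semantics for IN and NOT IN are ignored.

```python
from typing import Hashable, Iterable, List, Optional, Tuple


class PrecomputedHashSet:
    """Bucketed hash set whose lookups can reuse a hash computed earlier in the pipeline."""

    def __init__(self, num_buckets: int = 64) -> None:
        self.buckets: List[List[Hashable]] = [[] for _ in range(num_buckets)]

    def _bucket(self, value: Hashable, value_hash: Optional[int]) -> List[Hashable]:
        # Only hash the value if the caller did not supply a precomputed hash.
        h = hash(value) if value_hash is None else value_hash
        return self.buckets[h % len(self.buckets)]

    def add(self, value: Hashable, value_hash: Optional[int] = None) -> None:
        bucket = self._bucket(value, value_hash)
        if value not in bucket:
            bucket.append(value)

    def contains(self, value: Hashable, value_hash: Optional[int] = None) -> bool:
        return value in self._bucket(value, value_hash)


def semi_join(probe_rows: Iterable[Tuple[Hashable, int]], build_values: Iterable[Hashable]) -> List[bool]:
    """Evaluate `key IN (subquery)` for probe rows shaped (key, precomputed_hash_of_key)."""
    subquery_set = PrecomputedHashSet()
    for value in build_values:
        subquery_set.add(value)
    # The probe side carries its hash as an extra column, so no rehashing happens here.
    return [subquery_set.contains(key, key_hash) for key, key_hash in probe_rows]
```

The saving comes from the hash being computed once, upstream, and then shared by every operator that needs it.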

Release 312

This version has many performance improvements (including cast optimization), a new UUID data type and uuid() function, a new Apache Phoenix connector, support for the PostgreSQL TIMESTAMP WITH TIME ZONE data type, support for the MySQL JSON data type, improved support for Hive bucketed tables, and some bug fixes.

Improved Hive Bucketing

Presto 312 adds support for the more flexible bucketing introduced in recent versions of Hive. Specifically, it allows any number of files per bucket, including zero. This allows inserting data into an existing partition without having to rewrite the entire partition, and improves the performance of writes by not requiring the creation of files for empty buckets.
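The reader side of this flexibility can be sketched as grouping a partition's files by bucket number while allowing any number of files per bucket. This Python sketch is hypothetical: `group_files_by_bucket` is an illustrative name, and we assume a file's bucket number is encoded as the leading digits of its name (as in `000002_0`); that naming convention is an assumption here, not a statement about Hive's exact rules.

```python
import re
from typing import Dict, List

# Assumption for illustration: a file's bucket number is its leading digits, e.g. "000002_0".
BUCKET_FILE = re.compile(r"^(\d+)_")


def group_files_by_bucket(file_names: List[str], bucket_count: int) -> Dict[int, List[str]]:
    """Map every bucket to its files; a bucket may have zero, one, or many files."""
    buckets: Dict[int, List[str]] = {b: [] for b in range(bucket_count)}
    for name in file_names:
        match = BUCKET_FILE.match(name)
        if match is None:
            raise ValueError(f"not a bucket file: {name}")
        bucket = int(match.group(1))
        if bucket >= bucket_count:
            raise ValueError(f"bucket {bucket} out of range for {bucket_count} buckets")
        buckets[bucket].append(name)
    return buckets
```

Under this model, inserting new rows into bucket 2 only means writing one more file whose name starts with `000002_`, without rewriting the existing files, and empty buckets need no files at all.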

Optimizing the Casts Away

The next release of Presto (version 312) will include a new optimization to remove unnecessary casts which might have been added implicitly by the query planner or explicitly by users when they wrote the query.
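To illustrate the kind of rewrite involved, the Python sketch below removes a cast from an equality such as `CAST(x AS bigint) = 5`, where `x` is a narrower integer column. It is an assumption-based simplification, not Presto's implementation: `unwrap_cast_in_equality` is a hypothetical helper, it only handles equality against a bigint constant, and it produces SQL text rather than rewriting a plan.

```python
# Value ranges of the narrower signed integer types.
INTEGER_RANGES = {
    "tinyint": (-2**7, 2**7 - 1),
    "smallint": (-2**15, 2**15 - 1),
    "integer": (-2**31, 2**31 - 1),
}


def unwrap_cast_in_equality(column: str, column_type: str, constant: int) -> str:
    """Rewrite `CAST(column AS bigint) = constant` into an equivalent predicate without the cast."""
    low, high = INTEGER_RANGES[column_type]
    if low <= constant <= high:
        # The constant fits the column's own type, so compare in that type:
        # one cast of the constant at planning time instead of one cast per row.
        return f"{column} = CAST({constant} AS {column_type})"
    # No value of the narrower type can equal the constant: the comparison is
    # false for every non-NULL value and NULL when the column is NULL.
    return f"CASE WHEN {column} IS NULL THEN NULL ELSE false END"
```

If `x` is a `smallint`, `CAST(x AS bigint) = 5` becomes `x = CAST(5 AS smallint)`, so the comparison no longer needs a per-row cast; a constant outside the smallint range turns the predicate into one that can never be true.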

Presto Summit 2019 @TwitterSF

Next month will mark the 2nd annual Presto Summit hosted by the Presto Software Foundation, Starburst Data, and Twitter. Last year’s event was a great success (see the Presto Summit 2018 recap).

Release 311

This version adds standard OFFSET syntax, a new function combinations() for computing k-combinations of array elements, and support for nested collections in Cassandra.
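The behavior of combinations() can be emulated in Python with `itertools.combinations`. This sketch mirrors the semantics only; `array_combinations` is a hypothetical name, and the order in which Presto returns the groups may differ.

```python
from itertools import combinations
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def array_combinations(values: Sequence[T], n: int) -> List[List[T]]:
    """Return every n-element sub-group of `values`, taking elements by position."""
    # Duplicates in the input are treated as distinct positions, so [1, 1, 2]
    # yields [1, 1] once and [1, 2] twice: sub-groups, not set-style subsets.
    return [list(group) for group in combinations(values, n)]
```

An array of k distinct elements yields k!/(n!(k-n)!) groups, for example 20 three-element groups from 6 elements.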

Presto Community Meeting 2019-05-08


Faster S3 Reads

Presto is known for working well with Amazon S3. We recently made an improvement that greatly reduces network utilization and latency when reading ORC or Parquet data.

Release 310

This version adds standard FETCH FIRST syntax, support for using an alternate AWS role when accessing S3 or Glue, and improved handling of DECIMAL, DOUBLE, and REAL when Hive table and partition metadata differ.

A review of the first international Presto Conference, Tel Aviv, April 2019

Community, noun: “A feeling of fellowship with others, as a result of sharing common attributes, interests, and goals”

The fun picture you see here was taken at the first lecture of the first international Presto Summit in Israel last month.

The atmosphere in the room during the various presentations was unique. It’s as if you could physically feel the brainpower of 250 engineers fascinated by technology in one room.

We would like to share with you a bit of the content that was discussed during the conference. Enjoy the read and the videos!

Release 309

This version adds support for case-insensitive name matching in JDBC-based connectors, more data types in the PostgreSQL connector, and some bug fixes.

Even Faster ORC

Presto is known for being the fastest SQL-on-Hadoop engine, and our custom ORC reader implementation is a big reason for this speed. Now it is even faster!

Release 308

This version includes significant performance improvements when reading ORC data, authorization checks for SHOW COLUMNS, and limit pushdown for JDBC-based connectors.

Release 307

This version includes some important security fixes, support for inner and outer joins involving lateral derived tables (LATERAL), new syntax for setting table comments, and performance improvements.

Presto Community Meeting 2019-04-03


Release 306

This version includes some bug fixes, as well as performance improvements when decoding ORC data.

Presto Community Meeting 2019-03-13


Release 305

Changes in this version include peak-memory awareness in the cost-based optimizer, improved handling of CSV output in the CLI, and performance improvements for Parquet.

Release 304

New features include spilling for queries that use ORDER BY or window functions, support for PostgreSQL’s json and jsonb types, and a Hive procedure to synchronize partition metadata with the file system.

Presto Community Meeting 2019-02-27


Release 303

This version includes bug fixes and performance improvements.

Release 302

New features include native support for Google Cloud Storage and a connector for Elasticsearch.

Presto Community Meeting 2019-02-06


Release 301

New features include role-based access control and role management, invoker security mode for views, and ANALYZE syntax for collecting table statistics.

Presto Software Foundation Launch

We are pleased to announce the launch of the Presto Software Foundation, a not-for-profit organization dedicated to the advancement of the Presto open source distributed SQL engine. The foundation is committed to ensuring the project remains open, collaborative and independent for decades to come.
