Qubole organized the first-ever Presto Summit in India on September 5, 2019. Bangalore, as the technology and startup hub of India, was the perfect venue for India’s first Presto Summit. Presto has seen a lot of interest and adoption in the South Asia and Asia-Pacific region, as was evident from the turnout at the last two Presto Meetups organized by Qubole over the past year. The conference venue, Courtyard by Marriott on Outer Ring Road (ORR), a 17 km stretch that hosts 10% of Bangalore’s working population (around 1 million people), proved to be an ideal destination for Presto enthusiasts, several of whom work in its immediate vicinity.
With 150 attendees from more than 75 companies, the Presto community in India was excited and eager to meet and interact with Presto co-creators Martin Traverso, Dain Sundstrom and David Phillips, who flew to Bangalore for the event.
Queries with a CROSS JOIN UNNEST clause are expected to have a significant performance improvement starting with version 316.
Presto is attracting a lot of attention from many kinds of companies around the world, and Japan is no exception. Many companies there use Presto as their primary data processing engine.
To help community members in Japan keep in touch with each other, we held the first-ever Presto conference in Tokyo, welcoming Presto creators Dain Sundstrom, Martin Traverso and David Phillips. The conference was hosted at the Tokyo office of Arm Treasure Data. This article summarizes the conference and aims to convey the excitement in the room.
The Cost-Based Optimizer (CBO) in Presto achieves stunning results in industry standard benchmarks (and not only in benchmarks)! The CBO makes decisions based on several factors, including the shape of the query, filters and table statistics. I would like to tell you more about what the table statistics are in Presto and what information can be derived from them.
By using dynamic filtering via run-time predicate pushdown, we can significantly optimize highly selective inner joins.
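The idea can be illustrated with a toy in-memory hash join. This is a sketch under stated assumptions, not Presto's implementation: the row layout, the `hash_join` helper and its `dynamic_filter` flag are hypothetical names chosen for illustration. It collects the join keys from the small build side at run time and uses them to discard probe-side rows before the join itself touches them.

```python
def hash_join(build_rows, probe_rows, key, dynamic_filter=True):
    """Inner-join two lists of dicts on `key`.

    Returns (joined_rows, number_of_probe_rows_processed_by_the_join).
    """
    # Build phase: index the (small) build side by join key.
    index = {}
    for row in build_rows:
        index.setdefault(row[key], []).append(row)

    if dynamic_filter:
        # Run-time predicate derived from the build side: key IN (build keys).
        # A real engine would push this down to the probe-side scan;
        # here it is simulated with a list filter (an assumption).
        build_keys = set(index)
        probe_rows = [r for r in probe_rows if r[key] in build_keys]

    joined = []
    processed = 0
    for row in probe_rows:
        processed += 1
        for match in index.get(row[key], []):
            joined.append({**match, **row})
    return joined, processed


# Hypothetical data: one dimension row, 1000 fact rows, 10 of which match.
dim = [{"id": 7, "name": "widget"}]
fact = [{"id": i % 100, "qty": i} for i in range(1000)]

with_filter, seen_with = hash_join(dim, fact, "id", dynamic_filter=True)
without_filter, seen_without = hash_join(dim, fact, "id", dynamic_filter=False)
```

Both calls produce the same join result; the difference is how many probe rows reach the join, which is where a highly selective build side pays off.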
This version adds support for FETCH FIRST ... WITH TIES syntax, locality
awareness in the default scheduler for better workload balancing, the new
format() function, and improved support for ORC bloom filters. Additionally,
connectors can now provide view definitions, which opens up several new use cases.
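To make the WITH TIES behavior concrete, here is a minimal Python sketch of the semantics as commonly described for SQL: return the first n rows in sort order, plus any following rows whose sort key equals that of the n-th row. The `fetch_first_with_ties` helper and the sample data are hypothetical, and edge cases in Presto itself may differ.

```python
def fetch_first_with_ties(rows, n, sort_key):
    """Return the first n rows by sort_key, plus rows tied with the n-th row."""
    ordered = sorted(rows, key=sort_key)
    if n <= 0 or not ordered:
        return []
    if n >= len(ordered):
        return ordered
    # Sort key of the last row that a plain "FETCH FIRST n ROWS ONLY" would keep.
    cutoff = sort_key(ordered[n - 1])
    result = ordered[:n]
    # Keep extending while the following rows tie with the cutoff row.
    for row in ordered[n:]:
        if sort_key(row) != cutoff:
            break
        result.append(row)
    return result


# Hypothetical data: (name, score), ordered by score descending.
scores = [("ann", 90), ("bob", 85), ("cat", 85), ("dan", 70)]
top = fetch_first_with_ties(scores, 2, sort_key=lambda r: -r[1])
```

Asking for two rows yields three here, because "bob" and "cat" share the second-highest score.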
This version adds support for reading ZSTD and LZ4-compressed Parquet data
and writing ZSTD-compressed ORC data, improves compatibility with the Hive
2.3+ metastore, supports mixed-case field names in Elasticsearch, adds JSON
output format for the CLI, and improves the rendering of the plan structure
in EXPLAIN output.
Presto 312 introduces a new Apache Phoenix Connector, which allows Presto to query data stored in HBase using Apache Phoenix. This unlocks new capabilities that previously weren’t possible with Phoenix alone, such as federation (querying of multiple Phoenix clusters) and joining Phoenix data with data from other Presto data sources.
Optimizers are all about doing work in the most cost-effective manner and avoiding unnecessary work.
Some SQL constructs such as ORDER BY do not affect query results in many
situations, and can negatively affect performance unless the optimizer is
smart enough to remove them.
This version fixes incorrect results for queries involving GROUPING SETS and
LIMIT, fixes selecting the UUID type from the CLI and JDBC driver, and adds
support for compression and encryption when using Spill to Disk.
Queries involving IN and NOT IN over a subquery are much faster in Presto 312.
This version has many performance improvements (including cast optimization),
a new UUID data type and uuid() function, a new Apache Phoenix connector,
support for the PostgreSQL TIMESTAMP WITH TIME ZONE data type, support for the
MySQL JSON data type, improved support for Hive bucketed tables, and some bug fixes.
Presto 312 adds support for the more flexible bucketing introduced in recent versions of Hive. Specifically, it allows any number of files per bucket, including zero. This allows inserting data into an existing partition without having to rewrite the entire partition, and improves the performance of writes by not requiring the creation of files for empty buckets.
The next release of Presto (version 312) will include a new optimization to remove unnecessary casts which might have been added implicitly by the query planner or explicitly by users when they wrote the query.
Next month will mark the 2nd annual Presto Summit hosted by the Presto Software Foundation, Starburst Data, and Twitter. Last year’s event was a great success (see the Presto Summit 2018 recap).
This version adds standard OFFSET syntax, a new function combinations() for
computing k-combinations of array elements, and support for nested collections
in Cassandra.
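For readers unfamiliar with k-combinations, the following Python sketch shows the concept using the standard library's `itertools.combinations`. The `k_combinations` wrapper is a hypothetical helper for illustration; the ordering of results and any size limits in Presto's combinations() function may differ from what is shown here.

```python
from itertools import combinations


def k_combinations(array, k):
    """Return all k-element combinations of `array`, each as a list."""
    # itertools.combinations picks elements by position, so the relative
    # order of elements within each combination follows the input array.
    return [list(c) for c in combinations(array, k)]


pairs = k_combinations(["a", "b", "c"], 2)
triples = k_combinations([1, 2, 3, 4], 3)
```

An array of 3 elements has 3 two-element combinations, and an array of 4 elements has 4 three-element combinations.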
Presto is known for working well with Amazon S3. We recently made an improvement that greatly reduces network utilization and latency when reading ORC or Parquet data.
This version adds standard FETCH FIRST syntax, support for using an alternate
AWS role when accessing S3 or Glue, and improved handling of DECIMAL, DOUBLE,
and REAL when Hive table and partition metadata differ.
Community, noun: “A feeling of fellowship with others, as a result of sharing common attributes, interests, and goals”
The fun picture you see here was taken at the first lecture of the first international Presto Summit in Israel last month.
The atmosphere in the room during the various presentations was unique. It’s as if you could physically feel the brainpower of 250 engineers fascinated by technology in one room.
We would like to share with you a bit of the content that was discussed during the conference. Enjoy the read and the videos!
This version adds support for case-insensitive name matching in JDBC-based connectors, more data types in the PostgreSQL connector, and some bug fixes.
Presto is known for being the fastest SQL on Hadoop engine, and our custom ORC reader implementation is a big reason for this speed – now it is even faster!
This version includes significant performance improvements when reading ORC
data, authorization checks for SHOW COLUMNS, and limit pushdown for JDBC-based
connectors.
This version includes some important security fixes, support for inner and outer
joins involving lateral derived tables (LATERAL), new syntax for setting table
comments, and performance improvements.
This version includes some bug fixes, as well as performance improvements when decoding ORC data.
Changes in this version include peak-memory awareness in the cost-based optimizer, improved handling of CSV output in the CLI, and performance improvements for Parquet.
New features include spilling for queries that use ORDER BY or window functions, support for PostgreSQL’s json and jsonb types, and a Hive procedure to synchronize partition metadata with the file system.
This version includes bug fixes and performance improvements.
New features include native support for Google Cloud Storage and a connector for Elasticsearch.
New features include role-based access control and role management, invoker
security mode for views, and ANALYZE syntax for collecting table statistics.
We are pleased to announce the launch of the Presto Software Foundation, a not-for-profit organization dedicated to the advancement of the Presto open source distributed SQL engine. The foundation is committed to ensuring the project remains open, collaborative and independent for decades to come.