Qubole organized the first-ever Presto Summit in India on September 5, 2019. Bangalore, the technology and startup hub of India, was the perfect venue for India’s first Presto Summit. Presto has seen a lot of interest and adoption in this region (South Asia and Asia Pacific), as was evident from the turnout at the last two Presto Meetups organized by Qubole over the past year. The conference venue, Courtyard by Marriott on Outer Ring Road (ORR), a 17 km stretch that hosts 10% of Bangalore’s working population (around 1 million people), proved to be an ideal destination for Presto enthusiasts, several of whom work in its immediate vicinity.
With 150 attendees from more than 75 companies, the Presto community in India was excited and eager to meet and interact with Presto co-creators Martin Traverso, Dain Sundstrom, and David Phillips, who flew to Bangalore for the event.
Queries with a CROSS JOIN UNNEST clause are expected to see a significant performance improvement starting with version 316.
Presto is attracting a lot of attention from many kinds of companies all around the world, and Japan is no exception. Many companies there use Presto as their primary data processing engine.
To help community members in Japan keep in touch with each other, we have just held the first-ever Presto conference in Tokyo, welcoming Presto creators Dain Sundstrom, Martin Traverso, and David Phillips. The conference was hosted at the Tokyo office of Arm Treasure Data. This article summarizes the conference and aims to convey the excitement in the room.
The Cost-Based Optimizer (CBO) in Presto achieves stunning results in industry-standard benchmarks (and not only in benchmarks)! The CBO makes decisions based on several factors, including the shape of the query, filters, and table statistics. I would like to tell you more about what table statistics are in Presto and what information can be derived from them.
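As a rough illustration of the kind of information that can be derived from table statistics, here is a minimal Python sketch. The field names (`row_count`, `nulls_fraction`, `distinct_values_count`) and the uniform-distribution formula are assumptions chosen for the example; this is a textbook estimate, not a description of Presto's actual estimator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnStats:
    """Hypothetical per-column statistics, named for illustration only."""
    row_count: float
    nulls_fraction: float
    distinct_values_count: float


def estimate_equality_rows(stats: ColumnStats) -> float:
    """Estimate how many rows satisfy `column = <constant>`.

    Assumes non-null values are spread uniformly over the distinct values,
    which is a simplification made for this sketch.
    """
    non_null_rows = stats.row_count * (1.0 - stats.nulls_fraction)
    if stats.distinct_values_count <= 0:
        return 0.0
    return non_null_rows / stats.distinct_values_count


# Example: 1000 rows, a quarter of them NULL, 75 distinct values.
example = ColumnStats(row_count=1000, nulls_fraction=0.25, distinct_values_count=75)
estimated = estimate_equality_rows(example)
```

An optimizer could compare such estimates for the two sides of a join to decide, for example, which side is likely to be smaller.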
By using dynamic filtering via run-time predicate pushdown, we can significantly optimize highly selective inner joins.
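The idea can be sketched in a few lines of Python. This is a simplified, single-process illustration under stated assumptions: the function name and the row representation (plain dictionaries) are hypothetical, and a real engine would push the collected filter down to the table scan on other nodes rather than apply it in the same loop.

```python
def join_with_dynamic_filter(build_rows, probe_rows, key):
    """Inner join that prunes the probe side with keys seen on the build side."""
    # Build phase: index the (small) build side by join key.
    build_index = {}
    for row in build_rows:
        build_index.setdefault(row[key], []).append(row)

    # The dynamic filter is the set of keys that can possibly match.
    # It is only known at run time, after the build side has been read.
    dynamic_filter = set(build_index)

    # Probe phase: drop non-matching rows as early as possible.
    passed_filter = 0
    output = []
    for row in probe_rows:
        if row[key] not in dynamic_filter:
            continue  # pruned before reaching the join
        passed_filter += 1
        for match in build_index[row[key]]:
            output.append({**match, **row})
    return output, passed_filter


# A highly selective join: 2 dimension rows against 1000 fact rows.
dims = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
facts = [{"id": i % 100, "amount": i} for i in range(1000)]
joined, passed = join_with_dynamic_filter(dims, facts, "id")
```

In this example only the fact rows whose `id` is 1 or 2 survive the filter, so most of the probe side never reaches the join. The benefit grows with the selectivity of the join.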
This version adds support for FETCH FIRST ... WITH TIES syntax, locality-awareness to the default scheduler for better workload balancing, the new
and improved support for ORC bloom filters. Additionally, connectors can now provide view definitions, which opens up several new use cases.
This version adds support for reading ZSTD and LZ4-compressed Parquet data and writing ZSTD-compressed ORC data, improves compatibility with the Hive 2.3+ metastore, supports mixed-case field names in Elasticsearch, adds JSON output format for the CLI, and improves the rendering of the plan structure
Presto 312 introduces a new Apache Phoenix Connector, which allows Presto to query data stored in HBase using Apache Phoenix. This unlocks new capabilities that previously weren’t possible with Phoenix alone, such as federation (querying of multiple Phoenix clusters) and joining Phoenix data with data from other Presto data sources.
Optimizers are all about doing work in the most cost-effective manner and avoiding unnecessary work.
Some SQL constructs such as ORDER BY do not affect query results in many situations, and can negatively affect performance unless the optimizer is smart enough to remove them.
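To make this concrete, here is a small Python sketch of one such situation, chosen as an assumption for illustration: a sorted subquery that feeds an order-insensitive aggregation. The function names are hypothetical, and the code mimics the query shapes rather than any optimizer internals.

```python
def aggregate_with_sort(values):
    # Mimics: SELECT count(*), sum(x) FROM (SELECT x FROM t ORDER BY x)
    ordered = sorted(values)  # extra O(n log n) work
    return len(ordered), sum(ordered)


def aggregate_without_sort(values):
    # Mimics the same query after the ORDER BY has been removed.
    return len(values), sum(values)


data = [5, 3, 9, 1, 7]
with_sort = aggregate_with_sort(data)
without_sort = aggregate_without_sort(data)
```

Both functions return the same result because neither `count` nor an integer `sum` depends on input order, so the sort is pure overhead.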
This version fixes incorrect results for queries involving LIMIT, fixes selecting the UUID type from the CLI and JDBC driver, and adds support for compression and encryption when using Spill to Disk.
NOT IN over a subquery are much faster in
This version has many performance improvements (including
a new UUID data type, a new Apache Phoenix connector, support for the PostgreSQL TIMESTAMP WITH TIME ZONE data type, support for the MySQL JSON data type, improved support for Hive bucketed tables, and some bug fixes.
Presto 312 adds support for the more flexible bucketing introduced in recent versions of Hive. Specifically, it allows any number of files per bucket, including zero. This allows inserting data into an existing partition without having to rewrite the entire partition, and improves the performance of writes by not requiring the creation of files for empty buckets.
The next release of Presto (version 312) will include a new optimization to remove unnecessary casts which might have been added implicitly by the query planner or explicitly by users when they wrote the query.
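A minimal Python sketch of one kind of unnecessary cast follows: a cast to the type an expression already has. The `Column` and `Cast` classes and the type names are hypothetical and exist only for this example; the actual optimization in Presto may cover more cases and is implemented differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """A column reference with a known type (illustrative only)."""
    name: str
    type: str


@dataclass(frozen=True)
class Cast:
    """CAST(operand AS type) in a toy expression tree."""
    operand: object
    type: str


def remove_redundant_casts(expr):
    """Drop casts whose target type equals the operand's type."""
    if isinstance(expr, Cast):
        operand = remove_redundant_casts(expr.operand)
        if operand.type == expr.type:
            return operand  # the cast is a no-op
        return Cast(operand, expr.type)
    return expr


x = Column("x", "bigint")
# CAST(CAST(x AS bigint) AS bigint) simplifies to x.
simplified = remove_redundant_casts(Cast(Cast(x, "bigint"), "bigint"))
# CAST(x AS varchar) changes the type, so it is kept.
kept = remove_redundant_casts(Cast(x, "varchar"))
```

The rewrite is applied bottom-up so that a chain of nested no-op casts collapses in a single pass.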
Presto is known for working well with Amazon S3. We recently made an improvement that greatly reduces network utilization and latency when reading ORC or Parquet data.
This version adds standard
syntax, support for using an alternate AWS role when accessing S3 or Glue, and improved handling of
when Hive table and partition metadata differ.
Community, noun: “A feeling of fellowship with others, as a result of sharing common attributes, interests, and goals”
The fun picture you see here was taken at the first lecture of the first international Presto Summit in Israel last month.
The atmosphere in the room during the various presentations was unique. It’s as if you could physically feel the brainpower of 250 engineers fascinated by technology in one room.
We would like to share with you a bit of the content that was discussed during the conference. Enjoy the read and the videos!
This version adds support for case-insensitive name matching in JDBC-based connectors, more data types in the PostgreSQL connector, and some bug fixes.
Presto is known for being the fastest SQL-on-Hadoop engine, and our custom ORC reader implementation is a big reason for this speed – now it is even faster!
This version includes some bug fixes, as well as performance improvements when decoding ORC data.
New features include spilling for queries that use ORDER BY or window functions, support for PostgreSQL’s json and jsonb types, and a Hive procedure to synchronize partition metadata with the file system.
This version includes bug fixes and performance improvements.
We are pleased to announce the launch of the Presto Software Foundation, a not-for-profit organization dedicated to the advancement of the Presto open source distributed SQL engine. The foundation is committed to ensuring the project remains open, collaborative and independent for decades to come.