Posts

  • Unnest Operator Performance Enhancement with Dictionary Blocks

    Queries with a CROSS JOIN UNNEST clause are expected to see a significant performance improvement starting with version 316.

  • A Report of First Ever Presto Conference Tokyo

    Presto is attracting attention from many kinds of companies around the world, and Japan is no exception. Many companies are using Presto as their primary data processing engine.

    To help community members in Japan keep in touch with each other, we have just held the first ever Presto conference in Tokyo, welcoming Presto creators Dain Sundstrom, Martin Traverso and David Phillips. The conference was hosted at the Tokyo office of Arm Treasure Data. This article summarizes the conference and aims to convey the excitement in the room.

  • Introduction to Presto Cost-Based Optimizer

    The Cost-Based Optimizer (CBO) in Presto achieves stunning results in industry-standard benchmarks (and not only in benchmarks)! The CBO makes decisions based on several factors, including the shape of the query, filters and table statistics. I would like to tell you more about what the table statistics are in Presto and what information can be derived from them.

  • Dynamic filtering for highly-selective join optimization

    By using dynamic filtering via run-time predicate pushdown, we can significantly optimize highly selective inner joins.
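
    As a rough illustration of the idea (not Presto's implementation), the Python sketch below collects the join keys from the small build side at run time and uses them to discard probe-side rows before any join work is done. The function name `join_with_dynamic_filter` and the sample tables are hypothetical.

```python
def join_with_dynamic_filter(build_rows, probe_rows, key):
    """Inner join that prunes the probe side using keys seen on the build side.

    Illustrative only: a real engine pushes the filter down into the scan.
    """
    # Build phase: index the (small) build side by join key.
    build_index = {}
    for row in build_rows:
        build_index.setdefault(row[key], []).append(row)

    # The dynamic filter is the set of join keys observed at run time.
    dynamic_filter = set(build_index)

    # Probe phase: rows that fail the filter are skipped immediately.
    surviving = 0
    joined = []
    for row in probe_rows:
        if row[key] not in dynamic_filter:
            continue
        surviving += 1
        for match in build_index[row[key]]:
            joined.append({**match, **row})
    return joined, surviving


# Hypothetical data: one dimension row, one hundred fact rows.
dims = [{"id": 7, "name": "widget"}]
facts = [{"id": i % 10, "amount": i} for i in range(100)]
joined, surviving = join_with_dynamic_filter(dims, facts, "id")
```

    In this made-up example only the fact rows whose `id` is 7 pass the filter, so most of the probe side is dropped before the join.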

  • Release 315

    This version adds support for FETCH FIRST ... WITH TIES syntax, locality awareness in the default scheduler for better workload balancing, the new format() function, and improved support for ORC bloom filters. Additionally, connectors can now provide view definitions, which opens up several new use cases.
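
    For readers new to WITH TIES: it returns the first n rows in sort order, plus any further rows that tie with the last of them on the sort key. The Python sketch below mimics that behavior on an in-memory list; the helper `fetch_first_with_ties` and the sample data are hypothetical and say nothing about how Presto implements the feature.

```python
def fetch_first_with_ties(rows, n, sort_key):
    """Return the first n rows by sort_key, plus rows tied with the nth row."""
    ordered = sorted(rows, key=sort_key)
    if n <= 0:
        return []
    if n >= len(ordered):
        return ordered

    # Sort-key value of the last row that is guaranteed to be returned.
    cutoff = sort_key(ordered[n - 1])
    result = ordered[:n]
    # Keep following rows only while they tie with the cutoff value.
    for row in ordered[n:]:
        if sort_key(row) != cutoff:
            break
        result.append(row)
    return result


# Hypothetical data: "b" and "c" tie on the score 20.
scores = [("a", 10), ("b", 20), ("c", 20), ("d", 30)]
top_two = fetch_first_with_ties(scores, 2, sort_key=lambda r: r[1])
```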

  • Release 314

    This version adds support for reading ZSTD and LZ4-compressed Parquet data and writing ZSTD-compressed ORC data, improves compatibility with the Hive 2.3+ metastore, supports mixed-case field names in Elasticsearch, adds a JSON output format for the CLI, and improves the rendering of the plan structure in EXPLAIN output.

  • Apache Phoenix Connector

    Presto 312 introduces a new Apache Phoenix Connector, which allows Presto to query data stored in HBase using Apache Phoenix. This unlocks new capabilities that previously weren’t possible with Phoenix alone, such as federation (querying of multiple Phoenix clusters) and joining Phoenix data with data from other Presto data sources.

  • Removing redundant ORDER BY

    Optimizers are all about doing work in the most cost-effective manner and avoiding unnecessary work. Some SQL constructs such as ORDER BY do not affect query results in many situations, and can negatively affect performance unless the optimizer is smart enough to remove them.

  • Release 313

    This version fixes incorrect results for queries involving GROUPING SETS and LIMIT, fixes selecting the UUID type from the CLI and JDBC driver, and adds support for compression and encryption when using Spill to Disk.

  • Using Precomputed Hash in SemiJoin Operations

    Queries involving IN and NOT IN over a subquery are much faster in Presto 312.

  • Release 312

    This version has many performance improvements (including cast optimization), a new UUID data type and uuid() function, a new Apache Phoenix connector, support for the PostgreSQL TIMESTAMP WITH TIME ZONE data type, support for the MySQL JSON data type, improved support for Hive bucketed tables, and some bug fixes.

  • Improved Hive Bucketing

    Presto 312 adds support for the more flexible bucketing introduced in recent versions of Hive. Specifically, it allows any number of files per bucket, including zero. This allows inserting data into an existing partition without having to rewrite the entire partition, and improves the performance of writes by not requiring the creation of files for empty buckets.

  • Optimizing the Casts Away

    The next release of Presto (version 312) will include a new optimization to remove unnecessary casts which might have been added implicitly by the query planner or explicitly by users when they wrote the query.

  • Presto Summit 2019 @TwitterSF

    Next month will mark the 2nd annual Presto Summit hosted by the Presto Software Foundation, Starburst Data, and Twitter. Last year’s event was a great success (see the Presto Summit 2018 recap).

  • Release 311

    This version adds standard OFFSET syntax, a new function combinations() for computing k-combinations of array elements, and support for nested collections in Cassandra.
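
    For readers unfamiliar with the term, a k-combination is a selection of k elements from an array in which order does not matter. The Python sketch below illustrates the concept with the standard library's `itertools.combinations`; it is not Presto's implementation, and the sample array is made up.

```python
from itertools import combinations

# Hypothetical input array.
elements = ["a", "b", "c"]

# All 2-combinations, in the order the elements appear in the input.
pairs = [list(c) for c in combinations(elements, 2)]
print(pairs)
```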

  • Presto Community Meeting 2019-05-08

    Agenda

    • Existing function support
    • Function namespaces
    • Connector-resolved functions
    • SQL-defined functions
    • Remote functions
    • Polymorphic table functions

  • Faster S3 Reads

    Presto is known for working well with Amazon S3. We recently made an improvement that greatly reduces network utilization and latency when reading ORC or Parquet data.

  • Release 310

    This version adds standard FETCH FIRST syntax, support for using an alternate AWS role when accessing S3 or Glue, and improved handling of DECIMAL, DOUBLE, and REAL when Hive table and partition metadata differ.

  • A review of the first international Presto Conference, Tel Aviv, April 2019

    Community, noun: “A feeling of fellowship with others, as a result of sharing common attributes, interests, and goals”

    The fun picture you see here was taken at the first lecture of the first international Presto summit in Israel last month.

    The atmosphere in the room during the various presentations was unique. It’s as if you could physically feel the brainpower of 250 engineers fascinated by technology in one room.

    We would like to share with you a bit of the content that was discussed during the conference. Enjoy the read and the videos!

  • Release 309

    This version adds support for case-insensitive name matching in JDBC-based connectors, more data types in the PostgreSQL connector, and some bug fixes.

  • Even Faster ORC

    Presto is known for being the fastest SQL-on-Hadoop engine, and our custom ORC reader implementation is a big reason for this speed. Now it is even faster!

  • Release 308

    This version includes significant performance improvements when reading ORC data, authorization checks for SHOW COLUMNS, and limit pushdown for JDBC-based connectors.

  • Release 307

    This version includes some important security fixes, support for inner and outer joins involving lateral derived tables (LATERAL), new syntax for setting table comments, and performance improvements.

  • Presto Community Meeting 2019-04-03

    Agenda

    • Memory management
    • Spilling

  • Release 306

    This version includes some bug fixes, as well as performance improvements when decoding ORC data.

  • Presto Community Meeting 2019-03-13

    Agenda

    • Dynamic Filtering
    • Changes to TIMESTAMP semantics

  • Release 305

    Changes in this version include peak-memory awareness in the cost-based optimizer, improved handling of CSV output in the CLI, and performance improvements for Parquet.

  • Release 304

    New features include spilling for queries that use ORDER BY or window functions, support for PostgreSQL’s json and jsonb types, and a Hive procedure to synchronize partition metadata with the file system.

  • Presto Community Meeting 2019-02-27

    Agenda

    • Pushdown of complex operations (filter, project, join, etc.)
    • Coordinator high availability

  • Release 303

    This version includes bug fixes and performance improvements.

  • Release 302

    New features include native support for Google Cloud Storage and a connector for Elasticsearch.

  • Presto Community Meeting 2019-02-06

    Agenda

    • About the Foundation
    • Getting involved
    • Summary of new features
    • Top requested features
    • Release verification

  • Release 301

    New features include role-based access control and role management, invoker security mode for views, and ANALYZE syntax for collecting table statistics.

  • Presto Software Foundation Launch

    We are pleased to announce the launch of the Presto Software Foundation, a not-for-profit organization dedicated to the advancement of the Presto open source distributed SQL engine. The foundation is committed to ensuring the project remains open, collaborative and independent for decades to come.
