
ClickBench Says Postgres Is a Great Analytics Database

TLDR:
We spent a few months optimizing PostgreSQL and made it into the Top 10 on ClickBench, a benchmark usually dominated by specialized analytics databases.

What's more, all compute happens inside Postgres, and all tables are managed directly by PostgreSQL. It's not a simple wrapper. This is the story of pg_mooncake.

ClickBench

ClickBench is the definitive benchmark for real-time analytics databases, originally designed to showcase the performance of ClickHouse. It evaluates databases on their ability to handle real-world analytics workloads, including high-volume table scans and complex aggregations.

Historically, ClickHouse and other purpose-built analytics databases have led this benchmark, while general-purpose databases such as Postgres and MySQL have been roughly 100x slower. But we wanted to challenge that assumption, and Postgres delivered.

How to build analytics in Postgres?

When most people think of PostgreSQL, they think of a rock-solid OLTP database, not a real-time analytics powerhouse. However, PostgreSQL's extensibility makes it uniquely able to punch above its weight class. Here's how we approached the challenge:

1. Build a PG Extension

We used PG's extensibility to build pg_mooncake as a native PG extension.

2. Storage format: Columnstore

For analytics workloads, a columnstore format is important. ClickBench workloads typically involve wide tables, but queries access only a small subset of the columns.

  • In a row store (such as a PostgreSQL heap table), reading a single column means jumping through every row.
  • In a columnstore, reads are sequential, which is faster (and it also allows better compression and execution directly on compressed data).
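
To make the two bullets above concrete, here is a minimal Python sketch of the same data laid out both ways. The table, its column names, and the helper functions are hypothetical illustrations; this is not pg_mooncake's storage code, only a toy model of why a single-column query touches less data in a columnar layout and why columns tend to compress well.

```python
from array import array

# Hypothetical wide table; the query below only needs "duration".
rows = [
    (1, "home", "US", 120),
    (2, "search", "US", 45),
    (3, "home", "DE", 300),
]


def sum_duration_row_store(rows):
    """Row layout: every full row is visited just to pick out one field."""
    total = 0
    for row in rows:
        total += row[3]
    return total


# Column layout: each column is stored contiguously on its own.
columns = {
    "id": array("q", (r[0] for r in rows)),
    "page": [r[1] for r in rows],
    "country": [r[2] for r in rows],
    "duration": array("q", (r[3] for r in rows)),
}


def sum_duration_columnstore(columns):
    """Column layout: only the one needed column is read, sequentially."""
    return sum(columns["duration"])


def run_length_encode(values):
    """Toy compression: adjacent equal values collapse into (value, count)."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [(v, n) for v, n in runs]
```

Both sum functions return the same answer; the difference is how much unrelated data each one has to walk past. Run-length encoding is used here only as one simple example of a scheme that benefits from similar values being stored next to each other.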

3. Vectorized execution

To speed up query execution, we embedded DuckDB as the execution engine for columnstore queries. This means that across the entire execution pipeline, data is processed in batches instead of row by row, which makes better use of SIMD and is far more efficient for scans, group-bys, and aggregations.
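
The difference between row-at-a-time and batch-at-a-time processing can be sketched in a few lines of Python. This is only an illustration of the control flow, with hypothetical function names and a tiny batch size; it is not how DuckDB is implemented, and plain Python gets none of the SIMD benefit a native engine would.

```python
from itertools import islice

BATCH_SIZE = 4  # hypothetical; real engines use much larger batches


def batches(values, size):
    """Yield consecutive lists of at most `size` values."""
    it = iter(values)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def filter_batch(batch, threshold):
    """Filter operator: takes a whole batch in, returns a whole batch out."""
    return [v for v in batch if v > threshold]


def sum_row_at_a_time(values, threshold):
    """Classic model: the filter and the sum are interleaved per value."""
    total = 0
    for v in values:
        if v > threshold:
            total += v
    return total


def sum_vectorized(values, threshold, size=BATCH_SIZE):
    """Batch model: each operator runs a tight loop over one batch."""
    total = 0
    for batch in batches(values, size):
        total += sum(filter_batch(batch, threshold))
    return total
```

Both functions compute the same result. The point of the batch model is that each operator works on a contiguous run of same-typed values, which is the shape that lets a native engine amortize per-row overhead and apply SIMD instructions.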

4. Table metadata & management directly in Postgres

Efficient metadata handling is critical for real-time analytics, because fixed latency matters. Instead of fetching metadata or statistics from storage formats such as Parquet, we store them directly in PG.

  • It makes query planning faster.
  • It also enables advanced features such as file skipping, which further improves performance.
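
File skipping, as mentioned above, can be illustrated with a small Python sketch. It assumes a hypothetical catalog that keeps a minimum and maximum value per data file; the class, field names, and file paths are invented for illustration and are not pg_mooncake's actual metadata schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileStats:
    """Hypothetical per-file statistics kept in the database catalog."""
    path: str
    min_value: int
    max_value: int


# Hypothetical catalog contents for one column of one table.
catalog = [
    FileStats("part-0.parquet", 1, 100),
    FileStats("part-1.parquet", 101, 200),
    FileStats("part-2.parquet", 201, 300),
]


def files_to_scan(catalog, low, high):
    """Keep only files whose [min, max] range overlaps the filter [low, high]."""
    return [
        f.path
        for f in catalog
        if f.max_value >= low and f.min_value <= high
    ]
```

For a filter such as `BETWEEN 150 AND 220`, `files_to_scan(catalog, 150, 220)` keeps `part-1.parquet` and `part-2.parquet` and never opens `part-0.parquet`. Because the statistics live in the catalog, that decision is made at planning time without reading any file footers.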

More details on the architecture.

What does this mean?

PostgreSQL is no longer just an OLTP workhorse. With careful tuning and engineering, it can deliver analytics performance on par with specialized databases while keeping the flexibility and ecosystem advantages of PostgreSQL.

After building advanced data systems for a decade, one of my core beliefs is that we can make the data stack much simpler.

pg_mooncake is MIT licensed, so if you don't believe it, give it a try.

We launched v0.1 last week. It is now available on Neon Postgres and is coming to Supabase.

🥮


https://www.mooncake.dev/images/blog/clickbench-v0.1.jpg

2025-03-05 21:59:00
