zson

ZSON is a PostgreSQL extension for transparent JSONB compression

Github星跟蹤圖

ZSON

ZSON Logo

About

ZSON is a PostgreSQL extension for transparent JSONB compression. Compression is
based on a shared dictionary of strings most frequently used in specific JSONB
documents (not only keys, but also values, array elements, etc).

In some cases ZSON can save half of your disk space and give you about 10% more
TPS. Memory is saved as well. See docs/benchmark.md.
Everything depends on your data and workload, though. Don't believe any
benchmarks, re-check everything on your data, configuration, hardware, workload
and PostgreSQL version.

ZSON was originally created in 2016 by Postgres Professional team:
researched and coded by Aleksander Alekseev; ideas, code review, testing,
etc by Alexander Korotkov and Teodor Sigaev.

See also discussions on pgsql-general@, Hacker News, Reddit
and HabraHabr.

Install

Build and install ZSON:

cd /path/to/zson/source/code
make
sudo make install

Run tests:

make installcheck

Connect to PostgreSQL:

psql my_database

Enable extension:

create extension zson;

Uninstall

Disable extension:

drop extension zson;

Uninstall ZSON:

cd /path/to/zson/source/code
sudo make uninstall

Usage

First ZSON should be trained on common data using zson_learn procedure:

zson_learn(
    tables_and_columns text[][],
    max_examples int default 10000,
    min_length int default 2,
    max_length int default 128,
    min_count int default 2
)

Example:

select zson_learn('{{"table1", "col1"}, {"table2", "col2"}}');

You can create a temporary table and write some common JSONB documents into it
manually or use the existing tables. The idea is to provide a subset of real
data. Let's say some document type is twice as frequent as another document
type. ZSON expects that there will be twice as many documents of the first type
as those of the second one in a learning set.

Resulting dictionary could be examined using this query:

select * from zson_dict;

Now ZSON type could be used as a complete and transparent replacement of JSONB
type:

zson_test=# create table zson_example(x zson);
CREATE TABLE

zson_test=# insert into zson_example values ('{"aaa": 123}');
INSERT 0 1

zson_test=# select x -> 'aaa' from zson_example;
-[ RECORD 1 ]-
?column?, 123

Migrating to a new dictionary

When a schema of JSONB documents evolves ZSON could be re-learned:

select zson_learn('{{"table1", "col1"}, {"table2", "col2"}}');

This time second dictionary will be created. Dictionaries are cached in memory
so it will take about a minute before ZSON realizes that there is a new
dictionary. After that old documents will be decompressed using the old
dictionary and new documents will be compressed and decompressed using the new
dictionary.

To find out which dictionary is used for a given ZSON document use zson_info
procedure:

zson_test=# select zson_info(x) from test_compress where id = 1;
-[ RECORD 1 ]---------------------------------------------------
zson_info, zson version = 0, dict version = 1, ...

zson_test=# select zson_info(x) from test_compress where id = 2;
-[ RECORD 1 ]---------------------------------------------------
zson_info, zson version = 0, dict version = 0, ...

If all ZSON documents are migrated to the new dictionary the old one could
be safely removed:

delete from zson_dict where dict_id = 0;

In general, it's safer to keep old dictionaries just in case. Gaining a few KB
of disk space is not worth the risk of losing data.

When it's a time to re-learn?

Unfortunately, it's hard to recommend a general approach.

A good heuristic could be:

select pg_table_size('tt') / (select count(*) from tt)

... i.e. average document size. When it suddenly starts to grow it's time to
re-learn.

However, developers usually know when they change a schema significantly. It's
also easy to re-check whether the current schema differs a lot from the original
one using zson_dict table.

主要指標

概覽
名稱與所有者postgrespro/zson
主編程語言C
編程語言Makefile (語言數: 3)
平台
許可證MIT License
所有者活动
創建於2016-09-30 14:53:56
推送於2023-04-14 20:26:36
最后一次提交2023-04-14 23:26:32
發布數1
最新版本名稱v1.1 (發布於 )
第一版名稱v1.1 (發布於 )
用户参与
星數552
關注者數37
派生數21
提交數66
已啟用問題?
問題數20
打開的問題數0
拉請求數6
打開的拉請求數0
關閉的拉請求數1
项目设置
已啟用Wiki?
已存檔?
是復刻?
已鎖定?
是鏡像?
是私有?