xorfilter

实现二进制熔断器和 xor 过滤器的 Go 库。「Go library implementing binary fuse and xor filters」

  • 所有者: FastFilter/xorfilter
  • 平台: Linux,Mac,Windows
  • 許可證: Apache License 2.0
  • 分類:
  • 主題:
  • 喜歡:
    0
      比較:

Github星跟蹤圖

xorfilter: Go library implementing xor and binary fuse filters

GoDoc
Test

Bloom filters are used to quickly check whether an element is part of a set.
Xor and binary fuse filters are a faster and more concise alternative to Bloom filters.
Furthermore, unlike Bloom filters, xor and binary fuse filters are naturally compressible using standard techniques (gzip, zstd, etc.).
They are also smaller than cuckoo filters. They are used in production systems.

This Go library is used by

We are assuming that your set is made of 64-bit integers. If you have strings
or other data structures, you need to hash them first to a 64-bit integer. It
is not important to have a good hash function, but collision should be unlikely
(~1/2^64). A few collisions are acceptable, but we expect that your initial set
should have no duplicated entry.

The current implementation has a false positive rate of about 0.4% and a memory usage
of less than 9 bits per entry for sizeable sets.

You construct the filter as follows starting from a slice of 64-bit integers:

filter,_ := xorfilter.PopulateBinaryFuse8(keys) // keys is of type []uint64

It returns an object of type BinaryFuse8. The 64-bit integers would typically be hash values of your objects.

You can then query it as follows:

filter.Contains(v) // v is of type uint64

It will always return true if v was part of the initial construction (Populate) and almost always return false otherwise.

An xor filter is immutable, it is concurrent. The expectation is that you build it once and use it many times.

Though the filter itself does not use much memory, the construction of the filter needs many bytes of memory per set entry.

For persistence, you only need to serialize the following data structure:

type BinaryFuse8 struct {
	Seed               uint64
	SegmentLength      uint32
	SegmentLengthMask  uint32
	SegmentCount       uint32
	SegmentCountLength uint32
	Fingerprints []uint8
}

When constructing the filter, you should ensure that there are not too many duplicate keys for best results.

Generic (8-bit, 16-bit, 32-bit)

By default, we use 8-bit fingerprints which provide a 0.4% false positive rate. Some user might want to reduce
this false positive rate at the expensive of more memory usage. For this purpose, we provide a generic type
(NewBinaryFuse[T]).

filter8, _ := xorfilter.NewBinaryFuse[uint8](keys) // 0.39% false positive rate, uses about 9 bits per key
filter16, _ := xorfilter.NewBinaryFuse[uint16](keys) // 0.0015% false positive rate, uses about 18 bits per key
filter32, _ := xorfilter.NewBinaryFuse[uint32](keys) // 2e-08% false positive rate, uses about 36 bits per key

The 32-bit fingerprints are provided but not recommended. Most users will want to use either the 8-bit or 16-bit fingerprints.

The Binary Fuse filters have memory usages of about 9 bits per key in the 8-bit case, 18 bits per key in the 16-bit case,
for sufficiently large sets (hundreds of thousands of keys). There is more per-key memory usage when the set is smaller.

Implementations of xor filters in other programming languages

主要指標

概覽
名稱與所有者FastFilter/xorfilter
主編程語言Go
編程語言Go (語言數: 1)
平台
許可證Apache License 2.0
所有者活动
創建於2019-12-16 21:15:56
推送於2025-02-13 19:12:25
最后一次提交2025-02-14 01:08:50
發布數7
最新版本名稱v0.2.1 (發布於 )
第一版名稱v0.1.0 (發布於 )
用户参与
星數686
關注者數19
派生數49
提交數118
已啟用問題?
問題數14
打開的問題數1
拉請求數24
打開的拉請求數0
關閉的拉請求數5
项目设置
已啟用Wiki?
已存檔?
是復刻?
已鎖定?
是鏡像?
是私有?