bloomfilter

Face-meltingly fast, thread-safe, marshalable, unionable, probability- and optimal-size-calculating Bloom filter in go

  • 所有者: skull-squadron/bloomfilter
  • 平台:
  • 許可證: MIT License
  • 分類:
  • 主題:
  • 喜歡:
    0
      比較:

Github星跟蹤圖

Important: Zeroth, consider if a Cuckoo filter could be right for your use-case.

GoDoc travis

Face-meltingly fast, thread-safe, marshalable, unionable, probability- and optimal-size-calculating Bloom filter in go

Copyright © 2014-2016,2018 Barry Allard

MIT license

WTF is a bloom filter

**TL;DR: **Probabilistic, extra lookup table to track a set of elements kept elsewhere to reduce expensive, unnecessary set element retrieval and/or iterator operations when an element is not present in the set. It's a classic time-storage tradeoff algoritm.

Properties

See wikipedia for algorithm details, Impact, What, Description, ---, ---, ---, Good, No false negatives, know for certain if a given element is definitely NOT in the set, Bad, False positives, uncertain if a given element is in the set, Bad, Theoretical potential for hash collisions, in very large systems and/or badly hash.Hash64-conforming implementations, Bad, Add only, Cannot remove an element, it would destroy information about other elements, Good, Constant storage, uses only a fixed amount of memory, ## Naming conventions

(Similar to algorithm), Variable/function, Description, Range, ---, ---, ---, m/M(), number of bits in the bloom filter (memory representation is about m/8 bytes in size), >=2, n/N(), number of elements present, >=0, k/K(), number of keys to use (keys are kept private to user code but are de/serialized to Marshal and file I/O), >=0, maxN, maximum capacity of intended structure, >0, p, maximum allowed probability of collision (for computing m and k for optimal sizing), >0..<1, - Memory representation should be exactly 24 + 8*(k + (m+63)/64) + unsafe.Sizeof(RWMutex) bytes.

  • Serialized (BinaryMarshaler) representation should be exactly 72 + 8*(k + (m+63)/64) bytes. (Disk format is less due to compression.)

Binary serialization format

All values in Little-endian format, Offset, Offset (Hex), Length (bytes), Name, Type, ---, ---, ---, ---, ---, 00, 8, k, uint64, 8, 08, 8, n, uint64, 16, 10, 8, m, uint64, 24, 18, k, (keys), [k]uint64, 24+8*k, ..., (m+63)/64, (bloom filter), [(m+63)/64]uint64, 24+8*k+8*((m+63)/64), ..., 48, (SHA384 of all previous fields, hashed in order), [48]byte, - bloomfilter.Filter conforms to encoding.BinaryMarshaler and `encoding.BinaryUnmarshaler'

Usage


import "github.com/steakknife/bloomfilter"

const (
  maxElements = 100000
  probCollide = 0.0000001
)

bf, err := bloomfilter.NewOptimal(maxElements, probCollide)
if err != nil {
  panic(err)
}

someValue := ... // must conform to hash.Hash64

bf.Add(someValue)
if bf.Contains(someValue) { // probably true, could be false
  // whatever
}

anotherValue := ... // must also conform to hash.Hash64

if bf.Contains(anotherValue) {
  panic("This should never happen")
}

err := bf.WriteFile("1.bf.gz")  // saves this BF to a file
if err != nil {
  panic(err)
}

bf2, err := bloomfilter.ReadFile("1.bf.gz") // read the BF to another var
if err != nil {
  panic(err)
}

Design

Where possible, branch-free operations are used to avoid deep pipeline / execution unit stalls on branch-misses.

Get

go get -u github.com/steakknife/bloomfilter  # master is always stable

Source

Contact

License

MIT license

Copyright © 2014-2016 Barry Allard

主要指標

概覽
名稱與所有者skull-squadron/bloomfilter
主編程語言Go
編程語言Go (語言數: 1)
平台
許可證MIT License
所有者活动
創建於2014-06-20 05:52:19
推送於2018-09-22 17:46:47
最后一次提交2018-09-22 10:46:46
發布數26
最新版本名稱1.0.4 (發布於 2018-09-05 21:34:17)
第一版名稱0.0.0 (發布於 2014-06-19 22:54:07)
用户参与
星數353
關注者數12
派生數50
提交數88
已啟用問題?
問題數0
打開的問題數0
拉請求數3
打開的拉請求數0
關閉的拉請求數2
项目设置
已啟用Wiki?
已存檔?
是復刻?
已鎖定?
是鏡像?
是私有?