clhash

C library implementing the ridiculously fast CLHash hashing function

Github星跟蹤圖

clhash

Build Status
Code Quality: Cpp

C library implementing the ridiculously fast CLHash hashing function (with C++ wrappers)

CLHash is a very fast hashing function that uses the
carry-less multiplication and SSE instructions.
Best used on recent x64 processors (Haswell or better).

CLHash has the following characteristics :

  • On a recent Intel processor (e.g., Skylake), it can hash input strings at a speed of 0.1 cycles per byte for sufficiently long strings. You read this right: it is simply ridiculously fast.
  • It has strong theoretical guarantees: XOR universality of short strings and excellent almost universality for longer strings.

For details, please see the research article:

Daniel Lemire, Owen Kaser, Faster 64-bit universal hashing using carry-less multiplications, Journal of Cryptographic Engineering 6 (3), 2016. http://arxiv.org/abs/1503.03465

Requirements

Please do not try to compile and run this software on legacy hardware (x64 processors
before Haswell), it will either fail to work or be slow. It should be able to port
the code to other architectures such as ARM or POWER processors but performance is
unknown at this point. And, yes, this means that CLHash is not portable hardware-wise.

If your compiler is not C99 compliant... please get better one.

If your applications requires hashing tiny strings, then you will not get a speed close to 0.1 cycles per byte.
The string should be significantly several times larger than a vector register (128 bits). So clhash is not
meant to be a general-purpose hash function.

Usage

make
./unit

Compile option: if you define BITMIX during compilation, extra work is done to
pass smhasher's avalanche test succesfully. Disabled by default.

Code sample

#include <assert.h>

#include "clhash.h"

int main() {
    void * random =  get_random_key_for_clhash(UINT64_C(0x23a23cf5033c3c81),UINT64_C(0xb3816f6a2c68e530));
    uint64_t hashvalue1 = clhash(random,"my dog",6);
    uint64_t hashvalue2 = clhash(random,"my cat",6);
    uint64_t hashvalue3 = clhash(random,"my dog",6);
    assert(hashvalue1 == hashvalue3);
    assert(hashvalue1 != hashvalue2);// very likely to be true
    free(random);
    return 0;
}

Simple benchmark

make
./benchmark

C++

If you prefer the convenience of a C++ interface with support for stl::vector and std::string,
you can create a clhasher object instead.

#include <assert.h>
#include <cstdio>
#include <vector>

#include "clhash.h"


int main(void) {
    clhasher h(UINT64_C(0x23a23cf5033c3c81),UINT64_C(0xb3816f6a2c68e530));
    std::vector<int> vec{1,3,4,5,2,24343};
    uint64_t vechash = h(vec);

    uint64_t arrayhash = h(vec.data(), vec.size());

    assert(vechash == arrayhash);

    uint64_t cstringhash = h("o hai wurld");
    uint64_t stringhash = h(std::string("o hai wurld"));
    assert(cstringhash == stringhash);
}
make
./cppunit

Credit for C++ wrapper: Daniel Baker

主要指標

概覽
名稱與所有者simdhash/clhash
主編程語言C
編程語言C (語言數: 3)
平台
許可證Apache License 2.0
所有者活动
創建於2016-04-25 13:08:28
推送於2024-04-04 02:34:39
最后一次提交2024-04-03 22:34:39
發布數1
最新版本名稱v0.1.0 (發布於 )
第一版名稱v0.1.0 (發布於 )
用户参与
星數276
關注者數15
派生數28
提交數25
已啟用問題?
問題數4
打開的問題數3
拉請求數3
打開的拉請求數0
關閉的拉請求數1
项目设置
已啟用Wiki?
已存檔?
是復刻?
已鎖定?
是鏡像?
是私有?