libpopcnt

快速 C/C++ 位群计数库。『🚀 Fast C/C++ bit population count library』

Github星跟蹤圖

libpopcnt

Build Status
Build Status
Github Releases

libpopcnt.h is a header-only C/C++ library for counting the
number of 1 bits (bit population count) in an array as quickly as
possible using specialized CPU instructions i.e.
POPCNT,
AVX2,
AVX512,
NEON.
libpopcnt.h has been tested successfully using the GCC,
Clang and MSVC compilers.

The algorithms used in libpopcnt.h are described in the paper
Faster Population Counts using AVX2 Instructions
by Daniel Lemire, Nathan Kurz and Wojciech Mula (23 Nov 2016).

How it works

On x86 CPUs libpopcnt.h uses a combination of 4 different bit
population count algorithms:

  • For array sizes < 512 bytes an unrolled POPCNT algorithm
    is used.
  • For array sizes ≥ 512 bytes an AVX2 algorithm is used.
  • For array sizes ≥ 1024 bytes an AVX512 algorithm is used.
  • For CPUs without POPCNT instruction a portable
    integer algorithm is used.

Note that libpopcnt.h works on all CPUs, it checks at run-time
whether your CPU supports POPCNT, AVX2, AVX512 before using it
and it is also thread-safe.

C/C++ API

#include "libpopcnt.h"

/*
 * Count the number of 1 bits in the data array
 * @data: An array
 * @size: Size of data in bytes
 */
uint64_t popcnt(const void* data, uint64_t size);

Speedup

This benchmark shows the speedup of the 4 popcount algorithms
used on x86 CPUs compared to the basic lookup-8
popcount algorithm for different array sizes (in bytes).

libpopcnt.h automatically picks the fastest algorithm for
the given array size. This benchmark was run on an Intel Xeon
Platinum 8168 CPU with GCC 5.4.

CPU architectures

libpopcnt.h has hardware accelerated popcount algorithms for
the following CPU architectures:

For other CPU architectures a fast integer popcount algorithm is used.

How to compile

For GCC and Clang compilation does not require any special compiler flags (like
-mavx2)!

cc  -O3 program.c
c++ -O3 program.cpp

Microsoft Visual C++

Using the MSVC compiler libpopcnt.h will only use POPCNT
by default. You can enable AVX2 or AVX512 to get the best
performance but then your program requires a CPU with
AVX2 or AVX512 support.

# Enable AVX2
cl /O2 /arch:AVX2 program.cpp

# Enable AVX512
cl /O2 /arch:AVX512 program.cpp

Development

cmake .
make -j
make test

The above commands also build the benchmark program which is
useful for benchmarking libpopcnt.h. Below is a
usage example run on an Intel Xeon Platinum 8168 CPU from 2017:

# Usage: ./benchmark [array bytes] [iters]
./benchmark
Iters: 10000000
Array size: 16.00 KB
Algorithm: AVX512
Status: 100%
Seconds: 1.59
103.4 GB/s

Acknowledgments

The vectorized popcount algorithms used in libpopcnt.h have
originally been written by Wojciech Muła,
I just made a convenient and portable C/C++ library using these algorithms.

主要指標

概覽
名稱與所有者kimwalisch/libpopcnt
主編程語言C
編程語言C++ (語言數: 3)
平台
許可證BSD 2-Clause "Simplified" License
所有者活动
創建於2016-11-28 16:55:14
推送於2024-06-29 09:50:09
最后一次提交2024-06-29 11:50:04
發布數19
最新版本名稱v3.1 (發布於 )
第一版名稱v1.0 (發布於 )
用户参与
星數344
關注者數23
派生數39
提交數370
已啟用問題?
問題數6
打開的問題數0
拉請求數9
打開的拉請求數0
關閉的拉請求數4
项目设置
已啟用Wiki?
已存檔?
是復刻?
已鎖定?
是鏡像?
是私有?