despacer

C library to remove white space from strings as fast as possible

Fast C library to remove white space from strings (also called "strip white space")

We want to remove spaces (' ') and line-ending characters (line feed '\n' and carriage
return '\r') from a string as fast as possible. To avoid unnecessary allocations, we
process the string in place.
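
To make the task concrete, here is a minimal scalar sketch of in-place removal. The name despace_sketch is hypothetical and is not claimed to match any function in this library; it only illustrates the read-index/write-index idea that the faster variants have to reproduce.

```c
#include <stddef.h>

/* Hypothetical helper (an illustration, not the library's code):
   moves every byte that is not ' ', '\n' or '\r' toward the front of
   the same buffer and returns the number of bytes kept. */
size_t despace_sketch(char *bytes, size_t howmany) {
  size_t pos = 0; /* write index, never ahead of the read index i */
  for (size_t i = 0; i < howmany; i++) {
    char c = bytes[i];
    if (c == ' ' || c == '\n' || c == '\r') {
      continue; /* skip the byte: it is not copied forward */
    }
    bytes[pos++] = c;
  }
  return pos; /* the caller writes a terminator at bytes[pos] if needed */
}
```

Called on the 10-byte buffer "a b\r\nc  d\n", this sketch leaves "abcd" in the first four bytes and returns 4. It does not write a null terminator itself.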

Let us consider any array of bytes representing a string in one of these encodings:

  • UTF-8
  • ASCII
  • Any of the 8-bit ASCII supersets such as Latin1
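
Byte-wise processing is safe for these encodings because the byte values 0x20, 0x0A and 0x0D only ever stand for the space, line feed and carriage return characters. In UTF-8, every byte of a multi-byte sequence is 0x80 or above. The sketch below counts removable bytes in that spirit; count_removable_sketch is a hypothetical name chosen for illustration, not a function of this library.

```c
#include <stddef.h>

/* Hypothetical helper (an illustration only): counts the bytes equal to
   ' ' (0x20), '\n' (0x0A) or '\r' (0x0D). Bytes of multi-byte UTF-8
   sequences are all >= 0x80, so they can never match. */
size_t count_removable_sketch(const unsigned char *bytes, size_t howmany) {
  size_t count = 0;
  for (size_t i = 0; i < howmany; i++) {
    unsigned char c = bytes[i];
    /* each comparison yields 0 or 1, so no branch is needed */
    count += (size_t)((c == 0x20) | (c == 0x0A) | (c == 0x0D));
  }
  return count;
}
```

For the UTF-8 bytes of "café €\n" (63 61 66 C3 A9 20 E2 82 AC 0A), the sketch returns 2: one space and one line feed. The bytes of "é" and "€" are left alone.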

How fast can we go?

Blog post:
http://lemire.me/blog/2017/01/20/how-quickly-can-you-remove-spaces-from-a-string/

Usage:

make
./despacebenchmark

Note that clang seems to give better results than gcc.

Possible results...

$ ./despacebenchmark
pointer alignment = 16 bytes
memcpy(tmpbuffer,buffer,N):  0.111328 cycles / ops
countspaces(buffer, N):  3.687500 cycles / ops
despace(buffer, N):  5.337891 cycles / ops
faster_despace(buffer, N):  1.689453 cycles / ops
despace64(buffer, N):  2.429688 cycles / ops
despace_to(buffer, N, tmpbuffer):  5.585938 cycles / ops
avx2_countspaces(buffer, N):  0.367188 cycles / ops
avx2_despace(buffer, N):  3.990234 cycles / ops
avx2_despace_branchless(buffer, N):  0.593750 cycles / ops
avx2_despace_branchless_u2(buffer, N):  0.535156 cycles / ops
sse4_despace(buffer, N):  0.734375 cycles / ops
sse4_despace_branchless(buffer, N):  0.384766 cycles / ops
sse4_despace_branchless_u2(buffer, N):  0.380859 cycles / ops
sse4_despace_branchless_u4(buffer, N):  0.351562 cycles / ops
sse4_despace_trail(buffer, N):  1.142578 cycles / ops
sse42_despace_branchless(buffer, N):  0.763672 cycles / ops
sse42_despace_branchless_lookup(buffer, N):  0.673828 cycles / ops
sse42_despace_to(buffer, N,tmpbuffer):  1.703125 cycles / ops

Each figure is the number of CPU cycles used to despace one input byte (the benchmark
prints this as "cycles / ops"), so lower is better.

Related work

Main metrics

Overview

  • Name with owner: lemire/despacer
  • Primary language: C
  • Program language: Makefile (language count: 6)
  • License: BSD 3-Clause "New" or "Revised" License

Owner activity

  • Created at: 2017-01-19 18:50:11
  • Pushed at: 2024-09-05 23:57:39
  • Last commit at: 2024-09-05 19:57:38
  • Release count: 0

User engagement

  • Stargazers count: 152
  • Watchers count: 10
  • Fork count: 15
  • Commits count: 114
  • Issues count: 9
  • Open issues count: 3
  • Pull requests count: 9
  • Open pull requests count: 3
  • Closed pull requests count: 3