pire

Perl Incompatible Regular Expressions library

  • Owner: yandex/pire
  • License: Other


This is PIRE, the Perl Incompatible Regular Expressions library.

This library is aimed at checking a huge amount of text against
relatively many regular expressions. Roughly speaking, it can only
check whether a given text matches a certain regexp, but it does so
really fast (more than 400 MB/s on our hardware is common). Moreover,
multiple regexps can be combined, making it possible to check the
text against approximately 10 regexps in a single pass (while
maintaining the same speed).
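
One way to picture combining regexps for a single pass is the classical
product construction on deterministic automata. The sketch below is not
Pire's code or API: the Dfa and Combined types and the MakeContainsChar,
Combine and Scan helpers are hypothetical names made up for illustration,
and each toy pattern only looks for a single byte. It assumes both
automata start in state 0.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical minimal DFA type for illustration; not a Pire class.
struct Dfa {
    std::size_t states = 0;
    std::vector<std::size_t> next;  // next[state * 256 + byte]
    std::vector<bool> accepting;    // accepting[state]
};

// Builds a two-state DFA that accepts any text containing the byte `needle`.
Dfa MakeContainsChar(unsigned char needle)
{
    Dfa d;
    d.states = 2;
    d.next.assign(2 * 256, 0);
    d.accepting = {false, true};
    for (std::size_t c = 0; c < 256; ++c) {
        d.next[0 * 256 + c] = (c == needle) ? 1 : 0;  // wait for the needle
        d.next[1 * 256 + c] = 1;                      // needle seen: stay accepting
    }
    return d;
}

// Combined automaton: the pair of states (sa, sb) is encoded as sa * b.states + sb.
struct Combined {
    std::vector<std::size_t> next;   // next[state * 256 + byte]
    std::vector<unsigned> accepted;  // bit 0: first pattern, bit 1: second pattern
};

Combined Combine(const Dfa& a, const Dfa& b)
{
    Combined c;
    const std::size_t n = a.states * b.states;
    c.next.resize(n * 256);
    c.accepted.resize(n);
    for (std::size_t sa = 0; sa < a.states; ++sa) {
        for (std::size_t sb = 0; sb < b.states; ++sb) {
            const std::size_t s = sa * b.states + sb;
            c.accepted[s] = (a.accepting[sa] ? 1u : 0u) | (b.accepting[sb] ? 2u : 0u);
            for (std::size_t ch = 0; ch < 256; ++ch)
                c.next[s * 256 + ch] =
                    a.next[sa * 256 + ch] * b.states + b.next[sb * 256 + ch];
        }
    }
    return c;
}

// Scans the text once and returns the bitmask of patterns that matched.
unsigned Scan(const Combined& c, const std::string& text)
{
    std::size_t state = 0;  // assumes both input automata start in state 0
    for (unsigned char ch : text)
        state = c.next[state * 256 + ch];  // still one lookup per byte
    return c.accepted[state];
}
```

With these definitions, Scan(Combine(MakeContainsChar('x'),
MakeContainsChar('y')), "yx") returns 3, meaning both patterns matched
after one pass over the text. The combined table can grow as the product
of the individual state counts, which is a plausible reason to keep the
number of regexps combined this way small.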

Since Pire examines each character only once, without any lookaheads
or rollbacks, spending about five machine instructions per character,
it can be used even in real-time tasks.
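
The single-pass property can be illustrated with a table-driven
deterministic automaton, where each input byte costs exactly one table
lookup. This is a minimal sketch under stated assumptions, not Pire's
actual scanner: TinyDfa, MakeContainsAb and MatchesTiny are hypothetical
names, and the automaton is hand-built to recognise texts containing
the substring "ab".

```cpp
#include <array>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical illustration type; not part of Pire.
struct TinyDfa {
    std::vector<std::array<std::uint8_t, 256>> next;  // next[state][byte]
    std::vector<bool> accepting;                      // accepting[state]
    std::uint8_t start = 0;
};

// Hand-built DFA for "text contains the substring ab".
// State 0: nothing useful seen; state 1: last byte was 'a';
// state 2: "ab" has been seen (accepting, never left).
TinyDfa MakeContainsAb()
{
    TinyDfa dfa;
    dfa.next.resize(3);
    dfa.accepting = {false, false, true};
    for (int c = 0; c < 256; ++c) {
        dfa.next[0][c] = (c == 'a') ? 1 : 0;
        dfa.next[1][c] = (c == 'a') ? 1 : (c == 'b') ? 2 : 0;
        dfa.next[2][c] = 2;
    }
    return dfa;
}

// Reads every byte exactly once: no lookahead, no rollback.
bool MatchesTiny(const TinyDfa& dfa, const std::string& text)
{
    std::uint8_t state = dfa.start;
    for (unsigned char c : text)
        state = dfa.next[state][c];
    return dfa.accepting[state];
}
```

The running time depends only on the length of the text, not on the
shape of the pattern, which is what makes this style of matching
predictable enough for latency-sensitive use.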

On the other hand, Pire has very limited functionality (compared to
other regexp libraries). Pire does not have any Perl-style conditional
regexps, lookaheads, backtracking, or greedy/non-greedy matches; nor
does it have any capturing facilities.

Pire was developed at Yandex (http://company.yandex.ru/) as a part of its
web crawler.

More information can be found in README.ru (in Russian), which is
yet to be translated.

Please report bugs to dprokoptsev@yandex-team.ru or davenger@yandex-team.ru.

Quick Start

#include <stdio.h>
#include <string.h>	// strlen
#include <iterator>	// std::back_inserter
#include <vector>
#include <pire/pire.h>

Pire::NonrelocScanner CompileRegexp(const char* pattern)
{
	// Transform the pattern from UTF-8 into UCS4
	std::vector<Pire::wchar32> ucs4;
	Pire::Encodings::Utf8().FromLocal(pattern, pattern + strlen(pattern), std::back_inserter(ucs4));

	return Pire::Lexer(ucs4.begin(), ucs4.end())
		.AddFeature(Pire::Features::CaseInsensitive())	// enable case insensitivity
		.SetEncoding(Pire::Encodings::Utf8())		// set input text encoding
		.Parse()					// create an FSM
		.Surround()					// PCRE_ANCHORED behavior
		.Compile<Pire::NonrelocScanner>();		// compile the FSM
}

bool Matches(const Pire::NonrelocScanner& scanner, const char* ptr, size_t len)
{
	return Pire::Runner(scanner)
		.Begin()	// '^'
		.Run(ptr, len)	// the text
		.End();		// '$'
		// implicitly cast to bool
}

int main()
{
	// The backslash is doubled so that the regexp receives "\s"
	char re[] = "hello\\s+w.+d$";
	char str[] = "Hello world";

	Pire::NonrelocScanner sc = CompileRegexp(re);

	bool res = Matches(sc, str, strlen(str));

	printf("String \"%s\" %s \"%s\"\n", str, (res ? "matches" : "doesn't match"), re);

	return 0;
}

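Key metrics

Overview
  • Name and owner: yandex/pire
  • Primary language: C++
  • Languages: C (number of languages: 7)
  • License: Other

Owner activity
  • Created: 2010-10-15 00:37:08
  • Pushed: 2020-09-08 21:23:28
  • Last commit: 2020-06-13 20:53:04
  • Releases: 4
  • Latest release: release-0.0.6
  • First release: release-0.0.3

User engagement
  • Stars: 334
  • Watchers: 21
  • Forks: 30
  • Commits: 271
  • Issues: 20
  • Open issues: 11
  • Pull requests: 49
  • Open pull requests: 9
  • Closed pull requests: 8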