php-ext-trie-filter

A PHP extension for spam-word filtering, built on a Double-Array Trie. It detects whether any word from a keyword list appears in a text message.

Changelog

2017-08-08

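  1. Supports both PHP 5 and PHP 7.
  2. New functions:
     - trie_filter_read: reads binary dictionary data from a string
     - trie_filter_write: exports the current object as a binary string
     - trie_filter_delete: deletes a word from the current object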
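
The three functions added in this release could be combined into an in-memory round trip, for example to keep a dictionary in a cache instead of a file. The sketch below is hedged: this README does not document the signatures, so the argument order and return values of trie_filter_write, trie_filter_read and trie_filter_delete are assumptions, and trie_roundtrip_demo is a hypothetical helper name.

```php
<?php
// Assumed signatures (NOT confirmed by this README):
//   trie_filter_delete($trie, $word)
//   trie_filter_write($trie)   -> binary string
//   trie_filter_read($binary)  -> trie resource
function trie_roundtrip_demo()
{
    // Bail out when the extension is missing or predates these functions.
    if (!function_exists('trie_filter_write')) {
        return null;
    }

    $trie = trie_filter_new();
    trie_filter_store($trie, 'word1');
    trie_filter_store($trie, 'word2');

    trie_filter_delete($trie, 'word1');  // assumed argument order
    $blob = trie_filter_write($trie);    // assumed to return the binary dictionary
    trie_filter_free($trie);

    $copy = trie_filter_read($blob);     // assumed to rebuild a trie from the string
    $hit = trie_filter_search($copy, 'hello word2 word1');
    trie_filter_free($copy);

    return $hit;
}

var_dump(trie_roundtrip_demo());
```

If the assumptions hold, the deleted word should no longer be reported after the round trip; check the extension source for the actual signatures before relying on this.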

2013-06-23

  1. trie_filter_search_all: returns all matched words in a single call
  2. Fixed memory leaks

Dependencies

libdatrie-0.2.4 or later

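Installation

In the steps below, $LIB_PATH is the directory where the dependency library is installed and $INSTALL_PHP_PATH is the PHP installation directory.

Install libdatrie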

$ tar zxvf libdatrie-0.2.4.tar.gz
$ cd libdatrie-0.2.4
$ make clean
$ ./configure --prefix=$LIB_PATH
$ make
$ make install

Install the extension

$ $INSTALL_PHP_PATH/bin/phpize
$ ./configure --with-php-config=$INSTALL_PHP_PATH/bin/php-config --with-trie_filter=$LIB_PATH
$ make
$ make install

Then edit php.ini and add the line extension=trie_filter.so, then restart PHP.
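
To confirm that the extension was picked up after the restart, a small check like the following may help. It is only a sketch that uses standard PHP functions (extension_loaded, php_ini_loaded_file); the variable name $loaded is illustrative.

```php
<?php
// Run this with the same PHP binary and php.ini that serve your application.
$loaded = extension_loaded('trie_filter');

if ($loaded) {
    echo "trie_filter is loaded\n";
} else {
    // Show which ini file PHP actually read, to help spot a wrong path.
    echo "trie_filter is NOT loaded; ini file in use: ",
        var_export(php_ini_loaded_file(), true), "\n";
}
```

If the CLI and the web server use different php.ini files, the line has to be added to each of them.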

Usage example

<?php
$arrWord = array('word1', 'word2', 'word3');
$resTrie = trie_filter_new(); //create an empty trie tree
foreach ($arrWord as $k => $v) {
    trie_filter_store($resTrie, $v);
}
trie_filter_save($resTrie, __DIR__ . '/blackword.tree');

$resTrie = trie_filter_load(__DIR__ . '/blackword.tree');

$strContent = 'hello word2 word1';
$arrRet = trie_filter_search($resTrie, $strContent);
print_r($arrRet); //Array(0 => 6, 1 => 5)
echo substr($strContent, $arrRet[0], $arrRet[1]); //word2
$arrRet = trie_filter_search_all($resTrie, $strContent);
print_r($arrRet); //Array(0 => Array(0 => 6, 1 => 5), 1 => Array(0 => 12, 1 => 5))

$arrRet = trie_filter_search($resTrie, 'hello word');
print_r($arrRet); //Array()

trie_filter_free($resTrie);
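
The offset/length pairs returned by trie_filter_search_all can be used to mask the matched words. A minimal sketch follows, assuming the result format shown in the example above and treating offsets and lengths as byte counts (which matches the ASCII example; behaviour for multibyte text is an assumption). mask_matches is a hypothetical helper and not part of the extension.

```php
<?php
// Hypothetical helper, not part of trie_filter: replaces each matched span
// with a run of mask characters. $matches uses the format shown above:
// array(array(offset, length), ...), interpreted as byte offsets and lengths.
// $mask must be a single-byte character so the text length stays unchanged.
function mask_matches($text, $matches, $mask = '*')
{
    foreach ($matches as $match) {
        list($offset, $length) = $match;
        // Same-length replacement keeps the remaining offsets valid.
        $text = substr_replace($text, str_repeat($mask, $length), $offset, $length);
    }
    return $text;
}

// Matches copied from the trie_filter_search_all output above.
$matches = array(array(6, 5), array(12, 5));
echo mask_matches('hello word2 word1', $matches), "\n"; // hello ***** *****
```

Because the replacement has the same byte length as the match, the spans can be processed in any order without recomputing offsets.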

PHP versions

PHP 5.2 ~ 7.1.

Windows is not supported yet.

License

Apache License 2.0

Acknowledgements

This project is a rewrite based on an existing PHP extension for detecting sensitive words.
