statistical-classifier

A PHP implementation of a Naive Bayes statistical classifier, including a structure for building other classifiers, multiple data sources and multiple caching backends.

  • Owner: camspiers/statistical-classifier
  • License: MIT License

PHP Classifier

PHP Classifier uses semantic versioning. It is currently at major version 0, so the public API should not be considered stable.

What is it?

PHP Classifier is a text classification library with a focus on reuse, customizability and performance.
Classifiers can be used for many purposes, but are particularly useful for detecting spam.

Features

  • Complement Naive Bayes Classifier
  • SVM (libsvm) Classifier
  • Highly customizable (easily modify or build your own classifier)
  • Command-line interface via separate library (phar archive)
  • Multiple data import types to get your data into the classifier (directories of files, database queries, JSON, serialized arrays)
  • Multiple types of model caching
  • Compatible with HipHop VM
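The core idea behind Complement Naive Bayes can be sketched in a few lines of PHP. This is a minimal illustration, not the library's implementation. It makes the following assumptions:

  • a simple lowercase alphanumeric tokenizer
  • Laplace smoothing
  • no TF-IDF, length normalization, or weight normalization, which a full implementation may apply

For each category, it estimates word probabilities from the documents in all other categories (the "complement"). It then picks the category whose complement fits the new document worst, which is the lowest summed log weight. The function names `cnbTokenize`, `trainComplementNB` and `classifyComplementNB` are hypothetical.

```php
<?php
// Minimal Complement Naive Bayes sketch (hypothetical helpers, not the library's API).
// Assumptions: lowercase alphanumeric tokens, Laplace smoothing (alpha = 1),
// no TF-IDF or weight normalization. Requires PHP 7.4+.

// Split text into lowercase alphanumeric tokens, dropping empty strings.
function cnbTokenize(string $text): array
{
    $tokens = [];
    foreach (preg_split('/[^a-z0-9]+/', strtolower($text)) as $token) {
        if ($token !== '') {
            $tokens[] = $token;
        }
    }
    return $tokens;
}

// Train on [category, text] pairs and return log weights per category,
// estimated from token counts in the complement (all other categories).
function trainComplementNB(array $documents): array
{
    $vocabulary = [];
    $counts = []; // category => token => count
    foreach ($documents as [$category, $text]) {
        $counts[$category] ??= [];
        foreach (cnbTokenize($text) as $token) {
            $vocabulary[$token] = true;
            $counts[$category][$token] = ($counts[$category][$token] ?? 0) + 1;
        }
    }

    $weights = [];
    foreach (array_keys($counts) as $category) {
        // Sum token counts over every category except the current one.
        $complement = [];
        $complementTotal = 0;
        foreach ($counts as $other => $tokenCounts) {
            if ($other === $category) {
                continue;
            }
            foreach ($tokenCounts as $token => $n) {
                $complement[$token] = ($complement[$token] ?? 0) + $n;
                $complementTotal += $n;
            }
        }
        // Laplace-smoothed complement probability, stored as a log weight.
        foreach (array_keys($vocabulary) as $token) {
            $weights[$category][$token] = log(
                (($complement[$token] ?? 0) + 1) / ($complementTotal + count($vocabulary))
            );
        }
    }
    return $weights;
}

// Pick the category with the lowest summed complement log weight.
function classifyComplementNB(array $weights, string $text): string
{
    $tokens = cnbTokenize($text);
    $best = '';
    $bestScore = INF;
    foreach ($weights as $category => $tokenWeights) {
        $score = 0.0;
        foreach ($tokens as $token) {
            $score += $tokenWeights[$token] ?? 0.0; // tokens never seen in training are ignored
        }
        if ($score < $bestScore) {
            $best = (string) $category;
            $bestScore = $score;
        }
    }
    return $best;
}

$weights = trainComplementNB([
    ['spam', 'Some spam document'],
    ['spam', 'Another spam document'],
    ['ham', 'Some ham document'],
    ['ham', 'Another ham document'],
]);
echo classifyComplementNB($weights, 'Some ham document'), PHP_EOL; // ham
```

In this toy data, "ham" is rare in the spam documents (the complement of "ham"). The "ham" category therefore receives the lowest score for "Some ham document".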

Installation

$ composer require camspiers/statistical-classifier

SVM Support

For SVM support, both libsvm and php-svm are required. For installation instructions, refer to php-svm.
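The SVM classifier depends on a compiled extension, so it can help to check for that extension at runtime and fall back to Complement Naive Bayes when it is missing. This is a minimal sketch that assumes php-svm registers itself under the extension name `svm` (verify with `php -m`). The helper `pickClassifierClass` is hypothetical.

```php
<?php
use Camspiers\StatisticalClassifier\Classifier\ComplementNaiveBayes;
use Camspiers\StatisticalClassifier\Classifier\SVM;

// Hypothetical helper: choose the SVM classifier only when php-svm is available.
// ::class is resolved at compile time, so no autoloading happens here.
function pickClassifierClass(bool $svmLoaded): string
{
    return $svmLoaded ? SVM::class : ComplementNaiveBayes::class;
}

// Assumption: the php-svm extension is reported as "svm" by extension_loaded().
$classifierClass = pickClassifierClass(extension_loaded('svm'));
echo $classifierClass, PHP_EOL;
// With the library installed, both classifiers take the data source first:
// $classifier = new $classifierClass($source);
```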

Usage

Non-cached Naive Bayes

use Camspiers\StatisticalClassifier\Classifier\ComplementNaiveBayes;
use Camspiers\StatisticalClassifier\DataSource\DataArray;

$source = new DataArray();
$source->addDocument('spam', 'Some spam document');
$source->addDocument('spam', 'Another spam document');
$source->addDocument('ham', 'Some ham document');
$source->addDocument('ham', 'Another ham document');

$classifier = new ComplementNaiveBayes($source);
$classifier->is('ham', 'Some ham document'); // bool(true)
$classifier->classify('Some ham document'); // string "ham"

Non-cached SVM

use Camspiers\StatisticalClassifier\Classifier\SVM;
use Camspiers\StatisticalClassifier\DataSource\DataArray;

$source = new DataArray();
$source->addDocument('spam', 'Some spam document');
$source->addDocument('spam', 'Another spam document');
$source->addDocument('ham', 'Some ham document');
$source->addDocument('ham', 'Another ham document');

$classifier = new SVM($source);
$classifier->is('ham', 'Some ham document'); // bool(true)
$classifier->classify('Some ham document'); // string "ham"

Caching models

Caching models requires maximebf/CacheCache, which can be installed via Packagist. Additional caching systems can be integrated easily.
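Caching pays off because the expensive preparation step runs once, and later requests reuse the stored result. The sketch below illustrates that idea with plain `serialize()` and a file. It is not the library's `CachedModel` and does not use CacheCache. The helper name `loadOrBuildModel` is hypothetical. The model is assumed to contain only arrays and scalars, such as a table of per-category weights.

```php
<?php
// Hypothetical helper illustrating "prepare once, reuse later" model caching.
// Not the library's CachedModel API; assumes the model contains only arrays/scalars.
function loadOrBuildModel(string $path, callable $build): array
{
    if (is_file($path)) {
        // allowed_classes => false refuses to instantiate objects from the cache file
        $cached = unserialize((string) file_get_contents($path), ['allowed_classes' => false]);
        if (is_array($cached)) {
            return $cached; // cache hit: skip the expensive build step
        }
    }
    $model = $build(); // cache miss (or unreadable cache): build and store
    file_put_contents($path, serialize($model), LOCK_EX);
    return $model;
}

$path = sys_get_temp_dir() . '/model-' . uniqid() . '.ser';
$builds = 0;
$build = function () use (&$builds): array {
    $builds++;
    return ['spam' => ['some' => 2, 'spam' => 1], 'ham' => ['some' => 2, 'ham' => 1]];
};

$first = loadOrBuildModel($path, $build);  // builds and writes the cache file
$second = loadOrBuildModel($path, $build); // served from the cache file
echo $builds, PHP_EOL; // 1
unlink($path);
```

In the library itself, the cache key (for example 'mycachename' in the example below) and the CacheCache backend take the place of the file path and serialize() call used here.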

Cached Naive Bayes

use Camspiers\StatisticalClassifier\Classifier\ComplementNaiveBayes;
use Camspiers\StatisticalClassifier\Model\CachedModel;
use Camspiers\StatisticalClassifier\DataSource\DataArray;

$source = new DataArray();
$source->addDocument('spam', 'Some spam document');
$source->addDocument('spam', 'Another spam document');
$source->addDocument('ham', 'Some ham document');
$source->addDocument('ham', 'Another ham document');

$model = new CachedModel(
	'mycachename',
	new CacheCache\Cache(
		new CacheCache\Backends\File(
			array(
				'dir' => __DIR__
			)
		)
	)
);

$classifier = new ComplementNaiveBayes($source, $model);
$classifier->is('ham', 'Some ham document'); // bool(true)
$classifier->classify('Some ham document'); // string "ham"

Cached SVM

use Camspiers\StatisticalClassifier\Classifier\SVM;
use Camspiers\StatisticalClassifier\Model\SVMCachedModel;
use Camspiers\StatisticalClassifier\DataSource\DataArray;

$source = new DataArray();
$source->addDocument('spam', 'Some spam document');
$source->addDocument('spam', 'Another spam document');
$source->addDocument('ham', 'Some ham document');
$source->addDocument('ham', 'Another ham document');

$model = new SVMCachedModel(
	__DIR__ . '/model.svm',
	new CacheCache\Cache(
		new CacheCache\Backends\File(
			array(
				'dir' => __DIR__
			)
		)
	)
);

$classifier = new SVM($source, $model);
$classifier->is('ham', 'Some ham document'); // bool(true)
$classifier->classify('Some ham document'); // string "ham"

Unit testing

statistical-classifier/ $ composer install
statistical-classifier/ $ phpunit

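Key metrics

Overview
  • Name and owner: camspiers/statistical-classifier
  • Primary language: PHP
  • Languages: PHP (1 language)
  • License: MIT License

Owner activity
  • Created: 2012-09-29 22:18:06
  • Last pushed: 2016-09-26 20:55:35
  • Last commit: 2014-08-31 16:50:25
  • Releases: 22
  • Latest release: 0.8.0
  • First release: 0.1.0 (released 2013-03-15 11:23:54)

User engagement
  • Stars: 173
  • Watchers: 21
  • Forks: 25
  • Commits: 287
  • Issues: 14
  • Open issues: 5
  • Pull requests: 14
  • Open pull requests: 1
  • Closed pull requests: 4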