statistical-classifier

A PHP implementation of a Naive Bayes statistical classifier, including a structure for building other classifiers, multiple data sources and multiple caching backends.

  • Owner: camspiers/statistical-classifier
  • License: MIT License

PHP Classifier

PHP Classifier uses semantic versioning. It is currently at major version 0, so the public API should not be considered stable.

What is it?

PHP Classifier is a text classification library with a focus on reuse, customizability and performance.
Classifiers can be used for many purposes, but are particularly useful in detecting spam.

Features

  • Complement Naive Bayes Classifier
  • SVM (libsvm) Classifier
  • Highly customizable (easily modify or build your own classifier)
  • Command-line interface via separate library (phar archive)
  • Multiple data import types to get your data into the classifier (directory of files, database queries, JSON, serialized arrays)
  • Multiple types of model caching
  • Compatible with HipHop VM

Installation

$ composer require camspiers/statistical-classifier

SVM Support

For SVM support, both libsvm and php-svm are required. For installation instructions, refer to php-svm.

Usage

Non-cached Naive Bayes

use Camspiers\StatisticalClassifier\Classifier\ComplementNaiveBayes;
use Camspiers\StatisticalClassifier\DataSource\DataArray;

$source = new DataArray();
$source->addDocument('spam', 'Some spam document');
$source->addDocument('spam', 'Another spam document');
$source->addDocument('ham', 'Some ham document');
$source->addDocument('ham', 'Another ham document');

$classifier = new ComplementNaiveBayes($source);
$classifier->is('ham', 'Some ham document'); // bool(true)
$classifier->classify('Some ham document'); // string "ham"
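
To give a feel for what a complement Naive Bayes classifier computes, here is a minimal, self-contained sketch in plain PHP. It is not the library's implementation: the function names `tokenize`, `trainComplementNaiveBayes` and `classifyComplement`, the tokenization rule and the smoothing constant `$alpha` are all assumptions made for illustration. The idea is that each category is scored using term counts from every *other* category, and the category with the lowest score wins.

```php
<?php
// Hypothetical tokenizer: lowercase the text and split on non-alphanumerics.
function tokenize(string $text): array
{
    return preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
}

// Returns [category => [term => log weight]], where each weight is estimated
// from the documents that do NOT belong to that category (the complement).
function trainComplementNaiveBayes(array $documents, float $alpha = 1.0): array
{
    $counts = [];     // category => term => count
    $vocabulary = []; // term => true
    foreach ($documents as [$category, $text]) {
        $counts[$category] = $counts[$category] ?? [];
        foreach (tokenize($text) as $term) {
            $counts[$category][$term] = ($counts[$category][$term] ?? 0) + 1;
            $vocabulary[$term] = true;
        }
    }

    $weights = [];
    foreach (array_keys($counts) as $category) {
        // Sum term counts over every category except the current one.
        $complement = [];
        foreach ($counts as $other => $terms) {
            if ($other === $category) {
                continue;
            }
            foreach ($terms as $term => $n) {
                $complement[$term] = ($complement[$term] ?? 0) + $n;
            }
        }
        // Additive smoothing keeps unseen terms from producing log(0).
        $total = array_sum($complement) + $alpha * count($vocabulary);
        foreach (array_keys($vocabulary) as $term) {
            $weights[$category][$term] = log((($complement[$term] ?? 0) + $alpha) / $total);
        }
    }

    return $weights;
}

// Picks the category whose complement explains the text worst (lowest score).
// Terms that were never seen in training are ignored.
function classifyComplement(array $weights, string $text): ?string
{
    $best = null;
    $bestScore = INF;
    $terms = tokenize($text);
    foreach ($weights as $category => $termWeights) {
        $score = 0.0;
        foreach ($terms as $term) {
            $score += $termWeights[$term] ?? 0.0;
        }
        if ($score < $bestScore) {
            $bestScore = $score;
            $best = $category;
        }
    }

    return $best;
}

$weights = trainComplementNaiveBayes([
    ['spam', 'Some spam document'],
    ['spam', 'Another spam document'],
    ['ham', 'Some ham document'],
    ['ham', 'Another ham document'],
]);

var_dump(classifyComplement($weights, 'Some ham document')); // string(3) "ham"
```

Scoring against the complement rather than the category itself is what distinguishes this variant from standard Naive Bayes. The sketch omits refinements such as weight normalization that a production classifier may apply.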

Non-cached SVM

use Camspiers\StatisticalClassifier\Classifier\SVM;
use Camspiers\StatisticalClassifier\DataSource\DataArray;

$source = new DataArray();
$source->addDocument('spam', 'Some spam document');
$source->addDocument('spam', 'Another spam document');
$source->addDocument('ham', 'Some ham document');
$source->addDocument('ham', 'Another ham document');

$classifier = new SVM($source);
$classifier->is('ham', 'Some ham document'); // bool(true)
$classifier->classify('Some ham document'); // string "ham"

Caching models

Caching models requires maximebf/CacheCache, which can be installed via Packagist. Additional caching systems can be easily integrated.

Cached Naive Bayes

use Camspiers\StatisticalClassifier\Classifier\ComplementNaiveBayes;
use Camspiers\StatisticalClassifier\Model\CachedModel;
use Camspiers\StatisticalClassifier\DataSource\DataArray;

$source = new DataArray();
$source->addDocument('spam', 'Some spam document');
$source->addDocument('spam', 'Another spam document');
$source->addDocument('ham', 'Some ham document');
$source->addDocument('ham', 'Another ham document');

$model = new CachedModel(
	'mycachename',
	new CacheCache\Cache(
		new CacheCache\Backends\File(
			array(
				'dir' => __DIR__
			)
		)
	)
);

$classifier = new ComplementNaiveBayes($source, $model);
$classifier->is('ham', 'Some ham document'); // bool(true)
$classifier->classify('Some ham document'); // string "ham"

Cached SVM

use Camspiers\StatisticalClassifier\Classifier\SVM;
use Camspiers\StatisticalClassifier\Model\SVMCachedModel;
use Camspiers\StatisticalClassifier\DataSource\DataArray;

$source = new DataArray();
$source->addDocument('spam', 'Some spam document');
$source->addDocument('spam', 'Another spam document');
$source->addDocument('ham', 'Some ham document');
$source->addDocument('ham', 'Another ham document');

$model = new SVMCachedModel(
	__DIR__ . '/model.svm',
	new CacheCache\Cache(
		new CacheCache\Backends\File(
			array(
				'dir' => __DIR__
			)
		)
	)
);

$classifier = new SVM($source, $model);
$classifier->is('ham', 'Some ham document'); // bool(true)
$classifier->classify('Some ham document'); // string "ham"

Unit testing

statistical-classifier/ $ composer install --dev
statistical-classifier/ $ phpunit

Main metrics

Overview

  • Name with owner: camspiers/statistical-classifier
  • Primary language: PHP
  • Program language: PHP (language count: 1)
  • License: MIT License

Owner activity

  • Created at: 2012-09-29 22:18:06
  • Pushed at: 2016-09-26 20:55:35
  • Last commit at: 2014-08-31 16:50:25
  • Release count: 22
  • Last release name: 0.8.0
  • First release name: 0.1.0 (posted on 2013-03-15 11:23:54)

User participation

  • Stargazers count: 173
  • Watchers count: 21
  • Fork count: 25
  • Commits count: 287
  • Issues count: 14
  • Issue open count: 5
  • Pull requests count: 14
  • Pull requests open count: 1
  • Pull requests close count: 4