fuzzywuzzy

Fuzzy String Matching in Python

-  Owner: seatgeek/fuzzywuzzy
-  License: GNU General Public License v2.0


.. image:: https://travis-ci.org/seatgeek/fuzzywuzzy.svg?branch=master
    :target: https://travis-ci.org/seatgeek/fuzzywuzzy

FuzzyWuzzy
==========

Fuzzy string matching like a boss. It uses `Levenshtein Distance <https://en.wikipedia.org/wiki/Levenshtein_distance>`_ to calculate the differences between sequences in a simple-to-use package.
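
For readers unfamiliar with the metric, the following is a minimal textbook sketch of Levenshtein distance in pure Python. The function name ``levenshtein`` is illustrative only and is not part of the FuzzyWuzzy API; the library's own scoring is not necessarily implemented this way.

.. code:: python

    def levenshtein(a: str, b: str) -> int:
        """Return the minimum number of single-character edits turning a into b."""
        # previous[j] is the distance between the prefix of a seen so far and b[:j].
        previous = list(range(len(b) + 1))
        for i, char_a in enumerate(a, start=1):
            current = [i]  # turning a[:i] into the empty string takes i deletions
            for j, char_b in enumerate(b, start=1):
                substitution = previous[j - 1] + (char_a != char_b)
                insertion = current[j - 1] + 1
                deletion = previous[j] + 1
                current.append(min(substitution, insertion, deletion))
            previous = current
        return previous[-1]

For example, ``levenshtein("kitten", "sitting")`` needs two substitutions and one insertion.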

Requirements
============

-  Python 2.7 or higher
-  difflib
-  `python-Levenshtein <https://github.com/ztane/python-Levenshtein/>`_ (optional; provides a 4-10x speedup in
   string matching, though it may result in `differing results for certain cases <https://github.com/seatgeek/fuzzywuzzy/issues/128>`_)

For testing
~~~~~~~~~~~

-  pycodestyle
-  hypothesis
-  pytest

Installation
============

Using PIP via PyPI

.. code:: bash

    pip install fuzzywuzzy

Or, to install ``python-Levenshtein`` as well:

.. code:: bash

    pip install fuzzywuzzy[speedup]
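
To confirm whether the optional speedup is available in the current environment, a small check along these lines may help. It assumes that python-Levenshtein installs a top-level module named ``Levenshtein``; the helper name ``has_speedup`` is hypothetical and not part of FuzzyWuzzy.

.. code:: python

    import importlib.util


    def has_speedup() -> bool:
        """Return True if a module named ``Levenshtein`` can be found."""
        # find_spec only locates the module; it does not import it.
        return importlib.util.find_spec("Levenshtein") is not None


    print("python-Levenshtein available:", has_speedup())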


Using PIP via GitHub

.. code:: bash

    pip install git+https://github.com/seatgeek/fuzzywuzzy.git@0.18.0#egg=fuzzywuzzy

Adding to your ``requirements.txt`` file (run ``pip install -r requirements.txt`` afterwards)

.. code:: bash

    git+ssh://git@github.com/seatgeek/fuzzywuzzy.git@0.18.0#egg=fuzzywuzzy
    
Manually via Git

.. code:: bash

    git clone https://github.com/seatgeek/fuzzywuzzy.git fuzzywuzzy
    cd fuzzywuzzy
    python setup.py install


Usage
=====

.. code:: python

    >>> from fuzzywuzzy import fuzz
    >>> from fuzzywuzzy import process

Simple Ratio
~~~~~~~~~~~~

.. code:: python

    >>> fuzz.ratio("this is a test", "this is a test!")
    97
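
Since ``difflib`` is listed as a requirement, a score of this kind can be approximated with the standard library alone. The sketch below is an illustration under that assumption, not the library's actual code; ``simple_ratio`` is a hypothetical name, and its results are not guaranteed to match ``fuzz.ratio`` in every case.

.. code:: python

    from difflib import SequenceMatcher


    def simple_ratio(s1: str, s2: str) -> int:
        """Similarity of two strings as an integer from 0 to 100."""
        # SequenceMatcher.ratio() is 2 * matched characters / total length.
        return int(round(100 * SequenceMatcher(None, s1, s2).ratio()))


    print(simple_ratio("this is a test", "this is a test!"))

Here 14 characters match out of 29 in total, so the raw ratio is 28/29, which rounds to 97.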

Partial Ratio
~~~~~~~~~~~~~

.. code:: python

    >>> fuzz.partial_ratio("this is a test", "this is a test!")
    100
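
One way to picture a partial ratio is to slide the shorter string across the longer one and keep the best window score. The brute-force sketch below makes that idea concrete with ``difflib``; it is a simplification, the name ``partial_ratio_sketch`` is hypothetical, and the library may choose its candidate windows differently.

.. code:: python

    from difflib import SequenceMatcher


    def partial_ratio_sketch(s1: str, s2: str) -> int:
        """Best score of the shorter string against any equal-length window."""
        shorter, longer = sorted((s1, s2), key=len)
        if not shorter:
            return 0  # assumption of this sketch: empty input never matches
        best = 0.0
        for start in range(len(longer) - len(shorter) + 1):
            window = longer[start:start + len(shorter)]
            best = max(best, SequenceMatcher(None, shorter, window).ratio())
        return int(round(100 * best))


    print(partial_ratio_sketch("this is a test", "this is a test!"))

The trailing ``!`` no longer costs anything, because the first window of the longer string equals the shorter string exactly.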

Token Sort Ratio
~~~~~~~~~~~~~~~~

.. code:: python

    >>> fuzz.ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
    91
    >>> fuzz.token_sort_ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
    100
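
The effect shown above can be reproduced by sorting the words of each string before comparing them, so that word order stops mattering. This is a simplified sketch using ``difflib``: ``token_sort_ratio_sketch`` is a hypothetical name, and the normalization here (lowercasing and whitespace splitting) is an assumption that may be less thorough than the library's preprocessing.

.. code:: python

    from difflib import SequenceMatcher


    def token_sort_ratio_sketch(s1: str, s2: str) -> int:
        """Compare two strings after sorting their whitespace-separated tokens."""

        def normalize(text: str) -> str:
            # Lowercase, split on whitespace, sort the tokens, and rejoin them.
            return " ".join(sorted(text.lower().split()))

        a, b = normalize(s1), normalize(s2)
        return int(round(100 * SequenceMatcher(None, a, b).ratio()))


    print(token_sort_ratio_sketch("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear"))

Both inputs normalize to ``a bear fuzzy was wuzzy``, so the comparison is between identical strings.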

Token Set Ratio
~~~~~~~~~~~~~~~

.. code:: python

    >>> fuzz.token_sort_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")
    84
    >>> fuzz.token_set_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")
    100
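
A token-set comparison can be sketched by splitting each string into a set of words, so that repeated words collapse, and then comparing the shared words against each string's full word set. The code below is an approximation written with ``difflib``; ``token_set_ratio_sketch`` and ``_ratio`` are hypothetical names, and the exact steps the library takes may differ.

.. code:: python

    from difflib import SequenceMatcher


    def _ratio(a: str, b: str) -> int:
        return int(round(100 * SequenceMatcher(None, a, b).ratio()))


    def token_set_ratio_sketch(s1: str, s2: str) -> int:
        """Score two strings by their shared and unshared sets of tokens."""
        tokens1, tokens2 = set(s1.lower().split()), set(s2.lower().split())
        common = " ".join(sorted(tokens1 & tokens2))
        only1 = " ".join(sorted(tokens1 - tokens2))
        only2 = " ".join(sorted(tokens2 - tokens1))
        # Rebuild each string as "shared tokens + its own leftover tokens".
        combined1 = (common + " " + only1).strip()
        combined2 = (common + " " + only2).strip()
        return max(
            _ratio(common, combined1),
            _ratio(common, combined2),
            _ratio(combined1, combined2),
        )


    print(token_set_ratio_sketch("fuzzy was a bear", "fuzzy fuzzy was a bear"))

The duplicated ``fuzzy`` disappears once the words are placed in a set, which is why the two inputs score as identical.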

Process
~~~~~~~

.. code:: python

    >>> choices = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
    >>> process.extract("new york jets", choices, limit=2)
    [('New York Jets', 100), ('New York Giants', 78)]
    >>> process.extractOne("cowboys", choices)
    ('Dallas Cowboys', 90)
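
Conceptually, extraction scores every choice against the query and returns the best ones. The sketch below shows that ranking step with a pluggable scorer. All names in it (``default_scorer``, ``extract_sketch``, ``extract_one_sketch``) are hypothetical, and its plain ``difflib`` scorer is an assumption that differs from the library's default scorer, so its scores will not generally match the output above.

.. code:: python

    from difflib import SequenceMatcher


    def default_scorer(query: str, choice: str) -> int:
        # Case-insensitive similarity from 0 to 100 (a stand-in scorer).
        matcher = SequenceMatcher(None, query.lower(), choice.lower())
        return int(round(100 * matcher.ratio()))


    def extract_sketch(query, choices, scorer=default_scorer, limit=5):
        """Return up to ``limit`` (choice, score) pairs, best score first."""
        scored = [(choice, scorer(query, choice)) for choice in choices]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:limit]


    def extract_one_sketch(query, choices, scorer=default_scorer):
        """Return the single best (choice, score) pair, or None if empty."""
        results = extract_sketch(query, choices, scorer=scorer, limit=1)
        return results[0] if results else None


    choices = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
    print(extract_sketch("new york jets", choices, limit=2))
    print(extract_one_sketch("cowboys", choices))

Passing a different function as ``scorer`` changes how candidates are ranked without touching the ranking logic itself.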

You can also pass additional parameters to the ``extractOne`` method to make it use a specific scorer. A typical use case is matching file paths:

.. code:: python

    >>> process.extractOne("System of a down - Hypnotize - Heroin", songs)
    ('/music/library/good/System of a Down/2005 - Hypnotize/01 - Attack.mp3', 86)
    >>> process.extractOne("System of a down - Hypnotize - Heroin", songs, scorer=fuzz.token_sort_ratio)
    ("/music/library/good/System of a Down/2005 - Hypnotize/10 - She's Like Heroin.mp3", 61)

.. |Build Status| image:: https://api.travis-ci.org/seatgeek/fuzzywuzzy.png?branch=master
   :target: https://travis-ci.org/seatgeek/fuzzywuzzy

Known Ports
============

FuzzyWuzzy is being ported to other languages too! Here are a few ports we know about:

-  Java: `xpresso's fuzzywuzzy implementation <https://github.com/WantedTechnologies/xpresso/wiki/Approximate-string-comparison-and-pattern-matching-in-Java>`_
-  Java: `fuzzywuzzy (java port) <https://github.com/xdrop/fuzzywuzzy>`_
-  Rust: `fuzzyrusty (Rust port) <https://github.com/logannc/fuzzyrusty>`_
-  JavaScript: `fuzzball.js (JavaScript port) <https://github.com/nol13/fuzzball.js>`_
-  C++: `Tmplt/fuzzywuzzy <https://github.com/Tmplt/fuzzywuzzy>`_
-  C#: `fuzzysharp (.NET port) <https://github.com/BoomTownRoi/BoomTown.FuzzySharp>`_
-  Go: `go-fuzzywuzzy (Go port) <https://github.com/paul-mannino/go-fuzzywuzzy>`_
-  Free Pascal: `FuzzyWuzzy.pas (Free Pascal port) <https://github.com/DavidMoraisFerreira/FuzzyWuzzy.pas>`_
-  Kotlin multiplatform: `FuzzyWuzzy-Kotlin <https://github.com/willowtreeapps/fuzzywuzzy-kotlin>`_
-  R: `fuzzywuzzyR (R port) <https://github.com/mlampros/fuzzywuzzyR>`_

Main metrics
============

Overview
~~~~~~~~

-  Primary language: Python
-  Program language: Python (language count: 2)

Owner activity
~~~~~~~~~~~~~~

-  Created at: 2011-07-08 19:32:34
-  Pushed at: 2023-02-24 19:00:26
-  Last commit at: 2021-09-09 13:54:41
-  Release count: 24
-  Last release: 0.18.0 (posted on 2020-02-13 16:01:26)
-  First release: 0.3.0

User engagement
~~~~~~~~~~~~~~~

-  Stargazers: 9.3k
-  Watchers: 262
-  Forks: 876
-  Commits: 384
-  Issues: 187 (83 open)
-  Pull requests: 90 (23 open, 35 closed)