fuzzywuzzy

Fuzzy String Matching in Python

  • 所有者: seatgeek/fuzzywuzzy
  • 平台:
  • 許可證: GNU General Public License v2.0
  • 分類:
  • 主題:
  • 喜歡:
    0
      比較:

Github星跟蹤圖

.. image:: https://travis-ci.org/seatgeek/fuzzywuzzy.svg?branch=master
:target: https://travis-ci.org/seatgeek/fuzzywuzzy

FuzzyWuzzy

Fuzzy string matching like a boss. It uses Levenshtein Distance <https://en.wikipedia.org/wiki/Levenshtein_distance>_ to calculate the differences between sequences in a simple-to-use package.

Requirements

  • Python 2.7 or higher
  • difflib
  • python-Levenshtein <https://github.com/ztane/python-Levenshtein/>_ (optional, provides a 4-10x speedup in String
    Matching, though may result in differing results for certain cases <https://github.com/seatgeek/fuzzywuzzy/issues/128>_)

For testing

-  pycodestyle
-  hypothesis
-  pytest

Installation
============

Using PIP via PyPI

.. code:: bash

    pip install fuzzywuzzy

or the following to install `python-Levenshtein` too

.. code:: bash

    pip install fuzzywuzzy[speedup]


Using PIP via Github

.. code:: bash

    pip install git+git://github.com/seatgeek/fuzzywuzzy.git@0.18.0#egg=fuzzywuzzy

Adding to your ``requirements.txt`` file (run ``pip install -r requirements.txt`` afterwards)

.. code:: bash

    git+ssh://git@github.com/seatgeek/fuzzywuzzy.git@0.18.0#egg=fuzzywuzzy
    
Manually via GIT

.. code:: bash

    git clone git://github.com/seatgeek/fuzzywuzzy.git fuzzywuzzy
    cd fuzzywuzzy
    python setup.py install


Usage
=====

.. code:: python

    >>> from fuzzywuzzy import fuzz
    >>> from fuzzywuzzy import process

Simple Ratio

.. code:: python

>>> fuzz.ratio("this is a test", "this is a test!")
    97

Partial Ratio


.. code:: python

    >>> fuzz.partial_ratio("this is a test", "this is a test!")
        100

Token Sort Ratio

.. code:: python

>>> fuzz.ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
    91
>>> fuzz.token_sort_ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
    100

Token Set Ratio


.. code:: python

    >>> fuzz.token_sort_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")
        84
    >>> fuzz.token_set_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")
        100

Process
~~~~~~~

.. code:: python

    >>> choices = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
    >>> process.extract("new york jets", choices, limit=2)
        [('New York Jets', 100), ('New York Giants', 78)]
    >>> process.extractOne("cowboys", choices)
        ("Dallas Cowboys", 90)

You can also pass additional parameters to ``extractOne`` method to make it use a specific scorer. A typical use case is to match file paths:

.. code:: python
  
    >>> process.extractOne("System of a down - Hypnotize - Heroin", songs)
        ('/music/library/good/System of a Down/2005 - Hypnotize/01 - Attack.mp3', 86)
    >>> process.extractOne("System of a down - Hypnotize - Heroin", songs, scorer=fuzz.token_sort_ratio)
        ("/music/library/good/System of a Down/2005 - Hypnotize/10 - She's Like Heroin.mp3", 61)

.., Build Status, image:: https://api.travis-ci.org/seatgeek/fuzzywuzzy.png?branch=master
   :target: https:travis-ci.org/seatgeek/fuzzywuzzy

Known Ports
============

FuzzyWuzzy is being ported to other languages too! Here are a few ports we know about:

-  Java: `xpresso's fuzzywuzzy implementation <https://github.com/WantedTechnologies/xpresso/wiki/Approximate-string-comparison-and-pattern-matching-in-Java>`_
-  Java: `fuzzywuzzy (java port) <https://github.com/xdrop/fuzzywuzzy>`_
-  Rust: `fuzzyrusty (Rust port) <https://github.com/logannc/fuzzyrusty>`_
-  JavaScript: `fuzzball.js (JavaScript port) <https://github.com/nol13/fuzzball.js>`_
-  C++: `Tmplt/fuzzywuzzy <https://github.com/Tmplt/fuzzywuzzy>`_
-  C#: `fuzzysharp (.Net port) <https://github.com/BoomTownRoi/BoomTown.FuzzySharp>`_
-  Go: `go-fuzzywuzz (Go port) <https://github.com/paul-mannino/go-fuzzywuzzy>`_
-  Free Pascal: `FuzzyWuzzy.pas (Free Pascal port) <https://github.com/DavidMoraisFerreira/FuzzyWuzzy.pas>`_
-  Kotlin multiplatform: `FuzzyWuzzy-Kotlin <https://github.com/willowtreeapps/fuzzywuzzy-kotlin>`_
-  R: `fuzzywuzzyR (R port) <https://github.com/mlampros/fuzzywuzzyR>`_

主要指標

概覽
名稱與所有者seatgeek/fuzzywuzzy
主編程語言Python
編程語言Python (語言數: 2)
平台
許可證GNU General Public License v2.0
所有者活动
創建於2011-07-08 19:32:34
推送於2023-02-24 19:00:26
最后一次提交2021-09-09 13:54:41
發布數24
最新版本名稱0.18.0 (發布於 2020-02-13 16:01:26)
第一版名稱0.3.0 (發布於 )
用户参与
星數9.3k
關注者數262
派生數876
提交數384
已啟用問題?
問題數187
打開的問題數83
拉請求數90
打開的拉請求數23
關閉的拉請求數35
项目设置
已啟用Wiki?
已存檔?
是復刻?
已鎖定?
是鏡像?
是私有?