Levenshtein-MySQL-UDF

General Levenshtein algorithm and k-bounded Levenshtein distance in linear time and constant space. Implementation in C as UDFs for MySQL? and MariaDB ᶘ ᵒᴥᵒᶅ

  • Owner: juanmirocks/Levenshtein-MySQL-UDF
  • Platform:
  • License:: GNU Lesser General Public License v3.0
  • Category::
  • Topic:
  • Like:
    0
      Compare:

Github stars Tracking Chart

MySQL/MariaDB UDF functions implemented in C

Features:

  • General Levenshtein algorithm
  • k-bounded Levenshtein distance algorithm (linear time, constant space).
    • Info: this is when you only care about the distance if it's smaller or equal than your given k (e.g. to test if the spelling difference between two words is of maximum 1). In this case, the algorithm runs faster while using less memory.
  • Levenshtein ratio
    • Info: this is syntactic sugar for levenshtein_ratio(s, t) = 1 - levenshtein(s, t) / max(s.length, t.length))
  • k-bounded Levenshtein ratio

Install

First you need to compile the library and tell MySQL/MariaDB about it:

(Unix) gcc -o levenshtein.so -shared levenshtein.c `mysql_config --include`  # On Linux x64 you may need to add the -fPIC flag
(macOS) gcc -bundle -o levenshtein.so levenshtein.c `mysql_config --include`
cp levenshtein.so `mysql_config --plugindir` # You may need sudo

Then in a console, run:

CREATE FUNCTION levenshtein RETURNS INT SONAME 'levenshtein.so';
CREATE FUNCTION levenshtein_k RETURNS INT SONAME 'levenshtein.so';
CREATE FUNCTION levenshtein_ratio RETURNS REAL SONAME 'levenshtein.so';
CREATE FUNCTION levenshtein_k_ratio RETURNS REAL SONAME 'levenshtein.so';

That should be all ? ᶘ ᵒᴥᵒᶅ !

Note

Just in case the last SQL statements failed, consider that to create and use UDF functions, you need CREATE ROUTINE, EXECUTE privileges.

For MariaDB you have to grant additional privileges on mysql database. Here is an example of privileges allowing dropping, altering, creating and using functions:

GRANT INSERT, DELETE, DROP ROUTINE, CREATE ROUTINE, ALTER ROUTINE, EXECUTE ON mysql.* TO 'user'@'%';

For more details on setting up UDFs in MySQL see: UDF Compiling and Installing

Run

For example:

select 3 = levenshtein('maneuver', 'manoeuvre');
select 3 = levenshtein_k('maneuver', 'manoeuvre', 5);  -- k=5, allow the distance to be up to 5, otherwise it's 6 or greater
select 2 = levenshtein_k('maneuver', 'manoeuvre', 1);  -- k=1, allow the distance to be up to 1, otherwise it's 2 or greater

select 0.6666666666666667 = levenshtein_ratio('maneuver', 'manoeuvre'); -- that is, 1 - 3/9
select 0.6666666666666667 = levenshtein_k_ratio('maneuver', 'manoeuvre', 5); -- same
select 0 = levenshtein_k_ratio('maneuver', 'manoeuvre', 1); -- 0 because the distance (3) is greater than the maximum k allowed (1)

Test (for Development)

Your contributions are very much welcome!

Please before making a pull request, run the unit tests file.

This should return 1 at the end:

mysql -uroot < unittest.sql  # Change your username & password as needed

Main metrics

Overview
Name With Ownerjuanmirocks/Levenshtein-MySQL-UDF
Primary LanguageC
Program languageC (Language Count: 1)
Platform
License:GNU Lesser General Public License v3.0
所有者活动
Created At2011-11-11 17:40:32
Pushed At2022-03-23 12:10:27
Last Commit At2022-03-23 13:10:27
Release Count4
Last Release Name2.0.2 (Posted on 2017-08-27 11:59:18)
First Release Name1.0.0 (Posted on 2017-08-20 10:44:43)
用户参与
Stargazers Count118
Watchers Count9
Fork Count36
Commits Count58
Has Issues Enabled
Issues Count14
Issue Open Count5
Pull Requests Count3
Pull Requests Open Count1
Pull Requests Close Count2
项目设置
Has Wiki Enabled
Is Archived
Is Fork
Is Locked
Is Mirror
Is Private