deunicode

Convert Unicode to ASCII: "Ôóű, 🦄☣ in 北亰" → "Oou, unicorn face biohazard in Bei Jing"

The deunicode library transliterates Unicode strings such as "Æneid" into pure
ASCII ones such as "AEneid."

It started as a Rust port of the Text::Unidecode Perl module and was later extended to support emoji.

This is a fork of the unidecode crate. The fork uses a compact representation of Unicode data to minimize memory overhead and executable size.

Examples

extern crate deunicode;
use deunicode::deunicode;

assert_eq!(deunicode("Æneid"), "AEneid");
assert_eq!(deunicode("étude"), "etude");
assert_eq!(deunicode("北亰"), "Bei Jing");
assert_eq!(deunicode("ᔕᓇᓇ"), "shanana");
assert_eq!(deunicode("げんまい茶"), "genmaiCha");
assert_eq!(deunicode("🦄☣"), "unicorn biohazard");

Guarantees and Warnings

Here are some guarantees you have when calling deunicode():

  • The String returned will be valid ASCII; the decimal representation of
    every char in the string will be between 0 and 127, inclusive.
  • Every ASCII character (0x00 - 0x7F) is mapped to itself.
  • All Unicode characters will translate to printable ASCII characters
    (\n or characters in the range 0x20 - 0x7E).
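The output guarantees above can be checked mechanically on any returned string. The following is a minimal sketch that uses only the standard library; `is_ascii_only` and `is_printable_ascii` are hypothetical helper names introduced here for illustration and are not part of deunicode. The strings tested are the expected outputs copied from the Examples section, not live calls into the crate.

```rust
/// Hypothetical helper (not part of deunicode): true when every character
/// of `s` has a numeric value between 0 and 127, inclusive.
fn is_ascii_only(s: &str) -> bool {
    s.chars().all(|c| (c as u32) <= 127)
}

/// Hypothetical helper (not part of deunicode): true when every character
/// is `\n` or lies in the printable ASCII range 0x20..=0x7E.
fn is_printable_ascii(s: &str) -> bool {
    s.chars().all(|c| c == '\n' || (' '..='~').contains(&c))
}

fn main() {
    // Expected outputs copied from the Examples section.
    for out in ["AEneid", "etude", "Bei Jing", "shanana", "genmaiCha"] {
        assert!(is_ascii_only(out));
        assert!(is_printable_ascii(out));
    }

    // An untransliterated input fails both checks.
    assert!(!is_ascii_only("Æneid"));
    assert!(!is_printable_ascii("Æneid"));

    // A tab is ASCII but falls outside the printable range, so the two
    // checks are not interchangeable.
    assert!(is_ascii_only("a\tb"));
    assert!(!is_printable_ascii("a\tb"));
}
```

In a real test suite the same helpers could be applied to the value returned by deunicode() instead of to string literals.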

There are, however, some things you should keep in mind:

  • As stated, some transliterations do produce \n characters.
  • Some Unicode characters transliterate to an empty string, either on purpose
    or because deunicode does not know about the character.
  • Some Unicode characters are unknown and transliterate to "[?]"
    (or a custom placeholder, or None if you use a chars iterator).
  • Many Unicode characters transliterate to multi-character strings. For
    example, "北" is transliterated as "Bei".
  • Han characters used in multiple languages are mapped to Mandarin,
    and will be mostly illegible to Japanese readers.
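The behaviours listed above (ASCII passthrough, multi-character results, empty results, and a placeholder for unknown characters) can be illustrated with a toy model. This is only a sketch under stated assumptions: `toy_lookup` and `toy_transliterate` are hypothetical names, the four-entry table is invented for the example, and none of it reflects deunicode's actual data or internal representation.

```rust
/// Toy table for non-ASCII characters (an assumption for illustration;
/// deunicode's real dataset is far larger and stored differently).
/// `None` means "unknown character".
fn toy_lookup(c: char) -> Option<&'static str> {
    match c {
        'Æ' => Some("AE"),       // one character becomes two
        '\u{E9}' => Some("e"),   // é loses its accent
        '北' => Some("Bei"),     // Han character mapped to a Mandarin reading
        '\u{200B}' => Some(""),  // in this toy table, deliberately empty
        _ => None,
    }
}

/// Transliterates `s` with the toy table, substituting `placeholder`
/// for characters the table does not know.
fn toy_transliterate(s: &str, placeholder: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        if c.is_ascii() {
            // Every ASCII character maps to itself.
            out.push(c);
        } else {
            out.push_str(toy_lookup(c).unwrap_or(placeholder));
        }
    }
    out
}

fn main() {
    assert_eq!(toy_transliterate("Æneid", "[?]"), "AEneid");
    assert_eq!(toy_transliterate("\u{E9}tude", "[?]"), "etude");
    assert_eq!(toy_transliterate("北", "[?]"), "Bei");
    // A known character may map to an empty string.
    assert_eq!(toy_transliterate("a\u{200B}b", "[?]"), "ab");
    // An unknown character becomes the placeholder, which the caller chooses.
    assert_eq!(toy_transliterate("x☃", "[?]"), "x[?]");
    assert_eq!(toy_transliterate("x☃", "_"), "x_");
    // A per-character lookup reports unknown characters as None.
    assert_eq!(toy_lookup('☃'), None);
}
```

Because one input character can expand to several output characters or to none, callers should not assume the output has the same length as the input.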

Unicode data

For a detailed explanation of the rationale behind the original
dataset, refer to the article written by Burke in 2001.

Main metrics

Overview
Name With Owner: kornelski/deunicode
Primary Language: Rust
Program Language: Rust (Language Count: 1)
License: Other

Owner Activity
Created At: 2018-05-05 11:28:12
Pushed At: 2025-04-27 01:56:07
Release Count: 29
Last Release Name: v1.6.2 (Posted on 2025-04-27 02:56:06)
First Release Name: v0.1.0

User Engagement
Stargazers Count: 82
Watchers Count: 5
Fork Count: 3
Commits Count: 116
Issues Count: 11
Issue Open Count: 2
Pull Requests Count: 4
Pull Requests Open Count: 0
Pull Requests Close Count: 1