deunicode

Convert Unicode to ASCII "Ôóű, 🦄☣ in 北亰" → "Oou, unicorn face biohazard in Bei Jing"

Documentation

The deunicode library transliterates Unicode strings such as "Æneid" into pure
ASCII ones such as "AEneid".

It started as a Rust port of the Text::Unidecode Perl module, and was extended to support emoji.

This is a fork of the unidecode crate. This fork uses a compact representation of Unicode data to minimize memory overhead and executable size.

Examples

extern crate deunicode;
use deunicode::deunicode;

assert_eq!(deunicode("Æneid"), "AEneid");
assert_eq!(deunicode("étude"), "etude");
assert_eq!(deunicode("北亰"), "Bei Jing");
assert_eq!(deunicode("ᔕᓇᓇ"), "shanana");
assert_eq!(deunicode("げんまい茶"), "genmaiCha");
assert_eq!(deunicode("🦄☣"), "unicorn biohazard");

Guarantees and Warnings

Here are some guarantees you have when calling deunicode():

  • The String returned will be valid ASCII; the decimal representation of
    every char in the string will be between 0 and 127, inclusive.
  • Every ASCII character (0x00 - 0x7F) is mapped to itself.
  • All non-ASCII Unicode characters will translate to printable ASCII characters
    (\n or characters in the range 0x20 - 0x7E).
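
The guarantees above can be expressed as two small predicates, which may be handy in tests that inspect transliterated output. This is a minimal sketch using only the standard library; the function names is_ascii_only and is_printable_ascii_or_newline are hypothetical and are not part of the deunicode API.

```rust
/// True if every char of `s` has a code point between 0 and 127, inclusive.
/// (Hypothetical helper, not part of the deunicode crate.)
fn is_ascii_only(s: &str) -> bool {
    s.chars().all(|c| (c as u32) <= 127)
}

/// True if every char of `s` is '\n' or lies in the printable
/// ASCII range 0x20 - 0x7E.
/// (Hypothetical helper, not part of the deunicode crate.)
fn is_printable_ascii_or_newline(s: &str) -> bool {
    s.chars().all(|c| matches!(c, '\n' | '\u{20}'..='\u{7E}'))
}
```

Because ASCII input is passed through unchanged, a control character such as a tab in the input would satisfy the first predicate but not the second, so the second check is only meaningful for output produced from non-ASCII characters.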

There are, however, some things you should keep in mind:

  • As stated, some transliterations do produce \n characters.
  • Some Unicode characters transliterate to an empty string, either on purpose
    or because deunicode does not know about the character.
  • Some Unicode characters are unknown and transliterate to "[?]"
    (or a custom placeholder, or None if you use a chars iterator).
  • Many Unicode characters transliterate to multi-character strings. For
    example, "北" is transliterated as "Bei".
  • Han characters used in multiple languages are mapped to Mandarin,
    and will be mostly illegible to Japanese readers.
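
The behaviors listed above (ASCII passes through, one character may expand to several, unknown characters become a placeholder) can be illustrated with a toy model. This is only a sketch under stated assumptions: toy_lookup and toy_transliterate are hypothetical names, and the three-entry table is made up for illustration. It is not the crate's data or implementation.

```rust
/// Toy lookup table, NOT the crate's data: maps a few non-ASCII
/// characters to ASCII strings. `None` means "unknown character".
fn toy_lookup(ch: char) -> Option<&'static str> {
    match ch {
        'Æ' => Some("AE"), // one character expands to two
        'é' => Some("e"),
        '北' => Some("Bei"), // one character expands to three
        _ => None,
    }
}

/// Transliterates `s` with the toy table, substituting `placeholder`
/// for every non-ASCII character the table does not know.
fn toy_transliterate(s: &str, placeholder: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for ch in s.chars() {
        if ch.is_ascii() {
            // ASCII characters are mapped to themselves.
            out.push(ch);
        } else {
            out.push_str(toy_lookup(ch).unwrap_or(placeholder));
        }
    }
    out
}
```

In this model, calling toy_transliterate with "[?]" as the placeholder mirrors the default described above, while passing an empty string drops unknown characters entirely.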

Unicode data

For a detailed explanation of the rationale behind the original
dataset, refer to this article written
by Burke in 2001.
