Duckling

Language, engine, and tooling for expressing, testing, and evaluating composable language rules on input strings.

Duckling is a Haskell library that parses text into structured data.

"the first Tuesday of October"
=> {"value":"2017-10-03T00:00:00.000-07:00","grain":"day"}

Requirements

A Haskell environment is required. We recommend using stack.

On macOS you'll need to install PCRE development headers. The easiest way to do that is with Homebrew:

brew install pcre

If that doesn't help, try running brew doctor and fix the issues it finds.

Quickstart

To compile and run the binary:

$ stack build
$ stack exec duckling-example-exe

The first time you run it, it will download all required packages.

This runs a basic HTTP server. Example request:

$ curl -XPOST http://0.0.0.0:8000/parse --data 'locale=en_GB&text=tomorrow at eight'
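When calling the endpoint from code rather than from curl, the request body is an ordinary form-encoded string. The sketch below only builds such a body for the request above and does not send it. `encodeValue` and `formBody` are hypothetical helpers written for this example, not part of Duckling, and ASCII-only input is assumed.

```haskell
module Main where

import Data.Char (isAlphaNum, isAscii, ord)
import Data.List (intercalate)
import Numeric (showHex)

-- Percent-encode one form value. Assumption: the input is ASCII only;
-- non-ASCII text would first need to be encoded as UTF-8 bytes.
encodeValue :: String -> String
encodeValue = concatMap enc
  where
    enc c
      | isAscii c && isAlphaNum c || c `elem` "-_.~" = [c]
      | otherwise = '%' : pad (showHex (ord c) "")
    -- Always emit two hex digits.
    pad [h] = ['0', h]
    pad hs  = hs

-- Join key/value pairs into a form-encoded body (keys are used as-is).
formBody :: [(String, String)] -> String
formBody kvs = intercalate "&" [k ++ "=" ++ encodeValue v | (k, v) <- kvs]

main :: IO ()
main = putStrLn (formBody [("locale", "en_GB"), ("text", "tomorrow at eight")])
-- prints: locale=en_GB&text=tomorrow%20at%20eight
```

The resulting string can be passed to any HTTP client as the POST body with a form-encoded content type.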

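In the example application, all dimensions are enabled by default. Provide the parameter dims to specify which ones you want. Examples:

Identify credit card numbers only: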
$ curl -XPOST http://0.0.0.0:8000/parse --data 'locale=en_US&text="4111-1111-1111-1111"&dims="[\"credit-card-number\"]"'

If you want multiple dimensions, comma-separate them in the array:

$ curl -XPOST http://0.0.0.0:8000/parse --data 'locale=en_US&text="3 cups of sugar"&dims="[\"quantity\",\"numeral\"]"'

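See exe/ExampleMain.hs for an example on how to integrate Duckling in your project. If your backend doesn't run Haskell or if you don't want to spin up your own Duckling server, you can directly use wit.ai's built-in entities.

Supported dimensions

Duckling supports many languages, but most don't support all dimensions yet (we need your help!). Please look into this directory for language-specific support.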

| Dimension | Example input | Example value output |
| --- | --- | --- |
| AmountOfMoney | "42€" | {"value":42,"type":"value","unit":"EUR"} |
| CreditCardNumber | "4111-1111-1111-1111" | {"value":"4111111111111111","issuer":"visa"} |
| Distance | "6 miles" | {"value":6,"type":"value","unit":"mile"} |
| Duration | "3 mins" | {"value":3,"minute":3,"unit":"minute","normalized":{"value":180,"unit":"second"}} |
| Email | "duckling-team@fb.com" | {"value":"duckling-team@fb.com"} |
| Numeral | "eighty eight" | {"value":88,"type":"value"} |
| Ordinal | "33rd" | {"value":33,"type":"value"} |
| PhoneNumber | "+1 (650) 123-4567" | {"value":"(+1) 6501234567"} |
| Quantity | "3 cups of sugar" | {"value":3,"type":"value","product":"sugar","unit":"cup"} |
| Temperature | "80F" | {"value":80,"type":"value","unit":"fahrenheit"} |
| Time | "today at 9am" | {"values":[{"value":"2016-12-14T09:00:00.000-08:00","grain":"hour","type":"value"}],"value":"2016-12-14T09:00:00.000-08:00","grain":"hour","type":"value"} |
| Url | "https://api.wit.ai/message?q=hi" | {"value":"https://api.wit.ai/message?q=hi","domain":"api.wit.ai"} |
| Volume | "4 gallons" | {"value":4,"type":"value","unit":"gallon"} |

Custom dimensions are also supported.
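The Duration row above carries a normalized field that restates the value in seconds (3 minutes becomes 180 seconds). The arithmetic can be sketched as follows. `Grain`, `secondsPer`, and `normalize` are names made up for this illustration and are not Duckling's own definitions; only fixed-length units are covered.

```haskell
module Main where

-- Fixed-length grains only. Months and years are left out because
-- their length in seconds varies (an assumption of this sketch).
data Grain = Second | Minute | Hour | Day | Week
  deriving (Eq, Show)

-- Number of seconds in one unit of each grain.
secondsPer :: Grain -> Int
secondsPer Second = 1
secondsPer Minute = 60
secondsPer Hour   = 60 * 60
secondsPer Day    = 24 * 60 * 60
secondsPer Week   = 7 * 24 * 60 * 60

-- Convert a (value, grain) duration to whole seconds.
normalize :: Int -> Grain -> Int
normalize n g = n * secondsPer g

main :: IO ()
main = print (normalize 3 Minute) -- prints 180, as in the "3 mins" row
```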

Extending Duckling

To regenerate the classifiers and run the test suite:

$ stack build :duckling-regen-exe && stack exec duckling-regen-exe && stack test

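It's important to regenerate the classifiers after updating the code and before running the test suite.

To extend Duckling's support for a dimension in a given language, typically 4 files need to be updated: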

  • Duckling/<Dimension>/<Lang>/Rules.hs
  • Duckling/<Dimension>/<Lang>/Corpus.hs
  • Duckling/Dimensions/<Lang>.hs (if not already in Duckling/Dimensions/Common.hs)
  • Duckling/Rules/<Lang>.hs

To add a new language:

To add a new locale:

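Rules have a name, a pattern and a production. Patterns are used to perform character-level matching (regexes on input) and concept-level matching (predicates on tokens). Productions are arbitrary functions that take a list of tokens and return a new token.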
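As a rough illustration of that shape, the sketch below models a rule as a name, a pattern, and a production over a toy token type. It is not Duckling's actual code: `Token`, `PatternItem`, `Rule`, `applyRule`, and the two sample rules are hypothetical names invented here, and literal word matching stands in for regex matching to keep the example dependency-free.

```haskell
module Main where

import Data.Char (toLower)

-- Toy token: the matched text plus an optional numeric value.
data Token = Token
  { tokText  :: String
  , tokValue :: Maybe Int
  } deriving (Eq, Show)

-- A pattern item matches either raw text (character level)
-- or a property of an already-produced token (concept level).
data PatternItem
  = Literal [String]          -- any of these words, case-insensitive
  | Predicate (Token -> Bool) -- a test on a token

data Rule = Rule
  { ruleName    :: String
  , rulePattern :: [PatternItem]
  , ruleProd    :: [Token] -> Maybe Token -- the production
  }

matches :: PatternItem -> Token -> Bool
matches (Literal ws)  t = map toLower (tokText t) `elem` ws
matches (Predicate p) t = p t

-- Run the production only if every pattern item matches its token.
applyRule :: Rule -> [Token] -> Maybe Token
applyRule r ts
  | length ts == length (rulePattern r)
  , and (zipWith matches (rulePattern r) ts) = ruleProd r ts
  | otherwise = Nothing

-- "two" produces a token carrying the value 2.
ruleTwo :: Rule
ruleTwo = Rule "integer (two)" [Literal ["two"]] prod
  where
    prod [t] = Just t { tokValue = Just 2 }
    prod _   = Nothing

-- A valued token followed by "minutes" produces a number of seconds.
ruleMinutes :: Rule
ruleMinutes = Rule "<integer> minutes" [hasValue, Literal ["minute", "minutes"]] prod
  where
    hasValue = Predicate (\t -> tokValue t /= Nothing)
    prod [n, unit] = do
      v <- tokValue n
      Just (Token (tokText n ++ " " ++ tokText unit) (Just (v * 60)))
    prod _ = Nothing

main :: IO ()
main = do
  let result = do
        two <- applyRule ruleTwo [Token "two" Nothing]
        applyRule ruleMinutes [two, Token "minutes" Nothing]
  print (tokValue =<< result) -- prints Just 120
```

The second rule consumes the token produced by the first, which is the sense in which rules compose: a production's output becomes input to another rule's predicates.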

The corpus (resp. negative corpus) is a list of examples that should (resp. shouldn't) parse. The reference time for the corpus is Tuesday Feb 12, 2013 at 4:30am.
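To make the corpus idea concrete, here is a small self-contained sketch: a toy parser for "in N minutes" phrases, checked against a positive and a negative corpus. Nothing here is Duckling's API. `parseToy`, `corpus`, `negativeCorpus`, and `failures` are names invented for illustration, and the reference time is reduced to minutes past midnight (04:30 is 270), ignoring the date.

```haskell
module Main where

import Data.Char (isDigit)
import Data.Maybe (isJust)

-- Reference time as minutes past midnight: 04:30.
-- Assumption of this sketch: the date part is ignored.
refMinutes :: Int
refMinutes = 4 * 60 + 30

-- Toy parser: "in <n> minutes" resolves to refMinutes + n; anything else fails.
parseToy :: String -> Maybe Int
parseToy s =
  case words s of
    ["in", n, unit]
      | unit `elem` ["minute", "minutes"] -> (refMinutes +) <$> number n
    _ -> Nothing
  where
    number "one" = Just 1
    number "two" = Just 2
    number ds
      | not (null ds) && all isDigit ds = Just (read ds)
      | otherwise = Nothing

-- Positive corpus: inputs that should parse, with the expected value.
corpus :: [(String, Int)]
corpus = [("in two minutes", 272), ("in 15 minutes", 285)]

-- Negative corpus: inputs that should not parse.
negativeCorpus :: [String]
negativeCorpus = ["in minutes", "two minutes ago"]

-- Inputs that violate either corpus; an empty list means every check passes.
failures :: [String]
failures =
  [s | (s, expected) <- corpus, parseToy s /= Just expected]
    ++ [s | s <- negativeCorpus, isJust (parseToy s)]

main :: IO ()
main = print failures -- prints []
```

Fixing the reference time is what makes relative phrases testable: "in two minutes" always resolves to 04:32 against a 04:30 reference, regardless of when the check runs.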

Duckling.Debug provides a few debugging tools:

$ stack repl --no-load
> :l Duckling.Debug
> debug (makeLocale EN $ Just US) "in two minutes" [This Time]
in|within|after <duration> (in two minutes)
-- regex (in)
-- <integer> <unit-of-duration> (two minutes)
-- -- integer (0..19) (two)
-- -- -- regex (two)
-- -- minute (grain) (minutes)
-- -- -- regex (minutes)
[Entity {dim = "time", body = "in two minutes", value = RVal Time (TimeValue (SimpleValue (InstantValue {vValue = 2013-02-12 04:32:00 -0200, vGrain = Second})) [SimpleValue (InstantValue {vValue = 2013-02-12 04:32:00 -0200, vGrain = Second})] Nothing), start = 0, end = 14}]

License

Duckling is BSD-licensed.


Overview

Name With Owner: facebook/duckling
Primary Language: Haskell
Program language: Haskell (Language Count: 2)
Platform: Linux, Mac, Windows
License: Other
Release Count: 9
Last Release Name: v0.2.0.0
First Release Name: v0.1.0.0
Created At: 2017-03-02 01:45:50
Pushed At: 2024-02-16 17:56:19
Stargazers Count: 4k
Watchers Count: 81
Fork Count: 718
Commits Count: 754
Issues Count: 401
Issue Open Count: 117
Pull Requests Count: 1
Pull Requests Open Count: 15
Pull Requests Close Count: 317
