regexgen

Generate regular expressions that match a set of strings

  • Owner: devongovett/regexgen

Installation

regexgen can be installed using npm:

npm install regexgen

Example

The simplest use is to pass an array of strings to regexgen:

const regexgen = require('regexgen');

regexgen(['foobar', 'foobaz', 'foozap', 'fooza']); // => /foo(?:zap?|ba[rz])/
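
The pattern in the example output has no anchors, so it also matches the input strings when they appear inside longer text. The minimal sketch below assumes you want whole-string matches only and wraps the generated source in `^(?:...)$`. The pattern literal is copied from the example output rather than generated, so the snippet runs without installing regexgen. The helper name `anchorWhole` is hypothetical.

```javascript
// Pattern copied from the example output above (regexgen itself is not called here).
const generated = /foo(?:zap?|ba[rz])/;

// Hypothetical helper: anchor a pattern so it must match the entire string.
function anchorWhole(re) {
  return new RegExp(`^(?:${re.source})$`, re.flags);
}

const exact = anchorWhole(generated);
const inputs = ['foobar', 'foobaz', 'foozap', 'fooza'];

console.log(inputs.every((s) => exact.test(s))); // true
console.log(generated.test('xfoobarx'));          // true: the unanchored pattern matches a substring
console.log(exact.test('xfoobarx'));              // false
```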

You can also use the Trie class directly:

const {Trie} = require('regexgen');

let t = new Trie;
t.add('foobar');
t.add('foobaz');

t.toRegExp(); // => /fooba[rz]/
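
The following minimal character trie illustrates what a trie stores. It is not regexgen's `Trie` class: the class name `TinyTrie`, its `countNodes()` method, and the node layout are hypothetical. Shared prefixes such as `fooba` are stored only once, while identical suffixes are still duplicated. That duplication is the redundancy that the minimization step described under "How does it work?" removes.

```javascript
// Hypothetical minimal trie, for illustration only (not regexgen's Trie class).
class TinyTrie {
  constructor() {
    // Each node maps a single character to a child node.
    this.root = { edges: new Map(), accepting: false };
  }

  add(str) {
    let node = this.root;
    for (const ch of str) {
      if (!node.edges.has(ch)) {
        node.edges.set(ch, { edges: new Map(), accepting: false });
      }
      node = node.edges.get(ch);
    }
    // Mark the node where a complete input string ends.
    node.accepting = true;
  }

  // Count all nodes, including the root.
  countNodes(node = this.root) {
    let n = 1;
    for (const child of node.edges.values()) n += this.countNodes(child);
    return n;
  }
}

const t = new TinyTrie();
t.add('foobar');
t.add('foobaz');
// root + f,o,o,b,a + r + z: the shared prefix "fooba" is stored once.
console.log(t.countNodes()); // 8

const u = new TinyTrie();
u.add('abc');
u.add('xbc');
// The common suffix "bc" is not shared: root + a,b,c + x,b,c.
console.log(u.countNodes()); // 7
```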

CLI

regexgen also has a simple CLI to generate regexes using inputs from the command line.

$ regexgen
Usage: regexgen [-gimuy] string1 string2 string3...

The optional first parameter is the flags to add
to the regex (e.g. -i for a case insensitive match).

ES2015 and Unicode

By default regexgen will output a standard JavaScript regular expression, with Unicode codepoints converted into UTF-16 surrogate pairs.

If desired, you can request an ES2015-compatible Unicode regular expression by supplying the -u flag, which results in those codepoints being retained.

$ regexgen 👩 👩‍💻 👩🏻‍💻 👩🏼‍💻 👩🏽‍💻 👩🏾‍💻 👩🏿‍💻
/\uD83D\uDC69(?:(?:\uD83C[\uDFFB-\uDFFF])?\u200D\uD83D\uDCBB)?/

$ regexgen -u 👩 👩‍💻 👩🏻‍💻 👩🏼‍💻 👩🏽‍💻 👩🏾‍💻 👩🏿‍💻
/\u{1F469}(?:[\u{1F3FB}-\u{1F3FF}]?\u200D\u{1F4BB})?/u

Such regular expressions are compatible with current versions of Node and the latest browsers, and may be more transferable to other languages.
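
The surrogate pairs in the non-`u` output follow from the standard UTF-16 encoding of code points above U+FFFF: subtract 0x10000, then put the top 10 bits into a high surrogate (0xD800 + ...) and the bottom 10 bits into a low surrogate (0xDC00 + ...). The sketch below computes these pairs for the code points in the example. It also checks that both regexes accept the same seven inputs. The regexes are copied from the CLI output with `^(?:...)$` anchors added, and the helper name `toSurrogatePair` is hypothetical.

```javascript
// Hypothetical helper: split a code point above U+FFFF into a UTF-16 surrogate pair.
function toSurrogatePair(cp) {
  const offset = cp - 0x10000;
  const high = 0xd800 + (offset >> 10);  // top 10 bits
  const low = 0xdc00 + (offset & 0x3ff); // bottom 10 bits
  return [high, low].map((unit) => '\\u' + unit.toString(16).toUpperCase()).join('');
}

console.log(toSurrogatePair(0x1f469)); // \uD83D\uDC69 (woman)
console.log(toSurrogatePair(0x1f4bb)); // \uD83D\uDCBB (laptop)
console.log(toSurrogatePair(0x1f3fb)); // \uD83C\uDFFB (lightest skin tone)
console.log(toSurrogatePair(0x1f3ff)); // \uD83C\uDFFF (darkest skin tone)

// Regexes copied from the CLI output above, anchored here for whole-string tests.
const legacy = /^(?:\uD83D\uDC69(?:(?:\uD83C[\uDFFB-\uDFFF])?\u200D\uD83D\uDCBB)?)$/;
const unicode = /^(?:\u{1F469}(?:[\u{1F3FB}-\u{1F3FF}]?\u200D\u{1F4BB})?)$/u;

// Rebuild the seven inputs: woman, woman + ZWJ + laptop, and five skin-tone variants.
const woman = String.fromCodePoint(0x1f469);
const tech = '\u200D' + String.fromCodePoint(0x1f4bb);
const samples = [woman, woman + tech];
for (let tone = 0x1f3fb; tone <= 0x1f3ff; tone++) {
  samples.push(woman + String.fromCodePoint(tone) + tech);
}

console.log(samples.every((s) => legacy.test(s) && unicode.test(s))); // true
```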

How does it work?

  1. Generate a Trie containing all of the input strings.
    This is a tree structure where each edge represents a single character. This removes
    redundancies at the start of the strings, but common branches further down are not merged.

  2. A trie can be seen as a tree-shaped deterministic finite automaton (DFA), so DFA algorithms
    can be applied. In this case, we apply Hopcroft's DFA minimization algorithm
    to merge equivalent (indistinguishable) states.

  3. Convert the resulting minimized DFA to a regular expression. This is done using
    Brzozowski's algebraic method, which is quite elegant. It expresses the DFA as a system
    of equations that can be solved for a resulting regex. Along the way, some additional
    optimizations are made, such as hoisting common substrings out of an alternation and
    using character class ranges. This produces an Abstract Syntax Tree (AST) for the regex,
    which is then converted to a string and compiled to a JavaScript RegExp object.
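
The three steps can be sketched end to end for small inputs. This is a simplified illustration, not regexgen's implementation, and all function names are hypothetical. Because a trie is acyclic, the sketch replaces Hopcroft's algorithm with a simpler technique: a bottom-up merge of states that agree on acceptance and on where each edge leads, which yields the minimal automaton for acyclic inputs. With no loops, solving Brzozowski's equations reduces to direct substitution, with no star terms. The sketch does not hoist common substrings or build character-class ranges, so its output is longer than regexgen's `/foo(?:zap?|ba[rz])/`.

```javascript
// Simplified pipeline for illustration; not regexgen's implementation.

// Step 1: build a trie. Each node: { accepting, edges: Map<char, node> }.
function buildTrie(words) {
  const root = { accepting: false, edges: new Map() };
  for (const word of words) {
    let node = root;
    for (const ch of word) {
      if (!node.edges.has(ch)) node.edges.set(ch, { accepting: false, edges: new Map() });
      node = node.edges.get(ch);
    }
    node.accepting = true;
  }
  return root;
}

// Step 2: merge equivalent states bottom-up (a stand-in for Hopcroft's algorithm).
// In an acyclic automaton, two states are equivalent when they agree on acceptance
// and each labelled edge leads to equivalent states.
function minimize(root) {
  const registry = new Map(); // signature -> canonical node
  const ids = new Map();      // canonical node -> numeric id
  function visit(node) {
    // Canonicalize children first, rewiring edges in place.
    for (const [ch, child] of node.edges) node.edges.set(ch, visit(child));
    const edges = [...node.edges]
      .map(([ch, child]) => [ch, ids.get(child)])
      .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
    const sig = JSON.stringify([node.accepting, edges]);
    if (!registry.has(sig)) {
      registry.set(sig, node);
      ids.set(node, ids.size);
    }
    return registry.get(sig);
  }
  return { root: visit(root), states: registry.size };
}

// Step 3: with no cycles, each equation R(s) = a R(t) | b R(u) | ... (plus the empty
// string if s accepts) is solved by substitution; Arden's rule is never needed.
function toRegex(node, memo = new Map()) {
  if (memo.has(node)) return memo.get(node);
  const escape = (ch) => ch.replace(/[.*+?^${}()|[\]\\\/]/g, '\\$&');
  const alts = [...node.edges].map(([ch, child]) => escape(ch) + toRegex(child, memo));
  let out;
  if (alts.length === 0) out = '';
  else if (alts.length === 1 && !node.accepting) out = alts[0];
  else out = `(?:${alts.join('|')})` + (node.accepting ? '?' : '');
  memo.set(node, out);
  return out;
}

const words = ['foobar', 'foobaz', 'foozap', 'fooza'];
const { root, states } = minimize(buildTrie(words));
const source = toRegex(root);
console.log(states); // 9 (the trie had 11 nodes; its three leaf states merge into one)
console.log(source); // foo(?:ba(?:r|z)|za(?:p)?)

const re = new RegExp(`^(?:${source})$`);
console.log(words.every((w) => re.test(w)), re.test('fooba')); // true false
```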

License

MIT

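Key metrics

Overview
Name and owner: devongovett/regexgen
Primary language: JavaScript
Languages: JavaScript (1 language)
Owner activity
Created: 2016-12-20 02:50:53
Last pushed: 2024-02-15 03:01:52
Last commit: 2017-07-08 21:34:43
Releases: 7
Latest release: v1.3.0 (released 2017-07-08 21:34:43)
First release: v1.1.0 (released 2016-12-21 20:31:15)
User engagement
Stars: 3.4k
Watchers: 53
Forks: 100
Commits: 40
Issues: 22
Open issues: 12
Pull requests: 6
Open pull requests: 3
Closed pull requests: 3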