omniparser

omniparser: a native Golang ETL streaming parser and transform library for CSV, JSON, XML, EDI, text, etc.

Mentioned in Awesome Go

Omniparser is a native Golang ETL parser that ingests input data of various formats (CSV, txt, fixed length/width,
XML, EDI/X12/EDIFACT, JSON, and custom formats) in streaming fashion and transforms data into desired JSON output
based on a schema written in JSON.

Min Golang Version: 1.16

Licenses and Sponsorship

Omniparser is publicly available under the MIT License.
Individual and corporate sponsorships are welcome and gratefully
appreciated, and will be listed in the SPONSORS page.
Company-level sponsors get additional benefits and support
granted in the COMPANY LICENSE.

Documentation

Docs:

References:

Examples:

In the example folders above you will find pairs of input files and their schema files. Then in the
.snapshots sub directory, you'll find their corresponding output files.

Online Playground (not functioning)

Use the playground (you may need to wait a few seconds for the instance to wake up)
to try out schemas and inputs, yours or the existing samples, and see how ingestion and transform work.

As of 2023/03/14, all of our previous free Docker hosting solutions have gone away and we haven't found another one yet. For now, please clone the repo and use ./cli.sh as described in the Getting Started page.

Why

  • No good ETL transform/parser library exists in Golang.
  • Even looking into Java and other languages, choices aren't many and all have limitations:
    • Smooks is dead, plus its EDI parsing/transform is too heavyweight, needing code-gen.
    • BeanIO can't deal with EDI input.
    • Jolt can't deal with anything other than JSON input.
    • JSONata is still only a JSON -> JSON transform.
  • Many of the parsers/transforms don't support streaming reads and load the entire input into memory, which is not
    acceptable in some situations.

Requirements

  • Golang 1.16 or later.

Recent Major Feature Additions/Changes

  • 2024/06: v1.0.5 released: upgraded minimum Go version to 1.16; enabled full ES6 feature support in JavaScript custom functions.
  • 2022/09: v1.0.4 released: added csv2 file format that supersedes the original csv format with support of hierarchical and nested records.
  • 2022/09: v1.0.3 released: added fixedlength2 file format that supersedes the original fixed-length format with support of hierarchical and nested envelopes.
  • 1.0.0 Released!
  • Added Transform.RawRecord() for caller of omniparser to access the raw ingested record.
  • Deprecated custom_parse in favor of custom_func (custom_parse is still usable for
    backward compatibility; it has just been removed from all public docs and samples).
  • Added NonValidatingReader EDI segment reader.
  • Added fixed-length file format support in omniv21 handler.
  • Added EDI file format support in omniv21 handler.
  • Major restructure/refactoring
    • Upgraded omni schema version to omni.2.1 due to a number of incompatible schema changes:
      • 'result_type' -> 'type'
      • 'ignore_error_and_return_empty_str' -> 'ignore_error'
      • 'keep_leading_trailing_space' -> 'no_trim'
    • Changed how we handle custom functions: previously we always used strings as both the input param type and the
      result type. Not anymore: all types are now supported for custom function in and out params.
    • Changed the way we package custom functions for extensions: previously we collected custom functions from all
      extensions and then passed all of them to the extension that is used; this feels weird, now only the custom
      functions included in a particular extension are used in that extension.
    • Deprecated/removed most of the custom functions in favor of using 'javascript'.
    • A number of package renames.
  • Added CSV file format support in omniv2 handler.
  • Introduced IDR node cache for allocation recycling.
  • Introduced IDR for in-memory data representation.
  • Added trie-based, high-performance times.SmartParse.
  • Command line interface (one-off transform cmd or long-running http server mode).
  • javascript engine integration as a custom_func.
  • JSON stream parser.
  • Extensibility:
    • Ability to provide custom functions.
    • Ability to provide custom schema handler.
    • Ability to customize the built-in omniv2 schema handler's parsing code.
    • Ability to provide a new file format support to built-in omniv2 schema handler.


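Key Metrics

Overview
Name and owner: jf-tech/omniparser
Primary language: Go
Languages: Go (5 languages in total)
License: MIT License
Owner activity
Created: 2020-08-16 22:22:21
Last pushed: 2025-02-21 17:27:14
Last commit: 2025-02-22 06:27:05
Releases: 15
Latest release: v1.0.5
First release: v0.0.1
User engagement
Stars: 1.1k
Watchers: 15
Forks: 81
Commits: 132
Issues: 64
Open issues: 0
Pull requests: 161
Open pull requests: 0
Closed pull requests: 3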