omniparser

omniparser: a native Golang ETL streaming parser and transform library for CSV, JSON, XML, EDI, text, etc.

Omniparser is a native Golang ETL parser that ingests input data of various formats (CSV, txt, fixed length/width,
XML, EDI/X12/EDIFACT, JSON, and custom formats) in streaming fashion and transforms data into desired JSON output
based on a schema written in JSON.

Min Golang Version: 1.16

Licenses and Sponsorship

Omniparser is publicly available under MIT License.
Individual and corporate sponsorships are welcome and gratefully
appreciated, and will be listed in the SPONSORS page.
Company-level sponsors get additional benefits and support
granted in the COMPANY LICENSE.

Documentation

Docs:

References:

Examples:

In the example folders above you will find pairs of input files and their schema files. Then in the
.snapshots sub directory, you'll find their corresponding output files.

Online Playground (not functioning)

Use The Playground (may need to wait for a few seconds for instance to wake up)
for trying out schemas and inputs, yours or existing samples, to see how ingestion and transform work.

As of 2023/03/14, all of our previous free docker hosting solutions have gone away and we haven't found a replacement yet. For now, please clone the repo and use ./cli.sh as described in the Getting Started page.

Why

  • No good ETL transform/parser library exists in Golang.
  • Even looking into Java and other languages, choices aren't many and all have limitations:
    • Smooks is dead, plus its EDI parsing/transform is too heavyweight, needing code-gen.
    • BeanIO can't deal with EDI input.
    • Jolt can't deal with anything other than JSON input.
    • JSONata still only supports JSON -> JSON transforms.
  • Many of the parsers/transforms don't support streaming reads and load the entire input into memory, which is not
    acceptable in some situations.

Requirements

  • Golang 1.16 or later.

Recent Major Feature Additions/Changes

  • 2024/06: v1.0.5 released: upgraded minimum go version to 1.16; enabled full ES6 feature support in javascript custom function.
  • 2022/09: v1.0.4 released: added csv2 file format that supersedes the original csv format with support of hierarchical and nested records.
  • 2022/09: v1.0.3 released: added fixedlength2 file format that supersedes the original fixed-length format with support of hierarchical and nested envelopes.
  • 1.0.0 Released!
  • Added Transform.RawRecord() for caller of omniparser to access the raw ingested record.
  • Deprecated custom_parse in favor of custom_func (custom_parse is still usable for
    back-compatibility, it is just removed from all public docs and samples).
  • Added NonValidatingReader EDI segment reader.
  • Added fixed-length file format support in omniv21 handler.
  • Added EDI file format support in omniv21 handler.
  • Major restructure/refactoring
    • Upgraded omni schema version to omni.2.1 due to a number of incompatible schema changes:
      • 'result_type' -> 'type'
      • 'ignore_error_and_return_empty_str' -> 'ignore_error'
      • 'keep_leading_trailing_space' -> 'no_trim'
    • Changed how we handle custom functions: previously we always used strings for both the input param types and the
      result type. Not anymore: all types are now supported for custom function in and out params.
    • Changed the way we package custom functions for extensions: previously we collected custom functions from all
      extensions and then passed all of them to the extension that is used. This felt odd, so now only the custom
      functions included in a particular extension are used in that extension.
    • Deprecated/removed most of the custom functions in favor of using 'javascript'.
    • A number of packages renamed.
  • Added CSV file format support in omniv2 handler.
  • Introduced IDR node cache for allocation recycling.
  • Introduced IDR for in-memory data representation.
  • Added trie based high performance times.SmartParse.
  • Command line interface (one-off transform cmd or long-running http server mode).
  • javascript engine integration as a custom_func.
  • JSON stream parser.
  • Extensibility:
    • Ability to provide custom functions.
    • Ability to provide custom schema handler.
    • Ability to customize the built-in omniv2 schema handler's parsing code.
    • Ability to provide a new file format support to built-in omniv2 schema handler.
