zek

从 XML 生成 Go 结构体。「Generate a Go struct from XML.」

Github星跟踪图

zek

Zek is a prototype for creating a Go
struct from an XML document. The
resulting struct works best for reading XML (see also
#14), to create XML, you might want to
use something else.

It was developed at Leipzig University Library to shorten the
time to go from raw XML to a struct that allows to access XML data in Go
programs.

Skip the fluff, just the code.

Given some XML, run:

$ curl -s https://raw.githubusercontent.com/miku/zek/master/fixtures/e.xml | zek -e
// Rss was generated 2018-08-30 20:24:14 by tir on sol.
type Rss struct {
    XMLName xml.Name `xml:"rss"`
    Text    string   `xml:",chardata"`
    Rdf     string   `xml:"rdf,attr"`
    Dc      string   `xml:"dc,attr"`
    Geoscan string   `xml:"geoscan,attr"`
    Media   string   `xml:"media,attr"`
    Gml     string   `xml:"gml,attr"`
    Taxo    string   `xml:"taxo,attr"`
    Georss  string   `xml:"georss,attr"`
    Content string   `xml:"content,attr"`
    Geo     string   `xml:"geo,attr"`
    Version string   `xml:"version,attr"`
    Channel struct {
        Text          string `xml:",chardata"`
        Title         string `xml:"title"`         // ESS New Releases (Display...
        Link          string `xml:"link"`          // http://tinyurl.com/ESSNew...
        Description   string `xml:"description"`   // New releases from the Ear...
        LastBuildDate string `xml:"lastBuildDate"` // Mon, 27 Nov 2017 00:06:35...
        Item          []struct {
            Text        string `xml:",chardata"`
            Title       string `xml:"title"`       // Surficial geology, Aberde...
            Link        string `xml:"link"`        // https://geoscan.nrcan.gc....
            Description string `xml:"description"` // Geological Survey of Cana...
            Guid        struct {
                Text        string `xml:",chardata"` // 304279, 306212, 306175, 3...
                IsPermaLink string `xml:"isPermaLink,attr"`
            } `xml:"guid"`
            PubDate       string   `xml:"pubDate"`      // Fri, 24 Nov 2017 00:00:00...
            Polygon       []string `xml:"polygon"`      // 64.0000 -98.0000 64.0000 ...
            Download      string   `xml:"download"`     // https://geoscan.nrcan.gc....
            License       string   `xml:"license"`      // http://data.gc.ca/eng/ope...
            Author        string   `xml:"author"`       // Geological Survey of Cana...
            Source        string   `xml:"source"`       // Geological Survey of Cana...
            SndSeries     string   `xml:"SndSeries"`    // Bedford Institute of Ocea...
            Publisher     string   `xml:"publisher"`    // Natural Resources Canada,...
            Edition       string   `xml:"edition"`      // prelim., surficial data m...
            Meeting       string   `xml:"meeting"`      // Geological Association of...
            Documenttype  string   `xml:"documenttype"` // serial, open file, serial...
            Language      string   `xml:"language"`     // English, English, English...
            Maps          string   `xml:"maps"`         // 1 map, 5 maps, Publicatio...
            Mapinfo       string   `xml:"mapinfo"`      // surficial geology, surfic...
            Medium        string   `xml:"medium"`       // on-line; digital, digital...
            Province      string   `xml:"province"`     // Nunavut, Northwest Territ...
            Nts           string   `xml:"nts"`          // 066B, 095J; 095N; 095O; 0...
            Area          string   `xml:"area"`         // Aberdeen Lake, Mackenzie ...
            Subjects      string   `xml:"subjects"`
            Program       string   `xml:"program"`       // GEM2: Geo-mapping for Ene...
            Project       string   `xml:"project"`       // Rae Province Project Mana...
            Projectnumber string   `xml:"projectnumber"` // 340521, 343202, 340557, 3...
            Abstract      string   `xml:"abstract"`      // This new surficial geolog...
            Links         string   `xml:"links"`         // Online - En ligne (PDF, 9...
            Readme        string   `xml:"readme"`        // readme | https://geoscan....
            PPIid         string   `xml:"PPIid"`         // 34532, 35096, 35438, 2563...
        } `xml:"item"`
    } `xml:"channel"`
}

Online

About

Project Status: Active – The project has reached a stable, usable state and is being actively developed.

Upsides:

  • it works fine for non-recursive structures,
  • does not need XSD or DTD,
  • it is relatively convenient to access attributes, children and text,
  • will generate a single struct, which make for a quite compact representation,
  • simple user interface,
  • comments with examples,
  • schema inference across multiple files.

Downsides:

  • experimental, early, buggy, unstable prototype,
  • no support for recursive types (similar to Russian Doll strategy, [1])
  • no type inference, everything is accessible as string (without a schema, type inference may fail if the type guess is wrong)

Bugs:

Mapping between XML elements and data structures is inherently flawed: an XML
element is an order-dependent collection of anonymous values, while a data
structure is an order-independent collection of named values.

https://golang.org/pkg/encoding/xml/#pkg-note-BUG

Related projects:

And other awesome XML utilities.

Presentations:

Install

$ go install github.com/miku/zek/cmd/zek@latest

Debian and RPM packages:

It's in AUR, too.

Usage

$ zek -h
  -B    use a fixed banner string (e.g. for CI)
  -C    emit less compact struct
  -F    skip formatting
  -I    use verbatim innerxml instead of chardata
  -P string
        if set, write out struct within a package with the given name
  -S int
        read at most this many tags, approximately (0=unlimited)
  -c    emit more compact struct (noop, as this is the default since 0.1.7)
  -d    debug output
  -e    add comments with example
  -j    add JSON tags
  -m    omit empty Text fields
  -max-examples int
        limit number of examples (default 10)
  -n string
        use a different name for the top-level struct
  -o string
        if set, write to output file, not stdout
  -p    write out an example program
  -s    strict parsing and writing
  -t string
        emit struct for tag matching this name
  -u    filter out duplicated examples
  -version
        show version
  -x int
        max chars for example (default 25)

Examples:

$ cat fixtures/a.xml
<a></a>

$ zek -C < fixtures/a.xml
type A struct {
    XMLName xml.Name `xml:"a"`
    Text    string   `xml:",chardata"`
}

Debug output dumps the internal tree as JSON to stdout.

$ zek -d < fixtures/a.xml
{"name":{"Space":"","Local":"a"}}

Example program:

package main

import (
	"encoding/json"
	"encoding/xml"
	"fmt"
	"log"
	"os"
)

// A was generated 2017-12-05 17:35:21 by tir on apollo.
type A struct {
	XMLName xml.Name `xml:"a"`
	Text    string   `xml:",chardata"`
}

func main() {
	dec := xml.NewDecoder(os.Stdin)
	var doc A
	if err := dec.Decode(&doc); err != nil {
		log.Fatal(err)
	}
	b, err := json.Marshal(doc)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(b))
}

$ zek -C -p < fixtures/a.xml > sample.go && go run sample.go < fixtures/a.xml | jq . && rm sample.go
{
  "XMLName": {
    "Space": "",
    "Local": "a"
  },
  "Text": ""
}

More complex example:

$ zek < fixtures/d.xml
// Root was generated 2019-06-11 16:27:04 by tir on hayiti.
type Root struct {
        XMLName xml.Name `xml:"root"`
        Text    string   `xml:",chardata"`
        A       []struct {
                Text string `xml:",chardata"`
                B    []struct {
                        Text string `xml:",chardata"`
                        C    string `xml:"c"`
                        D    string `xml:"d"`
                } `xml:"b"`
        } `xml:"a"`
}

$ zek -p < fixtures/d.xml > sample.go && go run sample.go < fixtures/d.xml | jq . && rm sample.go
{
  "XMLName": {
    "Space": "",
    "Local": "root"
  },
  "Text": "\n\n\n\n",
  "A": [
    {
      "Text": "\n  \n  \n",
      "B": [
        {
          "Text": "\n    \n  ",
          "C": "Hi",
          "D": ""
        },
        {
          "Text": "\n    \n    \n  ",
          "C": "World",
          "D": ""
        }
      ]
    },
    {
      "Text": "\n  \n",
      "B": [
        {
          "Text": "\n    \n  ",
          "C": "Hello",
          "D": ""
        }
      ]
    },
    {
      "Text": "\n  \n",
      "B": [
        {
          "Text": "\n    \n  ",
          "C": "",
          "D": "World"
        }
      ]
    }
  ]
}

Annotate with comments:

$ zek -e < fixtures/l.xml
// Records was generated 2019-06-11 16:29:35 by tir on hayiti.
type Records struct {
        XMLName xml.Name `xml:"Records"`
        Text    string   `xml:",chardata"` // \n
        Xsi     string   `xml:"xsi,attr"`
        Record  []struct {
                Text   string `xml:",chardata"`
                Header struct {
                        Text       string `xml:",chardata"`
                        Status     string `xml:"status,attr"`
                        Identifier string `xml:"identifier"` // oai:ojs.localhost:article...
                        Datestamp  string `xml:"datestamp"`  // 2009-06-24T14:48:23Z, 200...
                        SetSpec    string `xml:"setSpec"`    // eppp:ART, eppp:ART, eppp:...
                } `xml:"header"`
                Metadata struct {
                        Text    string `xml:",chardata"`
                        Rfc1807 struct {
                                Text           string   `xml:",chardata"`
                                Xmlns          string   `xml:"xmlns,attr"`
                                Xsi            string   `xml:"xsi,attr"`
                                SchemaLocation string   `xml:"schemaLocation,attr"`
                                BibVersion     string   `xml:"bib-version"`  // v2, v2, v2...
                                ID             string   `xml:"id"`           // http://jou...
                                Entry          string   `xml:"entry"`        // 2009-06-24...
                                Organization   []string `xml:"organization"` // Proceeding...
                                Title          string   `xml:"title"`        // Introducti...
                                Type           string   `xml:"type"`
                                Author         []string `xml:"author"`       // KRAMPEN, G..
                                Copyright      string   `xml:"copyright"`    // Das Urhebe...
                                OtherAccess    string   `xml:"other_access"` // url:http:/...
                                Keyword        string   `xml:"keyword"`
                                Period         []string `xml:"period"`
                                Monitoring     string   `xml:"monitoring"`
                                Language       string   `xml:"language"` // en, en, en, e...
                                Abstract       string   `xml:"abstract"` // After a short...
                                Date           string   `xml:"date"`     // 2009-06-22 12...
                        } `xml:"rfc1807"`
                } `xml:"metadata"`
                About string `xml:"about"`
        } `xml:"Record"`
}

Only consider a nested element

$ zek -t metadata fixtures/z.xml
// Metadata was generated 2019-06-11 16:33:26 by tir on hayiti.
type Metadata struct {
        XMLName xml.Name `xml:"metadata"`
        Text    string   `xml:",chardata"`
        Dc      struct {
                Text  string `xml:",chardata"`
                Xmlns string `xml:"xmlns,attr"`
                Title struct {
                        Text  string `xml:",chardata"`
                        Xmlns string `xml:"xmlns,attr"`
                } `xml:"title"`
                Identifier struct {
                        Text  string `xml:",chardata"`
                        Xmlns string `xml:"xmlns,attr"`
                } `xml:"identifier"`
                Rights struct {
                        Text  string `xml:",chardata"`
                        Xmlns string `xml:"xmlns,attr"`
                        Lang  string `xml:"lang,attr"`
                } `xml:"rights"`
                AccessRights struct {
                        Text  string `xml:",chardata"`
                        Xmlns string `xml:"xmlns,attr"`
                } `xml:"accessRights"`
        } `xml:"dc"`
}

Inference across files

$ zek fixtures/a.xml fixtures/b.xml fixtures/c.xml
// A was generated 2017-12-05 17:40:14 by tir on apollo.
type A struct {
	XMLName xml.Name `xml:"a"`
	Text    string   `xml:",chardata"`
	B       []struct {
		Text string `xml:",chardata"`
	} `xml:"b"`
}

This is also useful, if you deal with archives containing XML files:

$ unzip -p 4082359.zip '*.xml' | zek -e

Given a directory full of zip files, you can combined find, unzip and zek:

$ for i in $(find ftp/b571 -type f -name "*zip"); do unzip -p $i '*xml'; done | zek -e

Another example (tarball with thousands of XML files, seemingly MARC):

$ tar -xOzf /tmp/20180725.125255.tar.gz | zek -e
// OAIPMH was generated 2018-09-26 15:03:29 by tir on sol.
type OAIPMH struct {
        XMLName        xml.Name `xml:"OAI-PMH"`
        Text           string   `xml:",chardata"`
        Xmlns          string   `xml:"xmlns,attr"`
        Xsi            string   `xml:"xsi,attr"`
        SchemaLocation string   `xml:"schemaLocation,attr"`
        ListRecords    struct {
                Text   string `xml:",chardata"`
                Record struct {
                        Text   string `xml:",chardata"`
                        Header struct {
                                Text       string `xml:",chardata"`
                                Identifier struct {
                                        Text string `xml:",chardata"` // aleph-pub:000000001, ...
                                } `xml:"identifier"`
                        } `xml:"header"`
                        Metadata struct {
                                Text   string `xml:",chardata"`
                                Record struct {
                                        Text           string `xml:",chardata"`
                                        Xmlns          string `xml:"xmlns,attr"`
                                        Xsi            string `xml:"xsi,attr"`
                                        SchemaLocation string `xml:"schemaLocation,attr"`
                                        Leader         struct
                                                Text string `xml:",chardata"` // 00001nM2.01200024
                                        } `xml:"leader"`
                                        Controlfield []struct {
                                                Text string `xml:",chardata"` // 00001nM2.01200024
                                                Tag  string `xml:"tag,attr"`
                                        } `xml:"controlfield"`
                                        Datafield []struct {
                                                Text     string `xml:",chardata"`
                                                Tag      string `xml:"tag,attr"`
                                                Ind1     string `xml:"ind1,attr"`
                                                Ind2     string `xml:"ind2,attr"`
                                                Subfield []struct {
                                                        Text string `xml:",chardata"` // KM0000002
                                                        Code string `xml:"code,attr"`
                                                } `xml:"subfield"`
                                        } `xml:"datafield"`
                                } `xml:"record"`
                        } `xml:"metadata"`
                } `xml:"record"`
        } `xml:"ListRecords"`
}

Generate a package

If you want in include generated file in the build process, e.g. with go
generate
, you may find -P and -o
helpful.

$ cat fixtures/b.xml
<a><b></b></a>

Run on the command line or via go generate:

$ zek -P mypkg -o data.go < fixtures/b.xml

This would write out the following in data.go file:

// Code generated by zek; DO NOT EDIT.

package mypkg

import "encoding/xml"

// A was generated 2021-09-16 11:23:06 by tir on trieste.
type A struct {
        XMLName xml.Name `xml:"a"`
        Text    string   `xml:",chardata"`
        B       string   `xml:"b"`
}

Note that any existing file will be overwritten, without any warning.

Use innerxml instead of chardata

You may want chardata or innerxml tag. Default is chardata, to use innerxml use the -I flag.

$ zek -B -I fixtures/d.xml
// Root was generated automatically by zek 0.1.24. DO NOT EDIT.
type Root struct {
        XMLName xml.Name `xml:"root"`
        Text    string   `xml:",innerxml"`
        A       []struct {
                Text string `xml:",innerxml"`
                B    []struct {
                        Text string `xml:",innerxml"`
                        C    string `xml:"c"`
                        D    string `xml:"d"`
                } `xml:"b"`
        } `xml:"a"`
}

Misc

As a side effect, zek seems to be a useful for debugging. Example:

This record is emitted from a typical OAI
server (OJS, not even uncommon), yet one can quickly
spot the flaw in the structure.

Over 30 different struct generated manually in the course of a few hours
(around five minutes per source): https://git.io/vbTDo.

-- Current extent leader: 1532 lines struct

主要指标

概览
名称与所有者miku/zek
主编程语言Go
编程语言Go (语言数: 3)
平台
许可证GNU General Public License v3.0
所有者活动
创建于2017-11-23 19:03:11
推送于2025-04-07 21:39:24
最后一次提交2025-04-07 23:38:39
发布数28
最新版本名称v0.1.28 (发布于 )
第一版名称v0.1.0 (发布于 )
用户参与
星数761
关注者数19
派生数63
提交数295
已启用问题?
问题数21
打开的问题数8
拉请求数5
打开的拉请求数1
关闭的拉请求数2
项目设置
已启用Wiki?
已存档?
是复刻?
已锁定?
是镜像?
是私有?