zek
Zek is a prototype for creating a Go
struct from an XML document. The
resulting struct works best for reading XML (see also
#14); to create XML, you might want to
use something else.
It was developed at Leipzig University Library to shorten the
time it takes to get from raw XML to a struct that gives access to XML data in Go
programs.
Skip the fluff, just the code.
Given some XML, run:
$ curl -s https://raw.githubusercontent.com/miku/zek/master/fixtures/e.xml | zek -e
// Rss was generated 2018-08-30 20:24:14 by tir on sol.
type Rss struct {
XMLName xml.Name `xml:"rss"`
Text string `xml:",chardata"`
Rdf string `xml:"rdf,attr"`
Dc string `xml:"dc,attr"`
Geoscan string `xml:"geoscan,attr"`
Media string `xml:"media,attr"`
Gml string `xml:"gml,attr"`
Taxo string `xml:"taxo,attr"`
Georss string `xml:"georss,attr"`
Content string `xml:"content,attr"`
Geo string `xml:"geo,attr"`
Version string `xml:"version,attr"`
Channel struct {
Text string `xml:",chardata"`
Title string `xml:"title"` // ESS New Releases (Display...
Link string `xml:"link"` // http://tinyurl.com/ESSNew...
Description string `xml:"description"` // New releases from the Ear...
LastBuildDate string `xml:"lastBuildDate"` // Mon, 27 Nov 2017 00:06:35...
Item []struct {
Text string `xml:",chardata"`
Title string `xml:"title"` // Surficial geology, Aberde...
Link string `xml:"link"` // https://geoscan.nrcan.gc....
Description string `xml:"description"` // Geological Survey of Cana...
Guid struct {
Text string `xml:",chardata"` // 304279, 306212, 306175, 3...
IsPermaLink string `xml:"isPermaLink,attr"`
} `xml:"guid"`
PubDate string `xml:"pubDate"` // Fri, 24 Nov 2017 00:00:00...
Polygon []string `xml:"polygon"` // 64.0000 -98.0000 64.0000 ...
Download string `xml:"download"` // https://geoscan.nrcan.gc....
License string `xml:"license"` // http://data.gc.ca/eng/ope...
Author string `xml:"author"` // Geological Survey of Cana...
Source string `xml:"source"` // Geological Survey of Cana...
SndSeries string `xml:"SndSeries"` // Bedford Institute of Ocea...
Publisher string `xml:"publisher"` // Natural Resources Canada,...
Edition string `xml:"edition"` // prelim., surficial data m...
Meeting string `xml:"meeting"` // Geological Association of...
Documenttype string `xml:"documenttype"` // serial, open file, serial...
Language string `xml:"language"` // English, English, English...
Maps string `xml:"maps"` // 1 map, 5 maps, Publicatio...
Mapinfo string `xml:"mapinfo"` // surficial geology, surfic...
Medium string `xml:"medium"` // on-line; digital, digital...
Province string `xml:"province"` // Nunavut, Northwest Territ...
Nts string `xml:"nts"` // 066B, 095J; 095N; 095O; 0...
Area string `xml:"area"` // Aberdeen Lake, Mackenzie ...
Subjects string `xml:"subjects"`
Program string `xml:"program"` // GEM2: Geo-mapping for Ene...
Project string `xml:"project"` // Rae Province Project Mana...
Projectnumber string `xml:"projectnumber"` // 340521, 343202, 340557, 3...
Abstract string `xml:"abstract"` // This new surficial geolog...
Links string `xml:"links"` // Online - En ligne (PDF, 9...
Readme string `xml:"readme"` // readme | https://geoscan....
PPIid string `xml:"PPIid"` // 34532, 35096, 35438, 2563...
} `xml:"item"`
} `xml:"channel"`
}
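Once generated, such a struct can be passed to encoding/xml directly. The following is a minimal sketch, assuming a hand-trimmed subset of the struct above and a made-up sample feed (not the e.xml fixture); the helper name itemTitles is hypothetical.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"log"
	"strings"
)

// Rss is a hand-trimmed subset of the generated struct shown above.
// Fields that are not declared are simply skipped while decoding.
type Rss struct {
	XMLName xml.Name `xml:"rss"`
	Version string   `xml:"version,attr"`
	Channel struct {
		Title string `xml:"title"`
		Item  []struct {
			Title string `xml:"title"`
			Link  string `xml:"link"`
		} `xml:"item"`
	} `xml:"channel"`
}

// sample is a made-up feed for illustration only.
const sample = `<rss version="2.0"><channel><title>Demo</title>
<item><title>First</title><link>http://example.com/1</link></item>
<item><title>Second</title><link>http://example.com/2</link></item>
</channel></rss>`

// itemTitles decodes an RSS document and returns the item titles in
// document order.
func itemTitles(doc string) ([]string, error) {
	var rss Rss
	if err := xml.NewDecoder(strings.NewReader(doc)).Decode(&rss); err != nil {
		return nil, err
	}
	titles := make([]string, 0, len(rss.Channel.Item))
	for _, it := range rss.Channel.Item {
		titles = append(titles, it.Title)
	}
	return titles, nil
}

func main() {
	titles, err := itemTitles(sample)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(titles)
}
```

Trimming the generated struct down to the fields you need is safe for reading, since encoding/xml ignores elements without a matching field.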
Online
- try online via WASM: https://xml-to-go.github.io/, thanks YaroslavPodorvanov!
- try it online at https://blog.kowalczyk.info/tools/xmltogo/ -- thanks, kjk!
About
Upsides:
- it works fine for non-recursive structures,
- does not need XSD or DTD,
- it is relatively convenient to access attributes, children and text,
- will generate a single struct, which makes for a quite compact representation,
- simple user interface,
- comments with examples,
- schema inference across multiple files.
Downsides:
- experimental, early, buggy, unstable prototype,
- no support for recursive types (similar to Russian Doll strategy, [1])
- no type inference, every value is accessible as a string (without a schema, a guessed type may turn out to be wrong)
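Since every field is a string, typed values have to be converted by hand after decoding. A minimal sketch, assuming a small struct in the shape zek emits and a made-up item (the date layout RFC1123Z and the helper name parseItem are assumptions for illustration):

```go
package main

import (
	"encoding/xml"
	"fmt"
	"log"
	"strconv"
	"strings"
	"time"
)

// Item mirrors the shape zek would emit: all values are strings.
type Item struct {
	XMLName       xml.Name `xml:"item"`
	Projectnumber string   `xml:"projectnumber"`
	PubDate       string   `xml:"pubDate"`
}

// sampleItem is made up; the date layout is an assumption.
const sampleItem = `<item>
  <projectnumber>340521</projectnumber>
  <pubDate>Fri, 24 Nov 2017 00:00:00 +0000</pubDate>
</item>`

// parseItem decodes an item and converts two string fields into
// an int and a time.Time.
func parseItem(doc string) (int, time.Time, error) {
	var it Item
	if err := xml.Unmarshal([]byte(doc), &it); err != nil {
		return 0, time.Time{}, err
	}
	n, err := strconv.Atoi(strings.TrimSpace(it.Projectnumber))
	if err != nil {
		return 0, time.Time{}, err
	}
	ts, err := time.Parse(time.RFC1123Z, strings.TrimSpace(it.PubDate))
	if err != nil {
		return 0, time.Time{}, err
	}
	return n, ts, nil
}

func main() {
	n, ts, err := parseItem(sampleItem)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(n, ts.Year())
}
```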
Bugs:
Mapping between XML elements and data structures is inherently flawed: an XML
element is an order-dependent collection of anonymous values, while a data
structure is an order-independent collection of named values.
https://golang.org/pkg/encoding/xml/#pkg-note-BUG
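The effect of this limitation can be seen in a round trip. In the sketch below (the struct and input are made up for illustration), the interleaving of b and c elements is lost, because the struct groups all b values into one slice:

```go
package main

import (
	"encoding/xml"
	"fmt"
	"log"
)

// A has a repeated child b and a single child c.
type A struct {
	XMLName xml.Name `xml:"a"`
	B       []string `xml:"b"`
	C       string   `xml:"c"`
}

// roundTrip decodes doc into A and encodes it again.
func roundTrip(doc string) (string, error) {
	var a A
	if err := xml.Unmarshal([]byte(doc), &a); err != nil {
		return "", err
	}
	out, err := xml.Marshal(a)
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	out, err := roundTrip(`<a><b>1</b><c>2</c><b>3</b></a>`)
	if err != nil {
		log.Fatal(err)
	}
	// The input order b, c, b comes back as b, b, c.
	fmt.Println(out) // <a><b>1</b><b>3</b><c>2</c></a>
}
```

This is one reason the generated structs are better suited for reading XML than for writing it.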
Related projects:
- https://github.com/bemasher/JSONGen
- https://github.com/dutchcoders/XMLGen
- https://github.com/gnewton/chidley
- https://github.com/twpayne/go-xmlstruct
And other awesome XML utilities.
Presentations:
Install
$ go install github.com/miku/zek/cmd/zek@latest
Debian and RPM packages:
It's in AUR, too.

Usage
$ zek -h
-B use a fixed banner string (e.g. for CI)
-C emit less compact struct
-F skip formatting
-I use verbatim innerxml instead of chardata
-P string
if set, write out struct within a package with the given name
-S int
read at most this many tags, approximately (0=unlimited)
-c emit more compact struct (noop, as this is the default since 0.1.7)
-d debug output
-e add comments with example
-j add JSON tags
-m omit empty Text fields
-max-examples int
limit number of examples (default 10)
-n string
use a different name for the top-level struct
-o string
if set, write to output file, not stdout
-p write out an example program
-s strict parsing and writing
-t string
emit struct for tag matching this name
-u filter out duplicated examples
-version
show version
-x int
max chars for example (default 25)
Examples:
$ cat fixtures/a.xml
<a></a>
$ zek -C < fixtures/a.xml
type A struct {
XMLName xml.Name `xml:"a"`
Text string `xml:",chardata"`
}
Debug output dumps the internal tree as JSON to stdout.
$ zek -d < fixtures/a.xml
{"name":{"Space":"","Local":"a"}}
Example program:
package main
import (
"encoding/json"
"encoding/xml"
"fmt"
"log"
"os"
)
// A was generated 2017-12-05 17:35:21 by tir on apollo.
type A struct {
XMLName xml.Name `xml:"a"`
Text string `xml:",chardata"`
}
func main() {
dec := xml.NewDecoder(os.Stdin)
var doc A
if err := dec.Decode(&doc); err != nil {
log.Fatal(err)
}
b, err := json.Marshal(doc)
if err != nil {
log.Fatal(err)
}
fmt.Println(string(b))
}
$ zek -C -p < fixtures/a.xml > sample.go && go run sample.go < fixtures/a.xml | jq . && rm sample.go
{
"XMLName": {
"Space": "",
"Local": "a"
},
"Text": ""
}
More complex example:
$ zek < fixtures/d.xml
// Root was generated 2019-06-11 16:27:04 by tir on hayiti.
type Root struct {
XMLName xml.Name `xml:"root"`
Text string `xml:",chardata"`
A []struct {
Text string `xml:",chardata"`
B []struct {
Text string `xml:",chardata"`
C string `xml:"c"`
D string `xml:"d"`
} `xml:"b"`
} `xml:"a"`
}
$ zek -p < fixtures/d.xml > sample.go && go run sample.go < fixtures/d.xml | jq . && rm sample.go
{
"XMLName": {
"Space": "",
"Local": "root"
},
"Text": "\n\n\n\n",
"A": [
{
"Text": "\n \n \n",
"B": [
{
"Text": "\n \n ",
"C": "Hi",
"D": ""
},
{
"Text": "\n \n \n ",
"C": "World",
"D": ""
}
]
},
{
"Text": "\n \n",
"B": [
{
"Text": "\n \n ",
"C": "Hello",
"D": ""
}
]
},
{
"Text": "\n \n",
"B": [
{
"Text": "\n \n ",
"C": "",
"D": "World"
}
]
}
]
}
Annotate with comments:
$ zek -e < fixtures/l.xml
// Records was generated 2019-06-11 16:29:35 by tir on hayiti.
type Records struct {
XMLName xml.Name `xml:"Records"`
Text string `xml:",chardata"` // \n
Xsi string `xml:"xsi,attr"`
Record []struct {
Text string `xml:",chardata"`
Header struct {
Text string `xml:",chardata"`
Status string `xml:"status,attr"`
Identifier string `xml:"identifier"` // oai:ojs.localhost:article...
Datestamp string `xml:"datestamp"` // 2009-06-24T14:48:23Z, 200...
SetSpec string `xml:"setSpec"` // eppp:ART, eppp:ART, eppp:...
} `xml:"header"`
Metadata struct {
Text string `xml:",chardata"`
Rfc1807 struct {
Text string `xml:",chardata"`
Xmlns string `xml:"xmlns,attr"`
Xsi string `xml:"xsi,attr"`
SchemaLocation string `xml:"schemaLocation,attr"`
BibVersion string `xml:"bib-version"` // v2, v2, v2...
ID string `xml:"id"` // http://jou...
Entry string `xml:"entry"` // 2009-06-24...
Organization []string `xml:"organization"` // Proceeding...
Title string `xml:"title"` // Introducti...
Type string `xml:"type"`
Author []string `xml:"author"` // KRAMPEN, G..
Copyright string `xml:"copyright"` // Das Urhebe...
OtherAccess string `xml:"other_access"` // url:http:/...
Keyword string `xml:"keyword"`
Period []string `xml:"period"`
Monitoring string `xml:"monitoring"`
Language string `xml:"language"` // en, en, en, e...
Abstract string `xml:"abstract"` // After a short...
Date string `xml:"date"` // 2009-06-22 12...
} `xml:"rfc1807"`
} `xml:"metadata"`
About string `xml:"about"`
} `xml:"Record"`
}
Only consider a nested element
$ zek -t metadata fixtures/z.xml
// Metadata was generated 2019-06-11 16:33:26 by tir on hayiti.
type Metadata struct {
XMLName xml.Name `xml:"metadata"`
Text string `xml:",chardata"`
Dc struct {
Text string `xml:",chardata"`
Xmlns string `xml:"xmlns,attr"`
Title struct {
Text string `xml:",chardata"`
Xmlns string `xml:"xmlns,attr"`
} `xml:"title"`
Identifier struct {
Text string `xml:",chardata"`
Xmlns string `xml:"xmlns,attr"`
} `xml:"identifier"`
Rights struct {
Text string `xml:",chardata"`
Xmlns string `xml:"xmlns,attr"`
Lang string `xml:"lang,attr"`
} `xml:"rights"`
AccessRights struct {
Text string `xml:",chardata"`
Xmlns string `xml:"xmlns,attr"`
} `xml:"accessRights"`
} `xml:"dc"`
}
Inference across files
$ zek fixtures/a.xml fixtures/b.xml fixtures/c.xml
// A was generated 2017-12-05 17:40:14 by tir on apollo.
type A struct {
XMLName xml.Name `xml:"a"`
Text string `xml:",chardata"`
B []struct {
Text string `xml:",chardata"`
} `xml:"b"`
}
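To illustrate why several documents help, here is a toy sketch of one aspect of inference: deciding whether a child element repeats and therefore needs a slice. This is not zek's implementation; the function name repeated and the inline documents (standing in for the fixtures, whose exact contents are assumed) are hypothetical.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"log"
	"strings"
)

// repeated reports, for each element path such as "a/b", whether the
// element occurs more than once under a single parent in any document.
func repeated(docs ...string) (map[string]bool, error) {
	result := make(map[string]bool)
	for _, doc := range docs {
		dec := xml.NewDecoder(strings.NewReader(doc))
		var names []string          // stack of open element names
		var counts []map[string]int // per open element: child name -> count
		for {
			tok, err := dec.Token()
			if err == io.EOF {
				break
			}
			if err != nil {
				return nil, err
			}
			switch el := tok.(type) {
			case xml.StartElement:
				if n := len(names); n > 0 {
					counts[n-1][el.Name.Local]++
					key := strings.Join(names, "/") + "/" + el.Name.Local
					result[key] = result[key] || counts[n-1][el.Name.Local] > 1
				}
				names = append(names, el.Name.Local)
				counts = append(counts, make(map[string]int))
			case xml.EndElement:
				names = names[:len(names)-1]
				counts = counts[:len(counts)-1]
			}
		}
	}
	return result, nil
}

func main() {
	// Assumed stand-ins for three fixture files.
	res, err := repeated(`<a></a>`, `<a><b></b></a>`, `<a><b></b><b></b></a>`)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(res) // map[a/b:true]
}
```

With only the first two documents, a/b would be seen at most once and a plain field would suffice; the third document is what forces a slice.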
This is also useful if you deal with archives containing XML files:
$ unzip -p 4082359.zip '*.xml' | zek -e
Given a directory full of zip files, you can combine find, unzip and zek:
$ for i in $(find ftp/b571 -type f -name "*zip"); do unzip -p $i '*xml'; done | zek -e
Another example (tarball with thousands of XML files, seemingly MARC):
$ tar -xOzf /tmp/20180725.125255.tar.gz | zek -e
// OAIPMH was generated 2018-09-26 15:03:29 by tir on sol.
type OAIPMH struct {
XMLName xml.Name `xml:"OAI-PMH"`
Text string `xml:",chardata"`
Xmlns string `xml:"xmlns,attr"`
Xsi string `xml:"xsi,attr"`
SchemaLocation string `xml:"schemaLocation,attr"`
ListRecords struct {
Text string `xml:",chardata"`
Record struct {
Text string `xml:",chardata"`
Header struct {
Text string `xml:",chardata"`
Identifier struct {
Text string `xml:",chardata"` // aleph-pub:000000001, ...
} `xml:"identifier"`
} `xml:"header"`
Metadata struct {
Text string `xml:",chardata"`
Record struct {
Text string `xml:",chardata"`
Xmlns string `xml:"xmlns,attr"`
Xsi string `xml:"xsi,attr"`
SchemaLocation string `xml:"schemaLocation,attr"`
Leader struct {
Text string `xml:",chardata"` // 00001nM2.01200024
} `xml:"leader"`
Controlfield []struct {
Text string `xml:",chardata"` // 00001nM2.01200024
Tag string `xml:"tag,attr"`
} `xml:"controlfield"`
Datafield []struct {
Text string `xml:",chardata"`
Tag string `xml:"tag,attr"`
Ind1 string `xml:"ind1,attr"`
Ind2 string `xml:"ind2,attr"`
Subfield []struct {
Text string `xml:",chardata"` // KM0000002
Code string `xml:"code,attr"`
} `xml:"subfield"`
} `xml:"datafield"`
} `xml:"record"`
} `xml:"metadata"`
} `xml:"record"`
} `xml:"ListRecords"`
}
Generate a package
If you want to include the generated file in the build process, e.g. with go
generate, you may find -P and -o
helpful.
$ cat fixtures/b.xml
<a><b></b></a>
Run on the command line or via go generate:
$ zek -P mypkg -o data.go < fixtures/b.xml
This would write the following to data.go:
// Code generated by zek; DO NOT EDIT.
package mypkg
import "encoding/xml"
// A was generated 2021-09-16 11:23:06 by tir on trieste.
type A struct {
XMLName xml.Name `xml:"a"`
Text string `xml:",chardata"`
B string `xml:"b"`
}
Note that any existing file will be overwritten, without any warning.
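For the go generate route, a directive along the following lines could be placed in any Go file of the target package. This is a sketch, assuming zek is on the PATH and that -P and -o combine with a file argument the same way they do with stdin; the file name gen.go is hypothetical. Because go generate does not run commands through a shell, input redirection with "<" is not available, so the fixture is passed as an argument.

```go
// gen.go (hypothetical file name) holds only the generate directive.

// Package mypkg contains types generated from XML samples.
package mypkg

// Running `go generate` in this directory invokes zek and (re)writes data.go.
//go:generate zek -P mypkg -o data.go fixtures/b.xml
```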
Use innerxml instead of chardata
Text fields can carry either a chardata or an innerxml tag. The default is chardata; to use innerxml, pass the -I flag.
$ zek -B -I fixtures/d.xml
// Root was generated automatically by zek 0.1.24. DO NOT EDIT.
type Root struct {
XMLName xml.Name `xml:"root"`
Text string `xml:",innerxml"`
A []struct {
Text string `xml:",innerxml"`
B []struct {
Text string `xml:",innerxml"`
C string `xml:"c"`
D string `xml:"d"`
} `xml:"b"`
} `xml:"a"`
}
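The difference between the two tags shows up on mixed content. In this small sketch (types, input and the helper name texts are made up for illustration), chardata collects only the text directly inside the element, while innerxml keeps the raw nested markup:

```go
package main

import (
	"encoding/xml"
	"fmt"
	"log"
)

// withChardata collects character data directly inside the element.
type withChardata struct {
	Text string `xml:",chardata"`
}

// withInnerXML keeps the raw XML nested inside the element.
type withInnerXML struct {
	Text string `xml:",innerxml"`
}

// texts decodes the same document with both tag variants.
func texts(doc string) (chardata, innerxml string, err error) {
	var c withChardata
	if err = xml.Unmarshal([]byte(doc), &c); err != nil {
		return "", "", err
	}
	var i withInnerXML
	if err = xml.Unmarshal([]byte(doc), &i); err != nil {
		return "", "", err
	}
	return c.Text, i.Text, nil
}

func main() {
	c, i, err := texts(`<a>hi <b>x</b> there</a>`)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%q\n", c) // "hi  there"
	fmt.Printf("%q\n", i) // "hi <b>x</b> there"
}
```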
Misc
As a side effect, zek seems to be useful for debugging. Example:
This record is emitted from a typical OAI
server (OJS, not even uncommon), yet one can quickly
spot the flaw in the structure.
Over 30 different structs generated manually in the course of a few hours
(around five minutes per source): https://git.io/vbTDo.
-- Current extent leader: 1532 lines struct