Graphtage

A semantic diff utility and library for tree-like files such as JSON, JSON5, XML, HTML, YAML, and CSV.

Overview

Name With Owner: trailofbits/graphtage
Primary Language: Python
Program language: Python (Language Count: 2)
License: GNU Lesser General Public License v3.0
Release Count: 14
Last Release Name: v0.3.1
First Release Name: v0.1.0
Created At: 2020-04-21 16:47:22
Pushed At: 2024-04-30 15:20:38
Last Commit At: 2024-01-07 20:33:29
Stargazers Count: 2.3k
Watchers Count: 52
Fork Count: 47
Commits Count: 599
Issues Count: 35
Issue Open Count: 22
Pull Requests Count: 46
Pull Requests Open Count: 2
Pull Requests Close Count: 4

Graphtage

Graphtage is a command-line utility and underlying library
for semantically comparing and merging tree-like structures, such as JSON, XML, HTML, YAML, plist, and CSS files. Its name is a
portmanteau of “graph” and “graftage”—the latter being the horticultural practice of joining two trees together such
that they grow as one.

$ echo Original: && cat original.json && echo Modified: && cat modified.json
Original:
{
    "foo": [1, 2, 3, 4],
    "bar": "testing"
}
Modified:
{
    "foo": [2, 3, 4, 5],
    "zab": "testing",
    "woo": ["foobar"]
}
$ graphtage original.json modified.json
{
    "z̟b̶ab̟r̶": "testing",
    "foo": [
        1̶,̶
        2,
        3,
        4,̟
        5̟
    ],̟
    "̟w̟o̟o̟"̟:̟ ̟[̟
        "̟f̟o̟o̟b̟a̟r̟"̟
    ]̟
}

Installation

$ pip3 install graphtage

Command Line Usage

Output Formatting

Graphtage performs an analysis on an intermediate representation of the trees that is divorced from the filetypes of the
input files. This means, for example, that you can diff a JSON file against a YAML file. Also, the output format can be
different from the input format(s). By default, Graphtage will format the output diff in the same file format as the
first input file. But one could, for example, diff two JSON files and format the output in YAML. There are several
command-line arguments to specify these transformations, such as --format; please check the --help output for more
information.
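
To make the idea of a format-independent intermediate representation more concrete, here is a minimal sketch that uses only Python's standard library. It is not Graphtage's own code or API: the helper name `load_tree` is hypothetical, and plain Python objects stand in for Graphtage's tree. The same data is parsed once from JSON and once from a plist, and the two results compare equal because the comparison no longer depends on the source filetype.

```python
import json
import plistlib


def load_tree(data: bytes, filetype: str):
    """Parse raw bytes into plain Python objects.

    Hypothetical helper: the returned dicts/lists play the role of a
    format-independent tree in this sketch.
    """
    if filetype == "json":
        return json.loads(data.decode("utf-8"))
    if filetype == "plist":
        return plistlib.loads(data)
    raise ValueError(f"unsupported filetype: {filetype}")


document = {"foo": [1, 2, 3], "bar": "baz"}

# Serialize the same document in two different file formats.
as_json = json.dumps(document).encode("utf-8")
as_plist = plistlib.dumps(document)

json_tree = load_tree(as_json, "json")
plist_tree = load_tree(as_plist, "plist")

# Once parsed, the trees can be compared without regard to the input format.
print(json_tree == plist_tree)
```

The serialized bytes differ completely between the two formats, yet the parsed structures are identical, which is what allows a JSON file to be diffed against a file in another format.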

By default, Graphtage pretty-prints its output with as many line breaks and indents as possible.

{
    "foo": [
        1,
        2,
        3
    ],
    "bar": "baz"
}

Use the --join-lists or -jl option to suppress linebreaks after list items:

{
    "foo": [1, 2, 3],
    "bar": "baz"
}

Likewise, use the --join-dict-items or -jd option to suppress linebreaks after key/value pairs in a dict:

{"foo": [
    1,
    2,
    3
], "bar":  "baz"}

Use --condensed or -j to apply both of these options:

{"foo": [1, 2, 3], "bar": "baz"}

The --only-edits or -e option will print out a list of edits rather than applying them to the input file in place.

The --edit-digest or -d option is like --only-edits but prints a more concise context for each edit that is more
human-readable.

Matching Options

By default, Graphtage tries to match all possible pairs of elements in a dictionary.

Matching two dictionaries with each other is hard. Although computationally tractable, this can sometimes be onerous for
input files with huge dictionaries. Graphtage has three different strategies for matching dictionaries:

  1. --dict-strategy match (the most computationally expensive) tries to match all pairs of keys and values between the
    two dictionaries, resulting in a match of minimum edit distance;
  2. --dict-strategy none (the least computationally expensive) will not attempt to match any key/value pairs unless
    they have the exact same key; and
  3. --dict-strategy auto (the default) will automatically match the values of any key-value pairs that have identical
    keys and then use the match strategy for the remainder of key/value pairs.

See Pull Request #51 for some examples of how these strategies
affect output.
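
The difference between the three strategies can be illustrated with a toy model. The sketch below is an assumption-laden illustration, not Graphtage's algorithm: the cost function (string edit distance on the text of keys and values), the brute-force search, and the names `pair_cost`, `best_assignment`, and `match_keys` are all invented for this example. It only shows which keys each strategy would be allowed to pair up.

```python
from itertools import permutations


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming string edit distance."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute or keep
            ))
        previous = current
    return previous[-1]


def pair_cost(left, right) -> int:
    """Toy cost of turning one (key, value) item into another; None means absent."""
    if left is None:   # pure insertion
        return len(str(right[0])) + len(str(right[1]))
    if right is None:  # pure removal
        return len(str(left[0])) + len(str(left[1]))
    return (levenshtein(str(left[0]), str(right[0]))
            + levenshtein(str(left[1]), str(right[1])))


def best_assignment(left_items, right_items):
    """Brute-force minimum-cost pairing (factorial time; tiny inputs only)."""
    size = max(len(left_items), len(right_items))
    left = list(left_items) + [None] * (size - len(left_items))
    right = list(right_items) + [None] * (size - len(right_items))
    best = min(
        permutations(right),
        key=lambda perm: sum(pair_cost(l, r) for l, r in zip(left, perm)),
    )
    return [
        (l[0] if l is not None else None, r[0] if r is not None else None)
        for l, r in zip(left, best)
    ]


def match_keys(original: dict, modified: dict, strategy: str = "auto"):
    """Return (original_key, modified_key) pairs; None marks an insert or removal."""
    left, right = list(original.items()), list(modified.items())
    if strategy == "match":
        return best_assignment(left, right)
    same = [(key, key) for key in original if key in modified]
    left_rest = [item for item in left if item[0] not in modified]
    right_rest = [item for item in right if item[0] not in original]
    if strategy == "auto":
        return same + best_assignment(left_rest, right_rest)
    if strategy == "none":
        return (same
                + [(key, None) for key, _ in left_rest]
                + [(None, key) for key, _ in right_rest])
    raise ValueError(f"unknown strategy: {strategy}")


original = {"foo": [1, 2, 3, 4], "bar": "testing"}
modified = {"foo": [2, 3, 4, 5], "zab": "testing", "woo": ["foobar"]}
for strategy in ("none", "auto", "match"):
    print(strategy, match_keys(original, modified, strategy))
```

With the example files from the top of this document, the `none` model treats `bar` as removed and `zab` as inserted, while `auto` and `match` both pair `bar` with `zab`. The `auto` model gets there with a much smaller search because the identical key `foo` is matched up front.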

The --no-list-edits or -l option will not consider interstitial insertions and removals when comparing two lists.
The --no-list-edits-when-same-length or -ll option is a less drastic version of -l that will behave normally for
lists that are of different lengths but behave like -l for lists that are of the same length.
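
As a rough illustration of what ignoring interstitial insertions and removals means, the following sketch contrasts an edit-based list comparison with a purely positional one. It is not Graphtage's implementation: `difflib.SequenceMatcher` is swapped in as a stand-in for the edit-based comparison, the function names and `mode` strings are hypothetical, and the handling of trailing extra elements in positional mode is an assumption.

```python
from difflib import SequenceMatcher
from itertools import zip_longest

_MISSING = object()  # sentinel for "no element at this position"


def positional_ops(a, b):
    """Compare element i of a only with element i of b (models -l)."""
    ops = []
    for x, y in zip_longest(a, b, fillvalue=_MISSING):
        if x is _MISSING:
            ops.append(("insert", y))
        elif y is _MISSING:
            ops.append(("remove", x))
        elif x == y:
            ops.append(("keep", x))
        else:
            ops.append(("replace", x, y))
    return ops


def edit_ops(a, b):
    """Allow insertions and removals anywhere (elements must be hashable)."""
    ops = []
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops += [("keep", x) for x in a[i1:i2]]
        else:  # "delete", "insert", or "replace"
            ops += [("remove", x) for x in a[i1:i2]]
            ops += [("insert", y) for y in b[j1:j2]]
    return ops


def diff_list(a, b, mode="edits"):
    """mode: 'edits', 'no-list-edits', or 'no-list-edits-when-same-length'."""
    if mode == "no-list-edits":
        return positional_ops(a, b)
    if mode == "no-list-edits-when-same-length":
        return positional_ops(a, b) if len(a) == len(b) else edit_ops(a, b)
    if mode == "edits":
        return edit_ops(a, b)
    raise ValueError(f"unknown mode: {mode}")


print(diff_list([1, 2, 3, 4], [2, 3, 4, 5]))
print(diff_list([1, 2, 3, 4], [2, 3, 4, 5], mode="no-list-edits"))
```

For the `foo` lists from the example at the top, the edit-based comparison reports one removal and one insertion around three unchanged elements, whereas the positional comparison reports four replacements. The positional variant does far less work, which is the trade-off these options expose.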

ANSI Color

By default, Graphtage will only use ANSI color in its output if it is run from a TTY. If, for example, you would like
to have Graphtage emit colorized output from a script or pipe, use the --color or -c argument. To disable color even
when running on a TTY, use --no-color.
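
The TTY rule described above is a common pattern, and a minimal sketch of it in Python might look like the following. This is an illustration only, not Graphtage's source: the function names are hypothetical, and letting the "no color" switch win when both switches are set is an assumption.

```python
import io


def should_use_color(stream, force_color=False, no_color=False):
    """Decide whether to emit ANSI escapes for the given output stream."""
    if no_color:      # models --no-color (assumed to take precedence)
        return False
    if force_color:   # models --color / -c
        return True
    isatty = getattr(stream, "isatty", None)
    return bool(isatty and isatty())  # default: color only on a TTY


def colorize(text, stream, **flags):
    """Wrap text in a green ANSI escape sequence when color is enabled."""
    if should_use_color(stream, **flags):
        return f"\x1b[32m{text}\x1b[0m"
    return text


pipe_like = io.StringIO()  # behaves like a pipe: isatty() returns False
print(repr(colorize("5", pipe_like)))
print(repr(colorize("5", pipe_like, force_color=True)))
```

An in-memory stream is not a TTY, so the first call returns the text unchanged and only the forced call adds escape sequences.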

HTML Output

Graphtage can optionally emit the diff in HTML with the --html option.

$ graphtage --html original.json modified.json > diff.html

Status and Logging

By default, Graphtage prints status messages and a progress bar to STDERR. To suppress this, use the --no-status
option. To additionally suppress all but critical log messages, use --quiet. Fine-grained control of log messages is
via the --log-level option.
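
When Graphtage is driven from a script, it can help to assemble the command line in one place. The helper below is a hypothetical convenience wrapper, not part of Graphtage. It only uses flags named in this document, and it does not check which values `--format` accepts (consult the `--help` output for that).

```python
import shlex


def build_graphtage_command(
    original,
    modified,
    *,
    output_format=None,
    dict_strategy=None,
    condensed=False,
    only_edits=False,
    html=False,
    color=None,
    no_status=False,
    quiet=False,
):
    """Return an argv list for the graphtage CLI (hypothetical helper)."""
    argv = ["graphtage"]
    if output_format is not None:
        argv += ["--format", output_format]
    if dict_strategy is not None:
        if dict_strategy not in ("match", "none", "auto"):
            raise ValueError(f"unknown dict strategy: {dict_strategy}")
        argv += ["--dict-strategy", dict_strategy]
    if condensed:
        argv.append("--condensed")
    if only_edits:
        argv.append("--only-edits")
    if html:
        argv.append("--html")
    if color is True:
        argv.append("--color")
    elif color is False:
        argv.append("--no-color")
    if no_status:
        argv.append("--no-status")
    if quiet:
        argv.append("--quiet")
    return argv + [original, modified]


command = build_graphtage_command(
    "original.json", "modified.json", html=True, no_status=True
)
print(shlex.join(command))

# To actually run it (requires graphtage to be installed):
#   import subprocess
#   subprocess.run(command, check=True)
```

Building a list rather than a single string avoids shell quoting problems when file names contain spaces; `shlex.join` is used here only to display the command.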

Why does Graphtage exist?

Diffing tree-like structures with unordered elements is tough. Say you want to compare two JSON files.
There are limited tools available, which are effectively equivalent to
canonicalizing the JSON (e.g., sorting dictionary elements by key) and performing a standard diff. This is not always
sufficient. For example, if a key in a dictionary is changed but its value is not, a traditional diff
will conclude that the entire key/value pair was replaced by the new one, even though the only change was the key
itself. See our documentation for more information.
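
The limitation of the canonicalize-then-diff approach is easy to reproduce with Python's standard library alone. In this sketch, only a key is renamed (`bar` to `zab`), yet a line-based diff of the canonicalized JSON reports the whole key/value line as removed and a new one as added. The helper name `canonical_lines` is made up for this example.

```python
import difflib
import json


def canonical_lines(obj):
    """Canonicalize by sorting keys, then split into lines for a text diff."""
    return json.dumps(obj, indent=4, sort_keys=True).splitlines()


original = {"foo": [1, 2, 3, 4], "bar": "testing"}
renamed = {"foo": [1, 2, 3, 4], "zab": "testing"}

diff_lines = list(difflib.unified_diff(
    canonical_lines(original),
    canonical_lines(renamed),
    fromfile="original.json",
    tofile="renamed.json",
    lineterm="",
))
print("\n".join(diff_lines))
```

Because sorting also moves the renamed pair to a different position in the file, the text diff has no way to relate the removed line to the added one, even though the value `"testing"` never changed.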

Using Graphtage as a Library

Graphtage has a complete API for programmatically operating its diffing capabilities.
When using Graphtage as a library, it is also capable of diffing in-memory Python objects.
This can be useful for debugging Python code, for example, to determine a differential between two objects.
See our documentation for more information.

Extending Graphtage

Graphtage is designed to be extensible: New filetypes can easily be defined, as well as new node types, edit types,
formatters, and printers. See our documentation for
more information.

Complete API documentation is available here.

License and Acknowledgements

This research was developed by Trail of Bits with partial funding from the Defense
Advanced Research Projects Agency (DARPA) under the SafeDocs program as a subcontractor to Galois.
It is licensed under the GNU Lesser General Public License v3.0.
Contact us if you're looking for an exception to the terms.
© 2020–2023, Trail of Bits.