Hello!

The tech behind parts of ZincBase was acquired.
This repo is still here for reference, but it is deprecated.

Fortunately, work still goes on. Apart from a couple of fringe bits, the active repo lives here.

The new owner of ZincBase as it is today is ComplexDB.

Alright, you still want to continue

ZincBase is a state of the art knowledge base. It does the following:

Extract facts (aka triples and rules) from unstructured data/text
Store and retrieve those facts efficiently
Build them into a graph
Provide ways to query the graph, including via bleeding-edge graph neural networks.

Zincbase exists to answer questions like "what is the probability that Tom likes LARPing", or "who likes LARPing", or "classify people into LARPers vs normies":

It combines the latest in neural networks with symbolic logic (think expert systems and prolog) and graph search.

View full documentation here.

Quickstart

from zincbase import KB
kb = KB()
kb.store('eats(tom, rice)')
for ans in kb.query('eats(tom, Food)'):
    print(ans['Food']) # prints 'rice'

...
# The included assets/countries_s1_train.csv contains triples like:
# (namibia, locatedin, africa)
# (lithuania, neighbor, poland)

kb = KB()
kb.from_csv('./assets/countries.csv')
kb.build_kg_model(cuda=False, embedding_size=40)
kb.train_kg_model(steps=2000, batch_size=1, verbose=False)
kb.estimate_triple_prob('fiji', 'locatedin', 'melanesia')
0.8467

Requirements

Python 3
Libraries from requirements.txt
GPU preferable for large graphs but not required

Installation

pip install -r requirements.txt

Note: Requirements might differ for PyTorch depending on your system.

Testing

python test/test_main.py
python test/test_graph.py
python test/test_lists.py
python test/test_nn_basic.py
python test/test_nn.py
python test/test_neg_examples.py
python test/test_truthiness.py
python -m doctest zincbase/zincbase.py

Validation

"Countries" and "FB15k" datasets are included in this repo.

There is a script to evaluate that ZincBase gets at least as good
performance on the Countries dataset as the original (2019) RotatE paper. From the repo's
root directory:

python examples/eval_countries_s3.py

It tests the hardest Countries task and prints out the AUC ROC, which should be
~ 0.95 to match the paper. It takes about 30 minutes to run on a modern GPU.

There is also a script to evaluate performance on FB15k: python examples/fb15k_mrr.py.

Building documentation

From docs/ dir: make html. If something changed a lot: sphinx-apidoc -o . ..

TODO

Add documentation
to_csv method
utilize postgres as backend triple store
The to_csv/from_csv methods do not yet support node attributes.
Add relation extraction from arbitrary unstructured text
Add context to triple - that is interpreted by BERT/ULM/GPT-2 similar and
put into an embedding that's concat'd to the KG embedding.
Reinforcement learning for graph traversal.

References & Acknowledgements

Theo Trouillon. Complex-Valued Embedding Models for Knowledge Graphs. Machine Learning[cs.LG]. Université Grenoble Alpes, 2017. English. ffNNT : 2017GREAM048

L334: Computational Syntax and Semantics -- Introduction to Prolog, Steve Harlow

Open Book Project: Prolog in Python, Chris Meyers

Prolog Interpreter in Javascript

RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space, Zhiqing Sun and Zhi-Hong Deng and Jian-Yun Nie and Jian Tang, International Conference on Learning Representations, 2019

Citing

If you use this software, please consider citing:

@software{zincbase,
  author = {{Tom Grek}},
  title = {ZincBase: A state of the art knowledge base},
  url = {https://github.com/tomgrek/zincbase},
  version = {0.1.1},
  date = {2019-05-12}
}

Contributing

See CONTRIBUTING. And please do!

名称与所有者	tomgrek/zincbase
主编程语言	Python
编程语言	Python (语言数: 1)
平台
许可证	MIT License

创建于	2019-04-27 23:34:55
推送于	2021-04-29 20:06:53
最后一次提交	2020-01-29 20:50:43
发布数	6
最新版本名称	v0.3.0 (发布于 )
第一版名称	v0.0.1 (发布于 )

星数	282
关注者数	10
派生数	37
提交数	114
已启用问题?
问题数	4
打开的问题数	3
拉请求数	14
打开的拉请求数	1
关闭的拉请求数	1

已启用Wiki?
已存档?
是复刻?
已锁定?
是镜像?
是私有?

zincbase

Github星跟踪图