queryparser

Parsing and analysis of Vertica, Hive, and Presto SQL.

  • Owner: uber/queryparser
  • Platform:
  • License:: MIT License
  • Category::
  • Topic:
  • Like:
    0
      Compare:

Github stars Tracking Chart

queryparser

Parsing and analysis of Vertica, Hive, and Presto SQL.

Gentle introduction

Queryparser supports parsing for three sql-dialects (Vertica, Hive, and
Presto).

Each dialect implements its own tokenization and parsing logic. There is a
single abstract syntax tree (AST) for representing queries of any dialect. This
AST is defined in Database/Sql/Type.hs and Database/Sql/Type/Query.hs.

The parsing logic produces an AST with table and column identifiers that are
"raw" or optionally qualified. Frequently, it is desirable to convert the AST
over raw names to an AST over resolved names, where identifiers are fully
qualified. This transformation is called "name resolution" or simply
"resolution". It requires as input the full list of columns in every table and
the full list of tables in every schema, otherwise known as "catalog
information".

An example of resolution:

  • SELECT column_c FROM table_t is a query with raw names
  • SELECT schema_s.table_t.column_c FROM schema_s.table_t is the same query with resolved names

Various utility functions ("analyses") have been defined for ASTs over resolved
names, such as:

  • what tables appear in the query?
  • what columns appear in the query, per clause?
  • what is the table/column lineage of a query?
  • what sets of columns does a query compare for equality?

The analyses are the main value-add of this library.

Demo

Enough talk, let's show what it can do!

$ stack ghci
...
...> :set prompt "> "
> import Demo
> -- for the purposes of our demo, we have two tables: foo with columns a,b,c and bar with columns x,y,z
> demoAllAnalyses "SELECT * FROM foo" -- note that the SELECT * expands to a,b,c
Tables accessed:
    public.foo
Columns accessed by clause:
    public.foo.a	SELECT
    public.foo.b	SELECT
    public.foo.c	SELECT
Joins:
    no joins
Table lineage:
    no tables modified
> demoAllAnalyses "SELECT * FROM bar" -- and here the SELECT * expands to x,y,z
Tables accessed:
    public.bar
Columns accessed by clause:
    public.bar.x	SELECT
    public.bar.y	SELECT
    public.bar.z	SELECT
Joins:
    no joins
Table lineage:
    no tables modified
> demoAllAnalyses "SELECT x, count(1) FROM foo JOIN bar ON foo.a = bar.y WHERE z IS NOT NULL GROUP BY 1 ORDER BY 2 DESC, b"
Tables accessed:
    public.bar
    public.foo
Columns accessed by clause:
    public.bar.x	GROUPBY
    public.bar.x	SELECT
    public.bar.y	JOIN
    public.bar.z	WHERE
    public.foo.a	JOIN
    public.foo.b	ORDER
Joins:
    public.bar.y <-> public.foo.a
Table lineage:
    no tables modified
> -- let's play with some queries that modify table-data!
> demoTableLineage "INSERT INTO foo SELECT * FROM bar"
public.foo after the query depends on public.bar, public.foo before the query
> demoTableLineage "TRUNCATE TABLE foo"
public.foo no longer has data
> demoTableLineage "ALTER TABLE bar, foo RENAME TO baz, bar"
public.bar after the query depends on public.foo before the query
public.baz after the query depends on public.bar before the query
public.foo no longer has data
> -- let's explore a few subtler behaviors of the "joins" analysis (admittedly, something of a misnomer)
> demoJoins "SELECT * FROM foo JOIN bar ON a=x AND b+c = y+z"
public.bar.x <-> public.foo.a
public.bar.y <-> public.foo.b
public.bar.y <-> public.foo.c
public.bar.z <-> public.foo.b
public.bar.z <-> public.foo.c
> demoJoins "SELECT a FROM foo UNION SELECT x FROM bar"
public.bar.x <-> public.foo.a

Spin up your own ghci and paste in your own queries!

Requirements

To build, you need:

  • stack
  • docker (recommended)

OS X

You can install stack on OS X with brew:

brew install ghc haskell-stack

To install docker, use the default installer found on https://docs.docker.com/mac/

To allow stack to see the docker daemon, add eval "$(docker-machine env default)" to your bashrc or equivalent. (This is following https://docs.docker.com/machine/reference/env/)

Linux & Other

Follow the directions at https://github.com/commercialhaskell/stack#how-to-install

Installation

Once you've got what you need, check out the repo and change to the directory.

Stack will download other dependencies for you:

stack setup

Now you can build the package with:

stack build

Or run the tests with:

stack test

Or run the benchmarks with:

stack bench

Or pull things up in ghci with:

stack ghci

Contributing

If you'd like to contribute to the repo, use the above installation instructions to get started.

When you're ready, make sure that the code compiles with the Development flag, i.e.:

stack build --flag queryparser:development

Use

Mostly it boils down to this function:

parse :: Text -> Either ParseError SqlQuery

To parse some sql from the repl,

parse "SELECT 1;"

Areas of future interest

There is substantial room for future work in Queryparser. For more details, see
Areas of future interest.

Main metrics

Overview
Name With Owneruber/queryparser
Primary LanguageHaskell
Program languageHaskell (Language Count: 1)
Platform
License:MIT License
所有者活动
Created At2017-04-13 04:51:29
Pushed At2022-02-16 11:16:33
Last Commit At2021-12-18 13:33:27
Release Count1
Last Release Namev0.1.0.0 (Posted on )
First Release Namev0.1.0.0 (Posted on )
用户参与
Stargazers Count1.1k
Watchers Count56
Fork Count145
Commits Count50
Has Issues Enabled
Issues Count36
Issue Open Count22
Pull Requests Count26
Pull Requests Open Count1
Pull Requests Close Count4
项目设置
Has Wiki Enabled
Is Archived
Is Fork
Is Locked
Is Mirror
Is Private