elvm

EsoLangVM Compiler Infrastructure

  • Owner: shinh/elvm
  • License: MIT License

ELVM Compiler Infrastructure


ELVM is similar to LLVM but dedicated to esoteric languages. This
project consists of two components: a frontend and a backend.
Currently, the only frontend we have is a modified version of 8cc.
The modified 8cc translates C code to an internal representation
format called ELVM IR (EIR). Unlike LLVM bitcode, EIR is designed to
be extremely simple, so there's a better chance that we can write a
translator from EIR to an esoteric language.

Currently, there are 39 backends:

The above list contains languages which are known to be difficult to
program in, but with ELVM, you can create programs in such
languages. For example, you can easily create Brainfuck programs by
writing C code. One of the interesting test cases ELVM has is a tiny
Lisp interpreter. All of the above language backends pass this test,
which means you can run Lisp on the above languages.

Moreover, 8cc and ELVM themselves are written in C. So we can run a C
compiler written in the above languages to compile ELVM's compiler
toolchain itself, though such compilation takes a long time in some
esoteric languages.

A demo site

http://shinh.skr.jp/elvm/8cc.js.html

As noted above, the ELVM toolchain itself runs on all supported
language backends. The above demo runs the ELVM toolchain on
JavaScript (and is thus slow).

Example big programs

ELVM internals

ELVM IR

  • Harvard architecture, not von Neumann (allowing self-modifying code is hard)
  • 6 registers: A, B, C, D, SP, and BP
  • Ops: mov, add, sub, load, store, setcc, jcc, putc, getc, and exit
  • Pseudo ops: .text, .data, .long, and .string
  • mul/div/mod are implemented by __builtin_*
  • No bit operations
  • No floating point arithmetic
  • sizeof(char) == sizeof(int) == sizeof(void*) == 1
  • The word size is backend dependent, but most backends use 24-bit words
  • A single program counter value may contain multiple operations

See ELVM.md for
more detail.

Directories

shinh/8cc's eir branch is the
frontend C compiler.

ir/ directory has a
parser and an interpreter of ELVM IR. ELVM IR has

target/ directory
has backend implementations. Code in this directory uses the IR parser
to generate backend code.

libc/ directory
has an incomplete libc implementation which is necessary to run
tests.

Notes on language backends

Brainfuck

Running a Lisp interpreter on Brainfuck was the first motivation of
this project (bflisp). ELVM IR was designed for Brainfuck, but it
turned out that such a simple IR is suitable for other esoteric
languages as well.

As Brainfuck is slow, this project contains a Brainfuck
interpreter/compiler in tools/bfopt.cc.
You can also use other optimized Brainfuck implementations such as
tritium.
Note that you need an implementation with 8-bit cells. For tritium,
you need to specify the `-b' flag.

Unlambda

This backend was contributed by @irori.
See also 8cc.unl.

This backend is tested with @irori's interpreter. tools/rununl.sh
automatically downloads it.

C-INTERCAL

This backend uses 16bit registers and address space, though ELVM's
standard is 24bit. Due to the lack of address space, you cannot
compile large C programs using 8cc on C-INTERCAL.

This backend isn't tested by default because C-INTERCAL is slow. Use

$ CINT=1 make i

to run the tests. Note that you may need to adjust tools/runi.sh.

You can make faster executables by doing something like

$ cp out/fizzbuzz.c.eir.i fizzbuzz.i && ick fizzbuzz.i
$ ./fizzbuzz

But compilation takes much more time as it uses gcc instead of tcc.

Piet

This backend also has a 16-bit address space, so it has the same
limitation as the C-INTERCAL backend.

This backend isn't tested by default because npiet is slow. Use

$ PIET=1 make piet

to run the tests.

Befunge

BefLisp, which translates LLVM
bitcode to Befunge, has very similar code. The interpreter,
tools/befunge.cc, is mostly Befunge-93, but its address space is
extended to make Befunge-93 Turing-complete.

Whitespace

This backend is tested with @koturn's Whitespace implementation.

Emacs Lisp

This backend is somewhat more interesting than other non-esoteric
backends. You can run a C compiler on Emacs:

  • M-x load-file tools/elvm.el
  • open test/putchar.c (or write C code without #include)
  • M-x 8cc
  • Now you'll see ELVM IR. You need to prepend a backend name (`el' for
    example) as the first line.
  • M-x elc
  • M-x eval-buffer
  • M-x elvm-main

Vim script

This backend was contributed by @rhysd. You can run a C compiler on
Vim:

  • Open test/hello.c (or write your C code)
  • :source /path/to/out/8cc.vim
  • Now you can see ELVM IR in the buffer
  • Please prepend a backend name (vim for Vim) to the first line
  • :source /path/to/out/elc.vim
  • You can see Vim script code as the compilation result in the current buffer
  • You can :source to run the code

You can find more descriptions and released vim script in
8cc.vim.

TeX

This backend was contributed by @hak7a3. See
also 8cc.tex.

C++14 constexpr (compile-time)

This backend was contributed by @kw-udon. You can find more
descriptions in
constexpr-8cc.

sed

This backend is very slow, so only a limited set of tests runs by
default. You can run all of them with

$ FULL=1 make sed

but it could take years to run all tests. I believe the C compiler
works in sed, but I haven't confirmed it yet. You can try the Lisp
interpreter instead:

$ FULL=1 make out/lisp.c.eir.sed.out.diff
$ echo '(+ 4 3)' | time sed -n -f out/lisp.c.eir.sed

This backend should support both GNU sed and BSD sed, so it is more
portable than sedlisp, though much slower. Also note that, due to
limitations of BSD sed, programs cannot output non-ASCII characters
or NUL.

HeLL

This backend was contributed by @esoteric-programmer.
HeLL is an assembly language for Malbolge and Malbolge Unshackled.
Use LMFAO to build the Malbolge Unshackled program from HeLL.
This backend isn't tested by default because Malbolge Unshackled is extremely slow. Use

$ HELL=1 make hell

to run the tests. Note that you may need to adjust tools/runhell.sh.

This backend does not support all 8-bit characters on I/O, because I/O of Malbolge Unshackled
uses Unicode code points instead of single bytes in getc/putc calls.
Further, the Malbolge Unshackled interpreter automatically converts newlines read from stdin,
which cannot be reverted in a platform-independent way.
The backend converts newlines from input back to the Linux encoding and
applies modulo 256 operations to all input and output,
but it cannot fully compensate for these issues this way.
You should limit I/O to ASCII characters in order to avoid unexpected behaviour or crashes.

This backend may be replaced by a Malbolge Unshackled backend in the future.

TensorFlow

Thanks to control flow operations such as tf.while_loop and tf.cond,
a TensorFlow graph is Turing complete. This backend translates EIR
to Python code which constructs a graph equivalent to the source
EIR. This backend is very slow and uses a huge amount of
memory. I've never seen 8cc.c.eir.tf work, but lisp.c.eir.tf does
work. You can test this backend by

$ TF=1 make tf

TODO: Reduce the size of the graph and run 8cc

Future work

I'm interested in

  • adding more backends (e.g., 16bit CPU, Malbolge Unshackled, ...)
  • running more programs (e.g., lua.bf or mruby.bf?)
  • supporting more C features (e.g., bit operations)
  • eliminating unnecessary code in 8cc

Adding a backend shouldn't be extremely difficult. PRs are welcome!

See also

This project is a sequel to bflisp.

Acknowledgement

I'd like to thank Rui Ueyama for his
easy-to-hack compiler and suggesting the basic idea which made this
possible.
