Graawk

A work-in-progress implementation of AWK on top of GraalVM.

The eventual goal of this project is compatibility with the POSIX specification of awk, with high performance on large, well-structured input files.

Motivation

This is mostly a personal "for fun" project.

I heard about how Truffle lets you write an interpreter for a language, and get a high-performance JIT runtime for "free."

At my day job (bioinformatics), I often use AWK to operate on gigabyte+ files that have a pretty regular structure. I think that a JIT could lend some really nice speedups there by automatically specializing at runtime for which fields matter to the AWK program, and for the format of the fields themselves.

Project Status

This code should be considered pre-alpha - don't use it for anything that matters.

This project was forked from SimpleLanguage. Most of the internals still refer to the language as SL.

Only very simple AWK programs work so far, like NR > 1 { print($NF); }. We also have decent function support (thanks SimpleLanguage!), like:

function fib(n) {
    if(n <= 1) { return 1; }
    return fib(n-1) + fib(n-2);
}
1 == 1 {print(fib(NR));}

Roadmap:

Performance

While it is much too early to make any bold claims, initial testing on the currently supported subset of AWK indicates that graawk can be faster than gawk or mawk for some toy programs on inputs of more than 5,000,000 lines.

graawk is definitely slower than other AWK implementations for small data, and it will probably remain so.

License

TBD! Eventually, I'd like to release all of this under some open-source license, but I haven't thought too hard about which specific one.

SimpleLanguage (whose code still forms the bulk of this project) is licensed under UPL. My own changes are not yet licensed, which, as I understand it, means All Rights Reserved. If you're an individual, I'm probably not going to get mad at you for using it, but if you're making a profit, I probably will be.

If you have opinions about licensing, let me know. :)

Contributing

Disclaimer: I'm not interested in PRs or co-development at this early stage. The code is messy as heck, I've got a ton of weird ideas about where it's going, etc. If you really want to work on this, please contact me first.

That said, here are some instructions to develop or "install" the software:

1. Install GraalVM, version 22. (Higher versions may work.)
2. Install the necessary GraalVM plugins, like native-image.
3. Install Maven.
4. Run mvn package
5. Run ./sl myprogram.awk < input.txt > output.txt
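
As a rough illustration of that last step, here is a minimal sketch that creates a tiny input file and a one-line program, then runs the program with a system awk to produce a reference output. The file contents and the name expected.txt are made up for this example. The ./sl lines are left commented out because they assume mvn package has already succeeded in a checkout of this repository.

```shell
# Hypothetical sample input: a header row followed by two data rows.
printf 'name\tcount\nfoo\t3\nbar\t5\n' > input.txt

# The simple program mentioned under Project Status:
# skip the header and print the last field of each line.
cat > myprogram.awk <<'EOF'
NR > 1 { print($NF); }
EOF

# Reference run with whatever awk is installed on the system.
awk -f myprogram.awk < input.txt > expected.txt

# After "mvn package", the same program can be run through graawk
# and compared against the reference (assumes ./sl exists):
# ./sl myprogram.awk < input.txt > output.txt
# diff expected.txt output.txt
```

Comparing against gawk or mawk like this is a cheap way to check whether a given program falls inside the subset of AWK that graawk currently supports.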

FAQ

How do I pronounce `graawk`?

Kind of like a pterodactyl would. Or, you can pronounce it like grok, which rhymes with AWK, at least in my dialect.

fwip/graawk