/pmgawk

Primary LanguageCGNU General Public License v3.0GPL-3.0

README:

This is persistent gawk (pm-gawk). 

Persistent gawk integrates a new persistent memory allocator, pma, 
into the GNU AWK interpreter, gawk, to support persistent scripting. 

For the original GNU AWK readme file, check README.gawk  

INSTALLATION

To build persistent gawk, you need access to a recent compiler (gcc8 or later).

To build from source, configure with persistent scripting enabled:

$ ./configure --enable-persist

Once the build system is setup, persistent gawk is built using the
make command at the top level:

$ make

After successful compilation, run the test suite:

$ make check

There are two known tests that fail, namely fork and fork2. These 
test failures relate to the current incompatibility between fork and 
the file-backed memory-mapping approach to persistent memory and 
should not raise any concern.

GETTING STARTED

To use persistent gawk, first create for the persistent heap a
sparse file whose size is a multiple of the system page size.
For example, try the following on the command line: 

truncate -s 4096000000 heapfile

The media beneath the filesystem containing the backing file are
unconstrained, i.e., pma supports persistent memory programming 
on conventional hardware as well as NVM.

Pass this uninitialized backing file to the persistent gawk
interpreter via the “--persist” flag when invoking an AWK script; 
thereafter the file will contain a persistent heap holding
all script variables, which will be available to the script 
whenever it is invoked again with --persist. 

For example, to run the word-count example located in examples/wc:

$ truncate -s 4096000000 heapfile
$ gawk --persist=heapfile -f wc.awk input0.txt
$ gawk --persist=heapfile -f wc.awk input1.txt
$ gawk --persist=heapfile -f wc.awk input2.txt

The above processes one input per awk invocation and should give 
equivalent output to processing all inputs in a single invocation
through regular gawk:

$ gawk -f ./wc.awk input0.txt input1.txt input2.txt


REFERENCES

Zi Fan Tan, Jianan Li, Haris Volos, and Terence Kelly, "Persistent
Scripting," Non-Volatile Memory Workshop (NVMW) 2022.
http://nvmw.ucsd.edu/program/    [NVMW URLs are not stable, so
this one might change after the 2022 event is over]
See also "persistent gawk" (pm-gawk):
https://github.com/ucy-coast/pmgawk
https://coast.cs.ucy.ac.cy/projects/pmgawk/

Terence Kelly, Zi Fan Tan, Jianan Li, and Haris Volos, "Persistent
Memory Allocation," forthcoming in ACM _Queue_ magazine, Vol. 20
No. 2 (March/April 2022).  https://queue.acm.org/DrillBits7/