A Graph-based Program Representation for Data Flow Analysis and Compiler Optimizations

ProGraML: Program Graphs for Machine Learning

An expressive, language-independent representation of programs.

Check the website for more information.

Introduction

ProGraML is a representation for programs as input to a machine learning model. The key features are:

  1. Simple: Everything is available through a pip install, no compilation required. Supports several programming languages and compiler IRs (C, C++, LLVM-IR, XLA) and several graph formats (NetworkX, DGL, Graphviz, JSON) out of the box.

  2. Expressive: Captures every control, data, and call relation across entire programs. The representation is independent of the source language. Features and labels can be added at any granularity to support whole-program, per-instruction, or per-relation reasoning tasks.

  3. Fast: The core graph construction is implemented in C++ with a low overhead interface to Python. Every API method supports simple and efficient parallelization through an executor parameter.
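
To make the "Expressive" point more concrete, here is a toy model of a graph that records control, data, and call relations and carries a per-instruction label. It is a plain-Python sketch for illustration only: the `Flow`, `ToyGraph`, and `build_example` names are hypothetical and do not reflect ProGraML's actual schema or API.

```python
from dataclasses import dataclass, field
from enum import Enum


class Flow(Enum):
    """The three relation kinds named in the feature list."""
    CONTROL = "control"
    DATA = "data"
    CALL = "call"


@dataclass
class ToyGraph:
    """Hypothetical stand-in for a program graph (not ProGraML's schema)."""
    nodes: dict = field(default_factory=dict)  # node id -> feature dict
    edges: list = field(default_factory=list)  # (source, target, Flow)

    def add_node(self, node_id, **features):
        self.nodes[node_id] = dict(features)

    def add_edge(self, source, target, flow):
        self.edges.append((source, target, flow))

    def edges_of(self, flow):
        """Return the (source, target) pairs for one relation kind."""
        return [(s, t) for s, t, f in self.edges if f is flow]


def build_example():
    """Roughly models: int main() { int x = f(); return x; }"""
    g = ToyGraph()
    g.add_node("main", kind="function-entry")
    g.add_node("call f", kind="instruction")
    g.add_node("x", kind="variable")
    g.add_node("ret", kind="instruction")
    g.add_node("f", kind="function-entry")
    # Control relations: the order in which instructions execute.
    g.add_edge("main", "call f", Flow.CONTROL)
    g.add_edge("call f", "ret", Flow.CONTROL)
    # Data relations: the call defines x, and the return uses x.
    g.add_edge("call f", "x", Flow.DATA)
    g.add_edge("x", "ret", Flow.DATA)
    # Call relation: the call site points at the callee.
    g.add_edge("call f", "f", Flow.CALL)
    # A label attached at per-instruction granularity.
    g.nodes["ret"]["label"] = "reachable"
    return g


graph = build_example()
print(graph.edges_of(Flow.DATA))
```

Keeping every relation in one edge list, tagged by kind, is what lets a single structure serve whole-program, per-instruction, and per-relation tasks: a model can filter by flow or consume all edges at once.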

To get stuck in and play around with our graph representation, visit:

Or if papers are more your ☕, have a read of ours:

Supported Programming Languages

The following programming languages and compiler IRs are supported out-of-the-box:

| Language | API Calls | Supported Versions |
|----------|-----------|--------------------|
| C | programl.from_cpp(), programl.from_clang() | Up to ISO C 2017 |
| C++ | programl.from_cpp(), programl.from_clang() | Up to ISO C++ 2020 DIS |
| LLVM-IR | programl.from_llvm_ir() | 3.8.0, 6.0.0, 10.0.0 |
| XLA | programl.from_xla_hlo_proto() | 2.0.0 |

Is your favorite language not supported here? Submit a feature request!

Getting Started

Install the latest release of the Python package using:

pip install -U programl

The API comprises three kinds of operations: graph creation, graph transformation, and graph serialization. Here is a quick demo of each:

>>> import programl as pg

# Construct a program graph from C++:
>>> G = pg.from_cpp("""
... #include <iostream>
...
... int main(int argc, char** argv) {
...   std::cout << "Hello, world!" << std::endl;
...   return 0;
... }
... """)

# A program graph is a protocol buffer:
>>> type(G).__name__
'ProgramGraph'

# Convert the graph to NetworkX:
>>> pg.to_networkx(G)
<networkx.classes.multidigraph.MultiDiGraph at 0x7fbcf40a2fa0>

# Save the graph for later:
>>> pg.save_graphs('file.data', [G])
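
The feature list says every API method accepts an executor parameter for parallelization. The sketch below shows that general pattern with Python's standard `concurrent.futures`, using a hypothetical stand-in function (`count_tokens`) in place of a real graph construction call so that it runs without ProGraML installed. Whether ProGraML's executor parameter behaves exactly like `process_all` here is an assumption; consult the API reference for the actual signatures.

```python
from concurrent.futures import ThreadPoolExecutor


def count_tokens(src):
    """Hypothetical stand-in for a per-program graph construction call."""
    return len(src.split())


def process_all(srcs, executor=None):
    """Process each source, in parallel when an executor is supplied."""
    if executor is None:
        # No executor given: fall back to a simple serial loop.
        return [count_tokens(s) for s in srcs]
    # Executor.map yields results in the same order as the inputs.
    return list(executor.map(count_tokens, srcs))


sources = [
    "int main() { return 0; }",
    "int f(int x) { return x + 1; }",
]

with ThreadPoolExecutor(max_workers=2) as pool:
    parallel = process_all(sources, executor=pool)

serial = process_all(sources)
print(parallel == serial)
```

Passing the executor in as an optional argument keeps the serial call site unchanged and leaves the choice of thread pool, process pool, or worker count to the caller.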

For further details, check out the API reference.

Contributing

Patches, bug reports, and feature requests are welcome! Please use the issue tracker to file a bug report or question. If you would like to help out with the code, please read this document.

Citation

If you use ProGraML in any of your work, please cite this paper:

@inproceedings{cummins2021a,
  title={{ProGraML: A Graph-based Program Representation for Data Flow Analysis and Compiler Optimizations}},
  author={Cummins, Chris and Fisches, Zacharias and Ben-Nun, Tal and Hoefler, Torsten and O'Boyle, Michael and Leather, Hugh},
  booktitle = {Thirty-eighth International Conference on Machine Learning (ICML)},
  year={2021}
}