SemGuS Parser

A C# parsing library for SemGuS problems. We also provide a standalone tool for verifying and converting SemGuS files. Find the latest release and binaries on the Releases page.

Features

Parses SemGuS problems from SMT-LIB2 source files
Displays constrained Horn clauses (CHCs) and other problem data in a human-inspectable format
Converts SMT-LIB2 SemGuS problems into JSON for consumption by other tools
Allows programatic access via a C# API

Examples

To verify the syntax of a problem file and display details in a human-inspectable format, run:

semgus-parser <input.sem>

To convert a problem file to JSON, run:

semgus-parser --format json --mode batch --output <output.json> -- <input.sem>

Some examples of SemGuS files can be found in the IntegrationTests/data directory.

Caveats and Considerations

This project is under active development and does not yet support all SMT-LIB2 features. The following theories are currently supported:

Core
Ints
Strings (with partial support for regular expressions)
Bit vectors (partial support; only theory functions supported for now)
Datatypes (non-parametric only)
Arrays (partial support)

Other features:

Converting SyGuS problems to CHCs on the fly (beta)
Declarative S-expression output format (beta). Use --format sexpr

New additions:

Datatypes (non-parametric only)
Auxiliary functions
Arbitrary match expressions

Unsupported SMT-LIB2 features include:

Theory functions annotated with :left-assoc, :right-assoc, :chainable, and :pairwise. Certain Core functions are implemented, so post an issue if others are needed.
Some terms, including let
Uninterpreted sorts (declare-sort) and sort aliases (define-sort)

The roadmap for next-up features includes:

Arbitrary let terms

If there is an unsupported feature that you would like added, drop us a line by submitting an issue (or commenting on an existing one). This will help us prioritize what to put on our roadmap.

Installation

To install the stand-alone parsing tool, find the binary for your operating system on the Releases page. Unzip it and put the semgus-parser (or semgus-parser.exe) in a convenient location. No other dependencies are required. You may have to mark it executable on Linux and macOS; just run chmod a+x semgus-parser in the folder with the binary.

If you have the .NET 6 SDK installed, the parsing tool can also be installed automatically through NuGet:

dotnet tool install --global Semgus.Parser.Tool

The C# parsing library is available in NuGet. Look for the Semgus.Parser package.

Usage

semgus-parser [--format <format>] [--mode <mode>] [--output <filename>] -- <input.sem> ...

Passing - as the input filename (or not supplying any filenames) makes the tool read from standard input.

Format Verification

Run the tool with a format of verify to perform a syntax verification. In addition to syntax verification, this format prints out the problem information, which can be checked for correctness. This output format is not designed to be machine readable; use the JSON output format instead.

JSON Converter

The SemgusParser project contains a utility that reads in a SemGuS file and produces JSON data representing the problem, usable by other non-.NET tools that cannot directly link with this library.

Usage

semgus-parser --format json [--mode stream|batch] [--output <filename.json>] -- <input.sem> ...

Passing - as the input filename (or not supplying any filenames) makes the tool read from standard input.

The two modes, stream and batch, alter the output format of this tool. In stream mode (the default), the output is a stream of newline-delimited JSON objects representing parsing events. This is suitable for interactive mode, or when the output of this tool is being directly piped to the consuming process. In batch mode, the output is a JSON array, with each item in the array being a parsing event object. This format is suitable for outputting to a file as a complete JSON datatype. Note that the event object format is identical between these two modes; only the output structure differs.

In stream mode:

{ ...event 1... }
{ ...event 2... }
{ ...event 3... }
.
.
.

In batch mode:

[
  { ...event 1... },
  { ...event 2... },
  { ...event 3... },
  .
  .
  .
]

Event object format

Each event object has two fields that denote what type of event it represents: $event and $type. The $event field holds the name of the particular event. The $type field denotes the general type of event: currently, either

semgus, for a SemGuS-specific event
smt, for a general SMT-LIB2 event
meta, for metadata about the problem

`meta` events

`set-info`

Holds general metadata about the problem. The <value> is either a string, number, or list of values (which might be lists, strings, numbers, etc.).

{ "$event": "set-info", "$type": "meta", "keyword": "<key>", "value": <value> }

`end-of-stream`

Sent at the end of the event stream.

{ "$event": "end-of-stream", "$type": "meta" }

`semgus` events

`check-synth`

Signals that the synthesizer should perform synthesis with the information provided in previous events.

{ "$event": "check-synth", "$type": "semgus" }

`declare-term-type`

Declares that a particular symbol will be used as a term type. Does not contain the actual term type definition.

{ "$event": "declare-term-type", "$type": "semgus", "name": "<term-type-name>" }

`define-term-type`

Defines a set of constructors for a term type. All term types referenced will have been previously declared by a declare-term-type event.

{ "$event": "define-term-type", "$type": "semgus", "name": "<term-type-name>", "constructors": [ <constructors> ] }

Each constructor (a.k.a. operator) has the form:

{ "name": "<constructor-name>", "children": ["<term-type>", ...] }

Example:

{
  "name": "E",
  "constructors": [
    {
      "name": "$x", "children": []
    },
    {
      "name": "$y", "children": []
    },
    {
      "name": "$0", "children": []
    },
    {
      "name": "$1", "children": []
    },
    {
      "name": "$+", "children": ["E", "E"]
    },
    {
      "name": "$ite", "children": ["B", "E", "E"]
    }
  ],
  "$event": "define-term-type",
  "$type": "semgus"
}

`chc`

An individual constrained Horn clause generated by the problem.

{"head": <head-rel>, "bodyRelations":[<body-rels>...], "symbols":<symbol-table>, "constraint": <term>, "constructor": <constructor>,"$event":"chc","$type":"semgus"}

Where each semantic relation has the form:

{"name":"E.Sem","signature":["E","Int","Int","Int"],"arguments":["et","x","y","r"]}

The constructor has the form:

{"name":"<name>","arguments":["<var>"...],"argumentSorts":["<sort>"...],"returnSort":"<sort>"}

Terms have the general form:

{ "$termType": "<type>", ... }

Term types include application, exists, forall, and variable. Literals are simply a literal string or number. An example term:

{"name":"=","returnSort":"Bool","argumentSorts":["Int","Int"],"arguments":[{"name":"r","sort":"Int","$termType":"variable"},{"name":"y","sort":"Int","$termType":"variable"}],"$termType":"application"}

Symbol Table

The symbol table (denoted <symbol-table> above) contains information about what symbols appear in the top-level bindings of the CHC head and body. Each symbol entry contains the symbol identifier, sort, and an optional index:

{"id": <id>, "sort": <sort>, "index": <index>}

The symbol table object has six properies, all but one (term) an array of symbol entries:

inputs: symbols that are annotated as CHC input variables. The index is the zero-based offset into the head signature.
outputs: symbols that are annotated as CHC output variables. The index is the zero-based offset into the head signature.
term: a single symbol entry for the CHC term variable. The index is the zero-based offset into the head signature.
children: symbols that are bound as child terms of the main CHC term. The index is the zero-based offset of the child in the term constructor.
auxiliary: symbols that are bound at the top level of the CHC body. The index is unspecified.
unclassified: symbols that appear in the CHC head but are not the term variable or annotated as an input or output. The index is the zero-based offset into the head signature.

Of note, inputs, outputs, term, and unclassified are bound outside of children, and auxiliary inside children. This means that a variable with the same identifier that appears in, e.g., auxiliary, will shadow a same-named variable in, e.g., inputs or children.

Note that the CHC event may contain the following deprecated properties for symbol information, which should be ignored:

inputVariables
outputVariables
variables

These properties will be removed in a future release.

Example:

{
    "head":
    {
        "name":"E.Sem",
        "signature":["E","Int","Int","Int"],
        "arguments":["et","x","y","r"]
    },
    "bodyRelations":
    [
        {
            "name":"E.Sem",
            "signature":["E","Int","Int","Int"],
            "arguments":["et1","x","y","r1"]
        },
        {
            "name":"E.Sem",
            "signature":["E","Int","Int","Int"],
            "arguments":["et2","x","y","r2"]
        }
    ],
    "inputVariables":["x","y"],
    "outputVariables":["r"],
    "variables":["et","x","y","r","r1","r2"],
    "constraint":
    {
        "name":"=",
        "returnSort":"Bool",
        "argumentSorts":["Int","Int"],
        "arguments":
        [
            {
                "name":"r",
                "sort":"Int",
                "$termType":"variable"
            },
            {
                "name":"+",
                "returnSort":"Int",
                "argumentSorts":["Int","Int"],
                "arguments":
                [
                    {"name":"r1","sort":"Int","$termType":"variable"},
                    {"name":"r2","sort":"Int","$termType":"variable"}
                ],
                "$termType":"application"
            }
        ],
        "$termType":"application"
    },
    "constructor":
    {
        "name":"$+",
        "arguments":["et1","et2"],
        "argumentSorts":["E","E"],
        "returnSort":"E"
    },
    "symbols":
    {
        "inputs":
        [
            {"id":"x","sort":"Int","index":1},
            {"id":"y","sort":"Int","index":2}
        ],
        "outputs":
        [
            {"id":"r","sort":"Int","index":3}
        ],
        "term":{"id":"et","sort":"E","index":0},
        "unclassified":[],
        "auxiliary":
        [
            {"id":"r1","sort":"Int","index":null},
            {"id":"r2","sort":"Int","index":null}
        ],
        "children":
        [
            {"id":"et1","sort":"E","index":0},
            {"id":"et2","sort":"E","index":1}
        ]
    },
    "$event":"chc",
    "$type":"semgus"
}

`constraint`

A Boolean predicate term.

{ "$event": "constraint", "$type": "semgus", "constraint": <term> }

The term is defined as above.

`synth-fun`

Contains the name of the function to synthesize, as well as the concrete grammar being used.

{ "$event": "synth-fun", "$type": "semgus", "grammar": <grammar>, "name": "<name>", "termType": "<name>" }

The grammar object is defined as:

{ "nonTerminals": [ {"name": "<name>", "termType": "<name>"}, ...], "productions": [<production>...] }

Each production object is:

{ "instance": "<nt-name>", "operator": "<op-name>", "occurrences": ["<nt-name>"] }

Note that an occurrence will be null if it is not a non-terminal in the operator constructor, i.e. a discovered constant.

`smt` events

`declare-function`

A function declaration, including a name and signature. The following declares a function foo with no arguments that returns a string:

{"$event":"declare-function","$type":"smt","name":"foo","rank":{"argumentSorts":[],"returnSort":"String"}}

`define-function`

A function definition, including a name, signature, and lambda term implementing the function body. Note that a declare-function event will always precede the define-function event (but not necessarily immediately). The following defines the foo function to return a constant string "Foo":

{"$event":"define-function","$type":"smt","name":"foo","rank":{"argumentSorts":[],"returnSort":"String"},"definition":{"arguments":[],"body":"Foo","$termType":"lambda"}}

`declare-datatype`

A datatype declaration, including a type name and arity (currently always zero). Constructors are not included. The following declares a datatype Pair:

{"$event":"declare-datatype","$type":"smt","name":"Pair","arity":0}

`define-datatype`

A datatype definition, that is, the constructors associated with the datatype. The following defines two constructors for Pair, a pair constructor and a leaf constructor:

{"$event":"define-datatype","$type":"smt","name":"Pair","constructors":[{"name":"pair","children":["Pair","Pair"]},{"name":"leaf","children":["String"]}]}

SemGuS-git/Semgus-Parser

SemGuS Parser

Features

Examples

Caveats and Considerations

Installation

Usage

Format Verification

JSON Converter

Usage

Event object format

meta events

set-info

end-of-stream

semgus events

check-synth

declare-term-type

define-term-type

chc

Symbol Table

Example:

constraint

synth-fun

smt events

declare-function

define-function

declare-datatype

define-datatype

`meta` events

`set-info`

`end-of-stream`

`semgus` events

`check-synth`

`declare-term-type`

`define-term-type`

`chc`

`constraint`

`synth-fun`

`smt` events

`declare-function`

`define-function`

`declare-datatype`

`define-datatype`