Smatch tool: evaluation of AMR semantic structures

Smatch (semantic match) tool

Smatch is an evaluation tool for AMR (Abstract Meaning Representation). It computes the Smatch score of two AMR graphs in terms of their matching triples (edges). To do so, it searches for a variable (node) mapping that maximizes the count, M, of matching triples, and then derives the score from the following quantities:

  • M is the number of matching triples
  • T is the total number of triples in the first AMR
  • G is the total number of triples in the second AMR
  • Precision is defined as P = M/T
  • Recall is defined as R = M/G
  • The Smatch score is the F-score: F = 2 * (P*R)/(P+R)
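
As an illustration of the definitions above, here is a minimal Python sketch that turns the three counts into precision, recall, and F-score. The function name `smatch_scores` and the example counts are hypothetical and are not part of the Smatch API. Returning 0.0 when a denominator is zero is an assumption made here to avoid division by zero.

```python
def smatch_scores(m, t, g):
    """Return (precision, recall, f_score) from triple counts.

    m: number of matching triples (M)
    t: total number of triples in the first AMR (T)
    g: total number of triples in the second AMR (G)
    """
    # Assumption: a zero denominator yields a score of 0.0.
    precision = m / t if t else 0.0
    recall = m / g if g else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Hypothetical counts: 6 matching triples, 8 in the first AMR, 10 in the second.
p, r, f = smatch_scores(m=6, t=8, g=10)
print(f"P={p:.2f} R={r:.2f} F={f:.2f}")
```

Swapping `t` and `g` swaps precision and recall but leaves the F-score unchanged.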

For more information, see Cai and Knight, 2013.

Requirements, Installation, and Usage

This Smatch implementation is tested on Python 3.5 and higher. It is released on PyPI, so you can install it with pip:

$ pip install smatch

Alternatively, clone this repository and run the smatch.py script directly; it does not need to be installed to be used.

To use the script, run it with at least the -f option, which takes two filename arguments:

$ smatch.py -f test.amr gold.amr

Note that the order of these arguments does not matter for the Smatch score as the F-score is symmetric, but swapping the arguments will swap the precision and recall. The files contain AMRs separated by a blank line, with comment lines starting with # (see test_input1.txt for an example).
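
To make the file layout concrete, here is a rough Python sketch that splits text in this format into individual AMR strings. It is not the parser that smatch.py uses. The helper name `split_amrs` and the sample AMRs are made up for illustration, and the sketch assumes that a comment line is any line whose first non-whitespace character is #.

```python
def split_amrs(text):
    """Split text into AMR strings, skipping '#' comment lines.

    AMRs are assumed to be separated by one or more blank lines.
    """
    amrs, current = [], []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            # A blank line closes the AMR collected so far, if any.
            if current:
                amrs.append(" ".join(current))
                current = []
        elif not stripped.startswith("#"):
            current.append(stripped)
    # Flush the last AMR when the text does not end with a blank line.
    if current:
        amrs.append(" ".join(current))
    return amrs


# Hypothetical sample input with two AMRs and comment lines.
sample = """# ::snt The boy wants to go
(w / want-01
   :ARG0 (b / boy)
   :ARG1 (g / go-01
            :ARG0 b))

# ::snt The girl left
(l / leave-11
   :ARG0 (g / girl))
"""

amrs = split_amrs(sample)
for amr in amrs:
    print(amr)
```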

For other options, try smatch.py --help.

Citation

@inproceedings{cai-knight-2013-smatch,
    title = "{S}match: an Evaluation Metric for Semantic Feature Structures",
    author = "Cai, Shu and Knight, Kevin",
    booktitle = "Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)",
    month = aug,
    year = "2013",
    address = "Sofia, Bulgaria",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/P13-2131",
    pages = "748--752",
}

Recommendations for Reproducible Research

You can help make your research reproducible by including the following information in your writing:

  • The software version (e.g., repository URL and version number)
  • The number of restarts (-r) used, even if unchanged from the default
  • The order of the arguments to -f (if reporting precision and recall)
  • Any other options or preprocessing steps

History

The code was mostly developed during 2012 and 2013, and has undergone many fixes and updates. Note that the versions distributed for SemEval-2016 were numbered 2.0–2.0.2, but these predate this repository and the 1.0 series on PyPI. For more details, see the Changelog.

Related Projects

Here are some notable forks of Smatch:

And here are other evaluation metrics for AMR:

  • mdtux89/amr-evaluation offers a set of metrics based on Smatch for fine-grained evaluation
  • freesunshine0316/sembleu is inspired by BLEU and puts more weight on "content" than graph-structure similarity
  • rafaelanchieta/sema weights error types differently and does not consider which node is the graph's top
  • flipz357/amr-metric-suite provides Soft Smatch (S^2match), which computes Smatch alignment and Smatch graph similarity with graded concept matching via GloVe embeddings (e.g., enemy vs. foe)