MultiCite (NAACL 2022)

Modeling realistic citations requires moving beyond the single-sentence single-label setting.

Full Dataset

The full dataset can be found at data/full-v20210918.json. It has the following schema:

{
  <paper_id>: {
    "x": [                  # contains full paper text split by sentences
      {
        "sent_id": <str>,   # globally unique identifier for this sentence
        "text": <str>       # citation mentions of target paper are wrapped in <span ..."></span> tags
      },
      ...
    ],
    "y": {
      <intent_id>: {
        "gold_contexts": [
            <List[str]>,    # list of sent_ids annotated as a context for this intent
            <List[str]>,    # different list of sent_ids annotated as a context for this intent
            ...
        ],
        "cite_sentences": <List[str]>   # all sent_ids containing citation mention of target paper
      },
      <intent_id>: {},
      <intent_id>: {},
      ...
    }
  },
  <paper_id>: {},
  <paper_id>: {},
  ...
}
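
As a minimal sketch of how this schema could be read in Python: the snippet below assumes the file parses as a single JSON object keyed by paper ID, as the schema suggests. The helper names `load_multicite` and `gold_context_texts` are illustrative and not part of this repository, and the example record is hand-made, not real data.

```python
import json
from typing import Dict, List


def load_multicite(path: str) -> dict:
    """Load the full dataset (assumed: one JSON object keyed by paper_id)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def gold_context_texts(paper: dict) -> Dict[str, List[List[str]]]:
    """Map each intent_id to its gold contexts, with sent_ids resolved to sentence texts."""
    text_by_id = {sent["sent_id"]: sent["text"] for sent in paper["x"]}
    return {
        intent_id: [
            [text_by_id[sent_id] for sent_id in context]
            for context in annotation.get("gold_contexts", [])
        ]
        for intent_id, annotation in paper["y"].items()
    }


# Hand-made record in the documented shape (hypothetical IDs and text).
example_paper = {
    "x": [
        {"sent_id": "p1-0", "text": "We study citation intent."},
        {"sent_id": "p1-1", "text": "We build on <span>Foo et al.</span>."},
        {"sent_id": "p1-2", "text": "Their model is our starting point."},
    ],
    "y": {
        "@USE@": {
            "gold_contexts": [["p1-1", "p1-2"]],
            "cite_sentences": ["p1-1"],
        }
    },
}

contexts = gold_context_texts(example_paper)
```

With the real file, the same helper would be applied per paper, e.g. `gold_context_texts(load_multicite("data/full-v20210918.json")[paper_id])`.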

Intent IDs

    '@BACK@': 'Background',
    '@MOT@': 'Motivation',
    '@FUT@': 'Future Work',
    '@SIM@': 'Similar',
    '@DIF@': 'Difference',
    '@USE@': 'Uses',
    '@EXT@': 'Extension',
    '@UNSURE@': 'Unsure'
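
For multi-label experiments, the set of intents attached to a citation is often turned into a multi-hot vector. The sketch below shows one way to do that with the intent IDs listed above. The ordering of the positions is an assumption made here for illustration; the scripts in this repository may order the labels differently, and `to_multi_hot` is a hypothetical helper.

```python
from typing import Iterable, List

# Intent IDs in the order they are listed in this README (ordering is an assumption).
INTENT_IDS = ["@BACK@", "@MOT@", "@FUT@", "@SIM@", "@DIF@", "@USE@", "@EXT@", "@UNSURE@"]


def to_multi_hot(intent_ids: Iterable[str]) -> List[int]:
    """Encode a collection of intent IDs as a 0/1 vector aligned with INTENT_IDS."""
    present = set(intent_ids)
    unknown = present - set(INTENT_IDS)
    if unknown:
        raise ValueError(f"unknown intent IDs: {sorted(unknown)}")
    return [1 if intent_id in present else 0 for intent_id in INTENT_IDS]


vector = to_multi_hot(["@BACK@", "@USE@"])
```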

Version for Classification Experiments

For the classification experiments described in the manuscript, we extracted the citation contexts from the full texts. The versions we used are provided in ./data/classification_<context_size>_context, where <context_size> reflects the size of the context extracted around the citation mention. For instance, in ./data/classification_1_context, the text to classify is only the sentence mentioning the cited work, while in ./data/classification_gold_context the text to classify corresponds to the annotated gold context.
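
To make the idea of a fixed-size context concrete, here is a rough sketch of extracting a window of sentences around a citing sentence. It assumes the window is centered on the citing sentence, uses odd sizes only, and is clipped at the start and end of the paper; the manuscript and the released files are the authority on how the contexts were actually extracted. `context_window` is a hypothetical helper, not code from this repository.

```python
from typing import List


def context_window(sentences: List[str], cite_index: int, context_size: int) -> List[str]:
    """Return up to `context_size` sentences centered on the citing sentence.

    Assumption: symmetric window, odd sizes only, clipped at document boundaries.
    """
    if context_size < 1 or context_size % 2 == 0:
        raise ValueError("this sketch only handles odd, positive context sizes")
    if not 0 <= cite_index < len(sentences):
        raise IndexError("cite_index is outside the sentence list")
    half = context_size // 2
    start = max(0, cite_index - half)
    end = min(len(sentences), cite_index + half + 1)
    return sentences[start:end]


# Hand-made sentences; the citation mention is wrapped in <span> tags as in the full dataset.
sentences = ["A.", "B cites <span>X</span>.", "C.", "D."]
size_1 = context_window(sentences, cite_index=1, context_size=1)
size_3 = context_window(sentences, cite_index=1, context_size=3)
```

A size of 1 reproduces the single citing sentence, which matches the description of ./data/classification_1_context above.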

The structure is the following:

    {
      "id": <instance_id>,
      "x": <text>,
      "y": <label>
    }

Version for Q&A Experiments

For the Q&A experiments described in the manuscript, we also needed to transform the data. The versions we used are provided in ./data/classification_<context_size>_context, where <context_size> reflects the size of the context extracted around the citation mention. For instance, in ./data/classification_1_context, the text to classify is only the sentence mentioning the cited work, while in ./data/classification_gold_context the text to classify corresponds to the annotated gold context.

The code we used to convert the data is available in ./qa/convert_ours_to_qa.py.

The structure is the following:

    {
        <paper_id>: {
            "title": <paper_id>,
            "abstract": "",
            "full_text": [
                {
                    "section_name": "",
                    "paragraphs": [ ... ]
                }
            ],
            "qas": [
                {
                    "question": "Does the paper cite <cited_work> for background information?",
                    "question_id": "ABC_8f0aab7fd30ffc56cc477b25e6bb16_00",
                    "answers": [
                        {
                            "answer": {
                                "unanswerable": false,
                                "extracted_spans": [],
                                "yes_no": <label-true/false>,
                                "free_form_answer": "",
                                "evidence": [ ... ],
                                "highlighted_evidence": []
                            }
                        }
                    ]
                },
                {
                    "question": "Does the paper cite <cited_work> as motivation?",
                    "question_id": "ABC_8f0aab7fd30ffc56cc477b25e6bb16_02",
                    "answers": [
                        {
                            "answer": {
                                "unanswerable": false,
                                "extracted_spans": [],
                                "yes_no": <label-true/false>,
                                "free_form_answer": "",
                                "evidence": [ ... ],
                                "highlighted_evidence": []
                            }
                        }
                    ]
                }, ...
            ]
        }, ...
    }
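
As an illustration of the yes/no question entries in this structure, the sketch below builds a single `qas` item from a question string and a boolean label. It is not the logic of ./qa/convert_ours_to_qa.py; the function name `make_qa_entry`, the question IDs, and the cited-work string are made up for the example, and only the two question templates shown above are used.

```python
import json
from typing import List, Optional


def make_qa_entry(question: str, question_id: str, label: bool,
                  evidence: Optional[List[str]] = None) -> dict:
    """Build one yes/no question entry with the fields shown in the structure above."""
    return {
        "question": question,
        "question_id": question_id,
        "answers": [
            {
                "answer": {
                    "unanswerable": False,
                    "extracted_spans": [],
                    "yes_no": label,
                    "free_form_answer": "",
                    "evidence": list(evidence or []),
                    "highlighted_evidence": [],
                }
            }
        ],
    }


cited_work = "Foo et al. (2020)"  # hypothetical cited work
qas = [
    make_qa_entry(
        f"Does the paper cite {cited_work} for background information?",
        "demo_00",  # hypothetical question ID
        True,
        evidence=["We build on <span>Foo et al.</span>."],
    ),
    make_qa_entry(
        f"Does the paper cite {cited_work} as motivation?",
        "demo_01",  # hypothetical question ID
        False,
    ),
]

# The entries serialize to JSON booleans (true/false) as in the structure above.
serialized = json.dumps(qas)
```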


The trained model checkpoints are available on Hugging Face:

  1. Multi-label Citation Intent Classification

    https://huggingface.co/allenai/multicite-multilabel-scibert
    https://huggingface.co/allenai/multicite-multilabel-roberta-large

  2. Citation Context Identification


  3. Paper-Level Citation Intent Q&A




The scripts needed for running the multi-label classification experiments can be found in ./classification. An example call is provided in ./classification/run_classify_multilabel.sh.

Paper-level Citation Intent Q&A

To run these experiments, we used the original code from Dasigi et al. (2021). A script that converts our dataset to the Qasper Q&A format is available in ./qa/convert_ours_to_qa.py. Baseline code is available in ./qa/eval_qa.py.


MultiCite is released under the CC BY-NC 2.0 license, as it is derived from S2ORC. By using MultiCite, you agree to its usage terms.


If using this dataset, please cite:

    title={{M}ulti{C}ite: {M}odeling realistic citations requires moving beyond the single-sentence single-label setting},
    author={Anne Lauscher and Brandon Ko and Bailey Kuehl and Sophie Johnson and David Jurgens and Arman Cohan and Kyle Lo},