/one_to_all

passim text reuse data: one book compared to the entire corpus


This repository contains text reuse data generated by passim, organized by text version.

Each text version ("book 1") has two tsv files:

  • one in the "stats" folder, which contains a single row for each other text version ("book 2") in the corpus with which passim detected text reuse. Columns:
    • id: version ID (without language component and extension) of book 2
    • book: book URI of book 2
    • alignments: number of alignments with book 2
    • ch_match: number of characters in book 1 that are matched in book 2
  • one in the "msdata" folder, which contains a row for each text reuse alignment passim found for book 1. Columns:
    • ms1: milestone number in book 1
    • b1: character offset of the start of the alignment in book 1
    • e1: character offset of the end of the alignment in book 1
    • id2: version ID (without language component and extension) of book 2
    • ms2: milestone number in book 2
    • b2: character offset of the start of the alignment in book 2
    • e2: character offset of the end of the alignment in book 2
    • ch_match: number of characters in ms1 that are matched in ms2
    • matches_percent: percentage of characters in ms1 matched in ms2
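
As an illustration of how the "msdata" columns can be used, here is a minimal Python sketch that reads a tab-separated file and groups the alignments by book 2. It assumes each file starts with a header row carrying the column names listed above. The sample rows and the IDs `BookB` and `BookC` are made up for the example, and the helper names `read_tsv` and `summarize_alignments` are hypothetical, not part of this repository.

```python
import csv
import io
from collections import defaultdict


def read_tsv(handle):
    """Read a tab-separated file with a header row into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


def summarize_alignments(msdata_rows):
    """Group msdata rows by id2: count the alignments and sum ch_match."""
    summary = defaultdict(lambda: {"alignments": 0, "ch_match": 0})
    for row in msdata_rows:
        entry = summary[row["id2"]]
        entry["alignments"] += 1
        entry["ch_match"] += int(row["ch_match"])
    return dict(summary)


# Made-up sample in the msdata layout (assumption: a header row is present).
SAMPLE_MSDATA = (
    "ms1\tb1\te1\tid2\tms2\tb2\te2\tch_match\tmatches_percent\n"
    "1\t10\t110\tBookB\t4\t0\t100\t95\t95.0\n"
    "2\t0\t50\tBookB\t7\t20\t70\t48\t96.0\n"
    "3\t5\t65\tBookC\t1\t0\t60\t60\t100.0\n"
)

if __name__ == "__main__":
    rows = read_tsv(io.StringIO(SAMPLE_MSDATA))
    for book2, totals in summarize_alignments(rows).items():
        print(book2, totals["alignments"], totals["ch_match"])
```

To run this on a real file, replace the `io.StringIO` object with a handle from `open(path, encoding="utf-8", newline="")`, where `path` points to one of the tsv files in the "msdata" folder. The per-book totals resemble the rows of the "stats" file, but whether they match its alignments and ch_match values exactly is not confirmed here.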