This repository contains text reuse data generated by passim, sorted by text version.
Each text version ("book 1") has two tsv files:
- one in the "stats" folder, which contains a single row for each other text
version ("book 2") in the corpus with which passim has detected text reuse.
Columns:
- id: version ID (without language component and extension) of book 2
- book: book URI of book 2
- alignments: number of alignments with book 2
- ch_match: number of characters in book 1 that are matched in book 2
- one in the "msdata" folder, which contains a row for each text reuse
alignment passim found for book 1
Columns:
- ms1: milestone number in book 1
- b1: character offset of the start of the alignment in book 1
- e1: character offset of the end of the alignment in book 1
- id2: version ID (without language component and extension) of book 2
- ms2: milestone number in book 2
- b2: character offset of the start of the alignment in book 2
- e2: character offset of the end of the alignment in book 2
- ch_match: number of characters in ms1 that are matched in ms2
- matches_percent: percentage of characters in ms1 matched in ms2
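
The two file types described above can be read with Python's standard csv module. The following is a minimal sketch, not an official loader: it assumes that each file starts with a header row using the column names listed above (this README does not confirm that), and the function names, version IDs and values in the sample data are made up for illustration.

```python
import csv
import io


def read_tsv(handle):
    """Return the rows of a tab-separated file as a list of dicts.

    Assumes the first row is a header with the column names from this README.
    """
    return list(csv.DictReader(handle, delimiter="\t"))


def top_reuse_partner(stats_rows):
    """Return the stats row of the book 2 with the highest ch_match."""
    return max(stats_rows, key=lambda row: int(row["ch_match"]))


def alignments_with(msdata_rows, id2):
    """Return the msdata rows (alignments) that involve the given book 2."""
    return [row for row in msdata_rows if row["id2"] == id2]


# Made-up sample content standing in for one "stats" file and one "msdata" file.
SAMPLE_STATS = (
    "id\tbook\talignments\tch_match\n"
    "ExampleVersionA\tExampleBookA\t2\t300\n"
    "ExampleVersionB\tExampleBookB\t1\t120\n"
)
SAMPLE_MSDATA = (
    "ms1\tb1\te1\tid2\tms2\tb2\te2\tch_match\tmatches_percent\n"
    "1\t0\t150\tExampleVersionA\t4\t20\t170\t140\t46.7\n"
    "2\t10\t180\tExampleVersionA\t5\t0\t170\t160\t53.3\n"
    "3\t5\t130\tExampleVersionB\t9\t40\t165\t120\t40.0\n"
)

stats_rows = read_tsv(io.StringIO(SAMPLE_STATS))
msdata_rows = read_tsv(io.StringIO(SAMPLE_MSDATA))

# Find the book 2 with the most matched characters, then list its alignments.
best = top_reuse_partner(stats_rows)
for row in alignments_with(msdata_rows, best["id"]):
    print(row["ms1"], row["b1"], row["e1"], row["ms2"], row["b2"], row["e2"])
```

To read real files, replace the io.StringIO objects with files opened using open(path, encoding="utf-8", newline=""), where path points to a file in the "stats" or "msdata" folder. All values are read as strings, so numeric columns such as ch_match need to be converted with int() or float() before use.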