The corpus comes from DraCor. It has been cleaned, especially with regard to metadata.
All the data are already available in CSV format (3-grams are here and stopwords are here). If needed, you can recreate them with the following steps.
- Extract the text and the metadata from the XML files
bash Make_corpus.bash
- Install SuperStyl
- Create the data with character 3-grams
python3 main.py -s ../txt/* -t chars -n 3
- Create data with stopwords
python3 main.py -s ../txt/* -t words -n 1 -f ../mots_outils.json
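To make the two feature types concrete, here is a minimal standard-library sketch of what counting character 3-grams and function words (the "stopwords" above) amounts to. It is only an illustration and not SuperStyl's implementation: the function names `char_ngrams` and `function_word_counts`, the whitespace tokenisation, and the sample sentence are all assumptions made for this example.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in a string (hypothetical helper)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def function_word_counts(text, function_words):
    """Count occurrences of the listed function words, case-insensitively.

    Tokenisation here is a plain whitespace split, which is an assumption
    of this sketch and much cruder than a real stylometric pipeline.
    """
    wanted = set(function_words)
    tokens = text.lower().split()
    return Counter(tok for tok in tokens if tok in wanted)


# Illustrative input only; real input would be the extracted play texts.
sample = "le roi et la reine"
trigrams = char_ngrams(sample, 3)
words = function_word_counts(sample, ["le", "la", "et"])
```

In a real run, the counts for each play would typically be normalised into relative frequencies before any comparison between texts.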
This research was presented at DH Benelux 2023.
@inproceedings{cafiero:hal-04093598,
TITLE = {{Rise and Fall of Theatrical Genres in Early Modern France:}},
AUTHOR = {Cafiero, Florian and Gabay, Simon},
URL = {https://hal.science/hal-04093598},
BOOKTITLE = {{DH Benelux}},
ADDRESS = {Bruxelles, Belgium},
YEAR = {2023},
MONTH = May,
HAL_ID = {hal-04093598},
HAL_VERSION = {v1},
}