ABA
Alignment-Based Approach for automatic modernization of French texts from the 16th to the 18th century
Install
Install Packages
- With make
make
- Without make
pip install -r requirements.txt
Generate Data
Download, Align and Analyze PARALLEL17
- Download PARALLEL17 and put it into the download folder, or run the script
python -m aba.download_git 'https://github.com/PhilippeGambette/PARALLEL17.git'
- Align PARALLEL17 by words
python -m aba.align_words
- Extract dictionaries from PARALLEL17
python -m aba.analyze
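As a rough illustration of what word alignment followed by dictionary extraction can look like, here is a minimal sketch. It is not the code behind aba.align_words or aba.analyze: the function name aligned_word_pairs is hypothetical, and using difflib.SequenceMatcher on whitespace-split tokens is an assumption made only for this example.

```python
from difflib import SequenceMatcher


def aligned_word_pairs(old_sentence, new_sentence):
    """Return (old_word, new_word) pairs for words that differ between versions.

    Hypothetical helper: tokens are split on whitespace and aligned with
    difflib, which is an assumption of this sketch, not the project's method.
    """
    old_tokens = old_sentence.split()
    new_tokens = new_sentence.split()
    matcher = SequenceMatcher(a=old_tokens, b=new_tokens, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Keep only substitutions of equal length, so words pair up one to one.
        if tag == "replace" and (i2 - i1) == (j2 - j1):
            pairs.extend(zip(old_tokens[i1:i2], new_tokens[j1:j2]))
    return pairs


print(aligned_word_pairs("il estoit fort sçavant", "il était fort savant"))
# [('estoit', 'était'), ('sçavant', 'savant')]
```

Pairs collected this way over a whole parallel corpus could then be counted to build an old → new dictionary. Insertions, deletions and uneven substitutions are simply skipped in this sketch.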
Extract Morphalou Dictionary
- Download Morphalou
- Copy morphalou/4/Morphalou3.1_formatCSV_toutEnUn/Morphalou3.1_CSV.csv to the download folder
- Run script
python -m aba.extract_dic_morphalou
Extract Wikisource Dictionary
Extract an old French → modern French dictionary from Wikisource.
python -m aba.extract_dic_wikisource
Extract Name Dictionary
Extract a dictionary from multiple .dic files located in the resources folder.
python -m aba.extract_dic_resources
Main Scripts
Modernize Corpus
python -m aba.modernize_corpus
Modernize Text
Modernize a text in old French. [1]
python -m aba.modernize [-h] [-n TEXT_NEW_PATH] text_old_path
Tools
Rules Chart
Opens a labeled dictionary and displays an interactive plotly pie chart showing the frequency of modernization rules. A copy of the chart is saved in data/rules_chart.html.
python -m aba.rules_chart
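The frequencies behind such a chart can be computed as sketched below. This is only an illustration: the on-disk format of the labeled dictionary is not described here, so the (old, new, rule) triples, the rule labels and the function name rule_frequencies are all assumptions.

```python
from collections import Counter


def rule_frequencies(labeled_entries):
    """Return {rule: share} from (old, new, rule) triples, most frequent first.

    The triple layout is hypothetical; adapt it to the real dictionary format.
    """
    counts = Counter(rule for _old, _new, rule in labeled_entries)
    total = sum(counts.values())
    return {rule: count / total for rule, count in counts.most_common()}


# Hypothetical entries and rule labels, for illustration only.
entries = [
    ("estoit", "était", "oi->ai"),
    ("avoit", "avait", "oi->ai"),
    ("sçavoir", "savoir", "sç->s"),
    ("luy", "lui", "y->i"),
]
for rule, share in rule_frequencies(entries).items():
    print(f"{rule}: {share:.0%}")
```

The resulting labels and shares are the kind of input a pie chart needs, one slice per rule.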
Find Strings
Search two-column .tsv files in a given directory for two corresponding strings, old and new.
Prints the files, rows and lines where both strings appear.
python -m aba.find_strings [-h] [-d DIRECTORY] old new
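The search described above can be sketched as follows. This is not the source of aba.find_strings: the function name find_pairs is hypothetical, and the sketch assumes that the first column holds the old form, the second the new form, and that a substring match counts as an appearance.

```python
import csv
from pathlib import Path


def find_pairs(directory, old, new):
    """Yield (file name, row number, row) for rows containing both strings.

    Assumes two tab-separated columns per row: old form, then new form.
    """
    for path in sorted(Path(directory).glob("*.tsv")):
        with path.open(encoding="utf-8", newline="") as handle:
            # QUOTE_NONE keeps quote characters in the data as literal text.
            reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
            for row_number, row in enumerate(reader, start=1):
                if len(row) >= 2 and old in row[0] and new in row[1]:
                    yield path.name, row_number, row
```

For example, list(find_pairs("data", "estoit", "était")) would collect every matching row from the .tsv files directly inside a data directory (the directory name is only an example). Rows with fewer than two columns are skipped.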
Run Tests
py.test
Footnotes
[1] Paths must be written with forward slashes (/).