
A corpus of Yiddish text built from literary articles published in "In Geveb"

Primary language: Python
License: GNU General Public License v3.0 (GPL-3.0)

in-geveb-corpus

A corpus of Yiddish text built from articles published on In Geveb (https://ingeveb.org)

TODO

  • [ ] Clean up yud-yud sequences (יי) by converting them to tsvey-yudn ligatures (ײ)
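A minimal sketch of how the yud-yud cleanup might work, assuming a simple regex pass. The function name `normalize_tsvey_yudn` is hypothetical, and the rule of skipping pairs whose second yud carries a khirik (as in ייִדיש, where the first yud is a consonant and the second is the vowel i) is an assumed heuristic, not necessarily the rule this project will adopt.

```python
import re

# Code points involved (standard Unicode names).
YUD = "\u05D9"         # HEBREW LETTER YOD (י)
TSVEY_YUDN = "\u05F2"  # HEBREW LIGATURE YIDDISH DOUBLE YOD (ײ)
KHIRIK = "\u05B4"      # HEBREW POINT HIRIQ, marks a vowel yud

# Assumed heuristic: two consecutive yuds become the ligature, except when the
# second yud carries a khirik (ייִ), which is a consonant yud followed by a
# vowel yud and must stay as two letters.
_YUD_YUD = re.compile(YUD + YUD + "(?!" + KHIRIK + ")")


def normalize_tsvey_yudn(text: str) -> str:
    """Replace yud-yud pairs with the tsvey-yudn ligature (hypothetical helper)."""
    return _YUD_YUD.sub(TSVEY_YUDN, text)
```

With this rule, צוויי becomes צווײ and מייַן becomes מײַן (the pasekh stays attached to the ligature), while ייִדיש is left unchanged.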

Directory structure

.
├── LICENSE
├── README.md
├── corpus
│   └── scraped articles and CSVs go here
├── data
│   └── other data files (e.g. article links) go here
├── src
│   └── various scripts used to create the corpus go here
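As an illustration of how a script in src might use this layout, here is a minimal sketch that reads article links from data and writes scraped articles as a CSV into corpus. The file names (`article_links.txt`, `articles.csv`), the one-URL-per-line link format and the CSV columns (`url`, `title`, `text`) are all assumptions for illustration; the README only says which kind of file belongs in which directory.

```python
from __future__ import annotations

import csv
from pathlib import Path

# Assumed CSV columns for a scraped article; not taken from the project.
FIELDNAMES = ["url", "title", "text"]


def read_article_links(root: Path, name: str = "article_links.txt") -> list[str]:
    """Read one URL per line from data/<name>, skipping blank lines (hypothetical file)."""
    path = root / "data" / name
    lines = path.read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def write_articles_csv(root: Path, rows: list[dict[str, str]],
                       name: str = "articles.csv") -> Path:
    """Write scraped articles to corpus/<name> as UTF-8 CSV and return its path."""
    corpus_dir = root / "corpus"
    corpus_dir.mkdir(parents=True, exist_ok=True)
    out_path = corpus_dir / name
    # newline="" lets the csv module control line endings itself.
    with out_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(rows)
    return out_path
```

Writing explicitly as UTF-8 keeps the Yiddish text intact regardless of the platform's default encoding.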