sentence_server

Output simple sentences for learning German.

  • The idea is to collect sentences from a source such as Project Gutenberg.
  • With a corpus in hand, we can find the most common words and possibly the simplest sentences.
  • These sentences can then be served directly, or used to build vocabulary flashcard decks to learn from.
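The second bullet can be sketched roughly as follows. This is a minimal illustration, not code from this repository: the function names, the `top_n` cutoff, and the rule "a sentence is simple if every word is among the most common words" are all assumptions made for the example.

```python
import re
from collections import Counter

# Runs of Unicode letters, so ä, ö, ü and ß are kept inside words.
WORD_RE = re.compile(r"[^\W\d_]+")


def tokenize(sentence):
    """Lower-cased words of a sentence, punctuation and digits dropped."""
    return [w.lower() for w in WORD_RE.findall(sentence)]


def word_frequencies(sentences):
    """Count how often each word occurs across all sentences."""
    counts = Counter()
    for sentence in sentences:
        counts.update(tokenize(sentence))
    return counts


def simplest_sentences(sentences, counts, top_n=1000, limit=10):
    """Assumed heuristic: keep sentences made only of the top_n most
    common words, shortest first."""
    common = {word for word, _ in counts.most_common(top_n)}
    candidates = [
        s for s in sentences
        if tokenize(s) and all(w in common for w in tokenize(s))
    ]
    return sorted(candidates, key=lambda s: len(tokenize(s)))[:limit]
```

`word_frequencies` alone is enough for a vocabulary deck (take `counts.most_common(n)`); `simplest_sentences` is just one possible definition of "simple".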

Download German (DE) texts from Project Gutenberg:


mkdir books
cd books
wget -w 2 -m -H "http://www.gutenberg.org/robot/harvest?filetypes[]=txt&langs[]=de"

Then run extract_books.py to unzip the books and collect them in a single folder.
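For readers without the script at hand, the unzip-and-flatten step could look roughly like this. It is a sketch under assumptions, not the contents of extract_books.py: the function name `extract_txt_files` and both directory arguments are hypothetical, and it assumes the mirror consists of .zip archives containing .txt files.

```python
import zipfile
from pathlib import Path


def extract_txt_files(source_dir, target_dir):
    """Copy every .txt member of every .zip under source_dir into target_dir.

    Hypothetical helper: directory structure inside the archives is
    discarded, so files with the same name overwrite each other.
    """
    source = Path(source_dir)
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for archive in sorted(source.rglob("*.zip")):
        try:
            with zipfile.ZipFile(archive) as zf:
                for info in zf.infolist():
                    if info.is_dir() or not info.filename.lower().endswith(".txt"):
                        continue
                    # Keep only the file name to flatten into one folder.
                    out_path = target / Path(info.filename).name
                    out_path.write_bytes(zf.read(info))
                    written.append(out_path)
        except zipfile.BadZipFile:
            # Skip corrupt or partial downloads instead of aborting.
            continue
    return written
```

Usage would be something like `extract_txt_files("books", "texts")`, where both folder names are placeholders.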

Issues:

  • Some books are not entirely in German: they have a non-German foreword but the main text is in German. These parts need to be removed somehow to get clean text.
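One possible approach to this issue, sketched here as an untested idea rather than something the project does: split each book into paragraphs and keep only those where common German function words outnumber common English ones. The word lists and function names below are assumptions for illustration. The check only tells German from English, and paragraphs without any function words (for example short headings) are dropped.

```python
import re

WORD_RE = re.compile(r"[^\W\d_]+")

# Small, hand-picked hint lists (assumed, not exhaustive); no word is in both.
GERMAN_HINTS = {
    "der", "das", "und", "ist", "nicht", "ein", "eine", "ich",
    "sie", "mit", "zu", "den", "von", "auf", "sich", "dem",
}
ENGLISH_HINTS = {
    "the", "and", "of", "to", "is", "that", "it", "for",
    "with", "this", "by", "are", "not", "be", "or", "you",
}


def looks_german(paragraph):
    """True if German hint words outnumber English hint words."""
    words = [w.lower() for w in WORD_RE.findall(paragraph)]
    german = sum(w in GERMAN_HINTS for w in words)
    english = sum(w in ENGLISH_HINTS for w in words)
    return german > english


def keep_german_paragraphs(text):
    """Split on blank lines and keep paragraphs that look German."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text)]
    return [p for p in paragraphs if p and looks_german(p)]
```

A proper language-identification library would likely be more robust; the stopword count is only the simplest thing that might work.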

spaCy installation (assumes Anaconda with Python 3 is installed):


conda config --add channels spacy
conda install spacy

# install the German language model
sputnik --name spacy --repository-url http://index.spacy.io install de==1.0.0

Other notes / resources: