stortingprosjekt

Fetch a text corpus from data.stortinget.no.

Primary language: Jupyter Notebook

This script fetches a large text corpus from data.stortinget.no. The resulting dataset may be useful for training recurrent neural networks.

TODO

  • encode the output as UTF-8
  • better cleaning of symbols/characters (the corpus currently contains about 130 distinct characters)
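
The two TODO items could be approached roughly as follows. This is only a minimal sketch, not the project's actual code: the function names (`clean_text`, `char_inventory`, `save_utf8`) and the `ALLOWED` whitelist are hypothetical, and the whitelist is an assumption about which characters are worth keeping for a character-level model.

```python
import unicodedata
from collections import Counter
from pathlib import Path

# Hypothetical whitelist (an assumption, adjust to the corpus):
# Norwegian letters, digits, whitespace and basic punctuation.
ALLOWED = set(
    "abcdefghijklmnopqrstuvwxyzæøå"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZÆØÅ"
    "0123456789"
    " \n.,;:!?-'\"()"
)


def clean_text(text, allowed=ALLOWED, replacement=""):
    """Normalize the text and replace every character outside the whitelist."""
    # NFKC folds compatibility forms (non-breaking space, ellipsis,
    # ligatures) into their plain equivalents before filtering.
    normalized = unicodedata.normalize("NFKC", text)
    return "".join(ch if ch in allowed else replacement for ch in normalized)


def char_inventory(text):
    """Count each distinct character, to see what still needs cleaning."""
    return Counter(text)


def save_utf8(text, path):
    """Write the corpus to disk, explicitly encoded as UTF-8."""
    Path(path).write_text(text, encoding="utf-8")
```

Running `len(char_inventory(corpus))` before and after `clean_text` shows how far the character set has been reduced. Accented letters such as `é` are dropped by this particular whitelist, so it may need extending if those matter for the model.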