Issues
Use only riksdagen html and clean
#36 opened by dpriskorn - 1
Support all Runeberg CC0 books also
#34 opened by dpriskorn - 0
Use Metabase to visualize the data
#33 opened by dpriskorn - 0
Upload all sentences and raw tokens to a Wikibase to enable linking and annotation
#32 opened by dpriskorn - 0
Store boolean on datasets whether cc0
#31 opened by dpriskorn - 0
Fetch html instead of pdf from folketinget
#30 opened by dpriskorn - 0
Support storing year per document
#28 opened by dpriskorn - 0
Add litteraturbanken 344M token cc-by 4
#27 opened by dpriskorn - 0
Strip entities before insertion
#25 opened by dpriskorn - 0
Document the evolvable API and help consumers
#23 opened by dpriskorn - 0
Move Riksdagen specific code into /providers
#22 opened by dpriskorn - 0
Add new endpoint /sv/usage_example/search/$1
#18 opened by dpriskorn - 0
Store information about license on datasets
#17 opened by dpriskorn - 0
Store unique NER entities per sentence
#13 opened by dpriskorn - 0
More cleaning of tokens is needed
#9 opened by dpriskorn - 0
Markup hyphenated tokens
#8 opened by dpriskorn - 1
Support entity linking for each sentence
#4 opened by dpriskorn - 1
Switch to using swedish-spacy-pipeline
#3 opened by dpriskorn - 0
Switch to fasttext-langdetect
#1 opened by dpriskorn