Pinned Repositories
Deep_Learning_in_LangTech_course
Materials for the University of Turku course TKO_8965 Deep Learning in Human Language Technology (previously named TKO_2101 Natural Language Processing)
FIN-bench
Evaluation of Finnish generative models
FinBERT
BERT model trained from scratch on Finnish
finngen-tools
Tools for training causal language models for Finnish
Finnish-dep-parser
The Finnish dependency parsing pipeline being developed by the Turku NLP group. Documentation:
Megatron-DeepSpeed
Ongoing research training transformer language models at scale, including: BERT & GPT-2
ocr-correction
Post-processing OCR errors with seq2seq models
Text_Mining_Course
Stuff for the Text Mining course
Turku-neural-parser-pipeline
A neural parsing pipeline for segmentation, morphological tagging, dependency parsing and lemmatization with pre-trained models for more than 50 languages. Top ranker in the CoNLL-18 Shared Task.
wikibert
BERT models for many languages created from Wikipedia texts
Repositories of the TurkuNLP Group, IT Department, University of Turku
TurkuNLP/turku-one
Turku OntoNotes Entities Corpus (TurkuONE)
TurkuNLP/class-explainer
TurkuNLP/register-labeling
TurkuNLP/dolly-fi
Finnish version of databricks-dolly-15k instruction dataset
TurkuNLP/multilingual-register-labeling
Multilingual, multilabel modeling of registers
TurkuNLP/textual-data-analysis-course
TurkuNLP/Turku-paraphrase-corpus
TurkuNLP/CORE-corpus
TurkuNLP/ocr_errors_simulator
Functions and code used to estimate the probabilities of OCR errors and to simulate them
TurkuNLP/wikipedia-toxicity-data-fi
TurkuNLP/dep_search
TurkuNLP/oasst-fi
Open Assistant dataset translated to Finnish
TurkuNLP/registerlabeling
TurkuNLP/semantic-sim
TurkuNLP/sentiment-target-corpus
Targeted sentiment corpus
TurkuNLP/EccoBERT
TurkuNLP/squad2-fi
Repo for my little MT-of-SQuAD2 project
TurkuNLP/corefud-finnish-translation
TurkuNLP/FinCORE_full
TurkuNLP/finnish-tweets-lang-identification
Manually annotated language identification data for Finnish tweets (Finnish/non-Finnish).
TurkuNLP/finsquad
Experiments on fine-tuning a Finnish SQuAD2.0 model on a machine-translated Finnish SQuAD2.0 dataset
TurkuNLP/gf_summerschool
TurkuNLP/instruct_qa
TurkuNLP/register-annotation-docs
Documentation for web register annotation
TurkuNLP/register-DeepL
Repository for everything regarding the translation of register data and training the models with the translated data
TurkuNLP/rel-mt-dataset
Code for the MT of Elisa Bassignana's relation dataset
TurkuNLP/running-trankit
TurkuNLP/speech_summarization
TurkuNLP/turku-layout-corpus
Corpus of Finnish open access publications with layout annotations
TurkuNLP/xlsum-fi
Finnish machine-translated version of the XL-Sum dataset. Not (yet) ready for use!