Pinned Repositories
Deep_Learning_in_LangTech_course
Materials for the University of Turku course TKO_8965 Deep Learning in Human Language Technology (previously named TKO_2101 Natural Language Processing)
FIN-bench
Evaluation of Finnish generative models
FinBERT
BERT model trained from scratch on Finnish
finngen-tools
Tools for training causal language models for Finnish
Finnish-dep-parser
The Finnish dependency parsing pipeline being developed by the Turku NLP group. Documentation:
Megatron-DeepSpeed
Ongoing research on training transformer language models at scale, including BERT and GPT-2
ocr-correction
Post-processing OCR errors with seq2seq models
Text_Mining_Course
Materials for the Text Mining course
Turku-neural-parser-pipeline
A neural parsing pipeline for segmentation, morphological tagging, dependency parsing, and lemmatization, with pre-trained models for more than 50 languages. Top-ranked system in the CoNLL-18 Shared Task.
wikibert
BERT models for many languages created from Wikipedia texts
Repositories of the TurkuNLP Group, IT Department, University of Turku
TurkuNLP/Turku-neural-parser-pipeline
A neural parsing pipeline for segmentation, morphological tagging, dependency parsing, and lemmatization, with pre-trained models for more than 50 languages. Top-ranked system in the CoNLL-18 Shared Task.
TurkuNLP/Deep_Learning_in_LangTech_course
Materials for the University of Turku course TKO_8965 Deep Learning in Human Language Technology (previously named TKO_2101 Natural Language Processing)
TurkuNLP/intro-to-nlp
Introduction to Natural Language Processing
TurkuNLP/ATP_kurssi
TurkuNLP/DIKI1002-Working-with-Text-in-Python
TurkuNLP/lumi-nlp-recipes
TurkuNLP/RAG-web-app
TurkuNLP/DIKI1003-NLP-for-Linguists
TurkuNLP/FinParl-emotion
Emotion analysis for Finnish parliamentary speeches
TurkuNLP/ocr_errors_simulator
Functions and code for estimating the probabilities of OCR errors and simulating those errors
TurkuNLP/toxicity-classifier
Code for classifying whether a text is toxic, using data from https://github.com/TurkuNLP/wikipedia-toxicity-data-fi
TurkuNLP/turkunlp.github.io
A Jekyll version of the "Editorial" theme by HTML5 UP.
TurkuNLP/Karelia-Project
TurkuNLP/pytorch-registerlabeling
TurkuNLP/ecco-ocr-ec
TurkuNLP/list-of-publications
Turku NLP list of publications
TurkuNLP/register-qa
TurkuNLP/finnish-instructions
Centralized repo for Finnish instruction data
TurkuNLP/htr-annotations
Handwritten text recognition annotations
TurkuNLP/htr-table-pipeline
Handwritten text recognition pipeline for table data
TurkuNLP/Keyword-embeddings-clusters
Clusters of keywords grouped by their word embeddings
TurkuNLP/LLM_document_descriptors
TurkuNLP/multilingual-CORE
TurkuNLP/ocr-postcorrection-lm
Code for experimenting with OCR post-correction using language models
TurkuNLP/Open-Assistant
Data collection for the Finnish language using the Open Assistant platform
TurkuNLP/overfit-gpt
TurkuNLP/ParliamentSpeechClassifier
TurkuNLP/situational-analysis-llm
Code and data for multilingual situational analysis of web registers using LLMs.
TurkuNLP/TurCORE
Turkish Corpus of Online REgisters (TurCORE)
TurkuNLP/vLLM-recipes
Different vLLM setups on different machines