Deep NLP @ CIS - LMU
Deep Natural Language Processing Group at the Center for Information and Language Processing (CIS), University of Munich (LMU)
Munich, Germany
Pinned Repositories
bias-in-nlp
Literature overview: gender bias in natural language processing
Glot500
Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages -- ACL 2023
GlotCC
GlotCC: An Open Broad-Coverage CommonCrawl Corpus and Pipeline for Minority Languages -- under review
GlotLID
GlotLID: Language Identification with Support for More Than 2000 Labels -- EMNLP 2023
GlotScript
GlotScript: A Resource and Tool for Low Resource Writing System Identification -- LREC 2024
mPLM-Sim
mPLM-Sim: Better Cross-Lingual Similarity and Transfer in Multilingual Pretrained Language Models
ofa
A framework that aims to wisely initialize unseen subword embeddings in PLMs for efficient large-scale continued pretraining
parcoure
ParCourE - Parallel Corpus Explorer
semi-markov-crf
Code for paper "Neural Semi-Markov Conditional Random Fields for Robust Character-Based Part-of-Speech Tagging"
simalign
Obtain Word Alignments using Pretrained Language Models (e.g., mBERT)
Deep NLP @ CIS - LMU's Repositories
cisnlp/simalign
Obtain Word Alignments using Pretrained Language Models (e.g., mBERT)
cisnlp/Glot500
Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages -- ACL 2023
cisnlp/GlotLID
GlotLID: Language Identification with Support for More Than 2000 Labels -- EMNLP 2023
cisnlp/semi-markov-crf
Code for paper "Neural Semi-Markov Conditional Random Fields for Robust Character-Based Part-of-Speech Tagging"
cisnlp/GlotScript
GlotScript: A Resource and Tool for Low Resource Writing System Identification -- LREC 2024
cisnlp/parcoure
ParCourE - Parallel Corpus Explorer
cisnlp/GlotCC
GlotCC: An Open Broad-Coverage CommonCrawl Corpus and Pipeline for Minority Languages -- under review
cisnlp/ofa
A framework that aims to wisely initialize unseen subword embeddings in PLMs for efficient large-scale continued pretraining
cisnlp/mPLM-Sim
mPLM-Sim: Better Cross-Lingual Similarity and Transfer in Multilingual Pretrained Language Models
cisnlp/bias-in-nlp
Literature overview: gender bias in natural language processing
cisnlp/graph-align
Code for the EMNLP graph align paper
cisnlp/Taxi1500
cisnlp/GlotWeb
GlotWeb: Web Indexing for Low-Resource Languages -- under construction.
cisnlp/TransliCo
TransliCo: A Contrastive Learning Framework to Address the Script Barrier in Multilingual Pretrained Language Models
cisnlp/TransMI
TransMI: A Framework to Create Strong Baselines from Multilingual Pretrained Language Models for Transliterated Data
cisnlp/cisnlp.github.io
Homepage of cisnlp
cisnlp/ColexificationNet
Crosslingual Transfer Learning for Low-Resource Languages Based on Multilingual Colexification Graphs
cisnlp/GlotSparse
GlotSparse: Building Corpora in Under-Resourced Languages
cisnlp/GlotStoryBook
Children's storybooks for 180 languages.
cisnlp/MaskLID
MaskLID: Code-Switching Language Identification through Iterative Masking -- ACL 2024
cisnlp/lohoravens-webpage
cisnlp/Transliteration-PPA
Breaking the Script Barrier in Multilingual Pre-Trained Language Models with Transliteration-Based Post-Training Alignment
cisnlp/XAMPLER
XAMPLER: Learning to Retrieve Cross-Lingual In-Context Examples
cisnlp/Spatial_Schemas
cisnlp/analogical_reasoning