Deep NLP @ CIS - LMU
Deep Natural Language Processing Group at the Center for Language and Information Processing, University of Munich (LMU)
Munich, Germany
Pinned Repositories
Glot500
Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages -- ACL 2023
GlotCC
🕸 GlotCC Dataset and Pipeline -- NeurIPS 2024
GlotLID
💬 Language Identification with Support for More Than 2000 Labels -- EMNLP 2023
GlotScript
🖋 Resource and Tool for Writing System Identification -- LREC 2024
GlotWeb
🕸 GlotWeb: Web Indexing for Low-Resource Languages -- under construction.
mPLM-Sim
mPLM-Sim: Better Cross-Lingual Similarity and Transfer in Multilingual Pretrained Language Models
ofa
A framework that aims to wisely initialize unseen subword embeddings in PLMs for efficient large-scale continued pretraining
parcoure
ParCourE - Parallel Corpus Explorer
semi-markov-crf
Code for paper "Neural Semi-Markov Conditional Random Fields for Robust Character-Based Part-of-Speech Tagging"
simalign
Obtain Word Alignments using Pretrained Language Models (e.g., mBERT)
Deep NLP @ CIS - LMU's Repositories
cisnlp/simalign
Obtain Word Alignments using Pretrained Language Models (e.g., mBERT)
cisnlp/GlotLID
💬 Language Identification with Support for More Than 2000 Labels -- EMNLP 2023
cisnlp/Glot500
Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages -- ACL 2023
cisnlp/GlotCC
🕸 GlotCC Dataset and Pipeline -- NeurIPS 2024
cisnlp/ofa
A framework that aims to wisely initialize unseen subword embeddings in PLMs for efficient large-scale continued pretraining
cisnlp/GlotScript
🖋 Resource and Tool for Writing System Identification -- LREC 2024
cisnlp/GlotWeb
🕸 GlotWeb: Web Indexing for Low-Resource Languages -- under construction.
cisnlp/MEXA
🔍 Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment
cisnlp/mPLM-Sim
mPLM-Sim: Better Cross-Lingual Similarity and Transfer in Multilingual Pretrained Language Models
cisnlp/Taxi1500
cisnlp/MaskLID
💬 MaskLID: Code-Switching Language Identification through Iterative Masking -- ACL 2024
cisnlp/TransliCo
TransliCo: A Contrastive Learning Framework to Address the Script Barrier in Multilingual Pretrained Language Models
cisnlp/relation-specific-neurons
On Relation-Specific Neurons in Large Language Models
cisnlp/ColexificationNet
Crosslingual Transfer Learning for Low-Resource Languages Based on Multilingual Colexification Graphs
cisnlp/manchu-in-context-mt
Understanding In-Context Machine Translation for Low-Resource Languages: A Case Study on Manchu
cisnlp/Transliteration-PPA
Breaking the Script Barrier in Multilingual Pre-Trained Language Models with Transliteration-Based Post-Training Alignment
cisnlp/TransMI
TransMI: A Framework to Create Strong Baselines from Multilingual Pretrained Language Models for Transliterated Data
cisnlp/cisnlp.github.io
Homepage of cisnlp
cisnlp/code-specific-neurons
💻🔍 How Programming Concepts and Neurons Are Shared in Code Language Models
cisnlp/GlotSparse
GlotSparse: Building Corpora in Under-Resourced Languages
cisnlp/GlotStoryBook
Children's storybooks for 180 languages.
cisnlp/ungoliant
🕷 The pipeline for the OSCAR/GlotCC corpus
cisnlp/2024fall-crosslingual-vlm-block-seminar
Materials for the Fall 2024 block seminar on cross-lingual visual language models at LMU Munich.
cisnlp/lohoravens-webpage
cisnlp/XAMPLER
XAMPLER: Learning to Retrieve Cross-Lingual In-Context Examples
cisnlp/LangSAMP
LangSAMP: Language-Script Aware Multilingual Pretraining
cisnlp/oscar-io
Readers/writers for the GlotCC/OSCAR corpus
cisnlp/oscar-tools
The original tooling for the GlotCC/OSCAR corpus rewritten in Rust
cisnlp/Spatial_Schemas
cisnlp/analogical_reasoning