multilingual-nlp

There are 49 repositories under the multilingual-nlp topic.

embeddings-benchmark/mteb
MTEB: Massive Text Embedding Benchmark
Language: Jupyter Notebook 2.4k 19 936355
bigscience-workshop/xmtf
Crosslingual Generalization through Multitask Finetuning
Language: Jupyter Notebook 529 6 2238
DmitryRyumin/EMNLP-2023-Papers
EMNLP 2023 Papers: Explore cutting-edge research from EMNLP 2023, the premier conference for advancing empirical methods in natural language processing. Stay updated on the latest in machine learning, deep learning, and natural language processing with code included. ⭐ support NLP!
Language: Python 106 3 07
cisnlp/Glot500
Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages (ACL 2023)
Language: Python 100 8 84
FSoft-AI4Code/TheVault
[EMNLP 2023] The Vault: A Comprehensive Multilingual Dataset for Advancing Code Understanding and Generation
Language: Jupyter Notebook 92 4 69
shijie-wu/crosslingual-nlp
This repo supports various cross-lingual transfer learning & multilingual NLP models.
Language: Python 92 7 26
epfl-dlab/llm-latent-language
Repo accompanying our paper "Do Llamas Work in English? On the Latent Language of Multilingual Transformers".
Language: Jupyter Notebook 71 3 316
csebuetnlp/CrossSum
This repository contains the code, data, and models of the paper titled "CrossSum: Beyond English-Centric Cross-Lingual Summarization for 1,500+ Language Pairs" published in Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL’23), July 9-14, 2023.
Language: Python 49 3 77
ceferisbarov/TUMLU
TUMLU: A Unified and Native Language Understanding Benchmark for Turkic Languages
Language: Python 181
BatsResearch/cross-lingual-detox
Code for "Preference Tuning For Toxicity Mitigation Generalizes Across Languages." Paper accepted at Findings of EMNLP 2024.
Language: Jupyter Notebook 17 1 00
BatsResearch/LexC-Gen
Generate synthetic labeled data for extremely low-resource languages using bilingual lexicons.
Language: Python 15 3 04
cambridgeltl/prompt4bli
On Bilingual Lexicon Induction with Large Language Models (EMNLP 2023). Keywords: Bilingual Lexicon Induction, Word Translation, Large Language Models, LLMs.
Language: Python 10 6 02
cisnlp/MEXA
🔍 Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment
Language: Python 10 9 00
longxudou/multispider
MultiSpider: Towards Benchmarking Multilingual Text-to-SQL Semantic Parsing
Language: Python 8 1 02
mobassir94/Multilingual-NLP-for-Islamic-Theology
Cross-lingual language models for building search engines for the Holy Quran and Sahih Hadiths
Language: Jupyter Notebook 8 3 00
negar-foroutan/multiLMs-lang-neutral-subnets
[EMNLP 2022] Discovering Language-neutral Sub-networks in Multilingual Language Models.
Language: Python 8 1 31
ramisa2108/Bangla-Complex-Named-Entity-Recognition-Challenge
Winning Solution for the Bangla Complex Named Entity Recognition Challenge - BDOSN NLP Hackathon 2023
Language: Jupyter Notebook 7 2 00
MaLA-LM/mala-500
MaLA-500: Massive Language Adaptation of Large Language Models
Language: Python 5 1 00
aditi184/MultilingualQA
Chaii (Challenge in AI for India) Multilingual QnA - Google Research India
Language: Jupyter Notebook 4 1 00
negar-foroutan/multilingual-code-switched-reasoning
[EMNLP 2023 - Findings] Breaking the Language Barrier: Improving Cross-Lingual Reasoning with Structured Self-Attention
Language: Python 4 3 02
cambridgeltl/sail-bli
Self-Augmented In-Context Learning for Unsupervised Word Translation (ACL 2024). Keywords: Bilingual Lexicon Induction, Word Translation, Large Language Models, LLMs.
Language: Python 3 5 01
deokhk/CBP
Official repository for Cross-lingual Back-Parsing: Utterance Synthesis from Meaning Representation for Zero-Resource Semantic Parsing (EMNLP 2024)
Language: Python 3 1 10
e-hossam96/CMU-CS11-737
Solutions to the CMU Multilingual Natural Language Processing Course
Language: Shell 3 3 01
thesofakillers/CLAfICLe
Official repository for the paper "CLAfICLe: Cross-Lingual Adaptation for In-Context Learning". Not Published.
Language: TeX 3 2 00
ArkS0001/IIT-Bombay-Whisper-Hindi-ASR-Model-Machine-Learning-Intern
Whisper is an automatic speech recognition (ASR) system trained on 680,000 hours of multilingual and multitask supervised data collected from the web. We show that the use of such a large and diverse dataset leads to improved robustness to accents, background noise and technical language. Moreover, it enables transcription in multiple languages, as …
Language: Jupyter Notebook 2 1 03
harmonydata/harmony_r
R library for Harmony. R package - open source tool using AI for psychology and mental health. Actively recruiting contributors.
Language: HTML 2 1 63
swaggy66/M-ABSA
M-ABSA: A Multilingual Dataset for Aspect-Based Sentiment Analysis
Language: Python 21
tchewik/bilingualrsp
The official code and data for the ACL 2024 Findings paper "Bilingual Rhetorical Structure Parsing with Large Parallel Annotations".
Language: Python 2 1 00
BatsResearch/LexC-Gen-Data-Archive
Data Repository for LexC-Gen: Generating Data for Extremely Low-Resource Languages with Large Language Models and Bilingual Lexicons
1 1 01
dkalpakchi/quinductor
A multilingual data-driven method for generating reading comprehension questions
Language: Jupyter Notebook 1 1 10
faisaltareque/Multilingual-Rouge-Scorer
This Python package is used for calculating ROUGE scores and supports over 100 languages by utilizing a multilingual BPE tokenizer. It leverages the mBERT tokenizer and was developed to support our work XL-HeadTags.
Language: Python 10
harmonydata/harmony_original
The Harmony project
Language: Jupyter Notebook 1 1 261
Helsinki-NLP/lm-vs-mt
[EMNLP 2024] A Comparison of Language Modeling and Translation as Multilingual Pretraining Objectives
Language: Python 1 2 00
Judy-Choi/NMT_Series
A collection of code for an NMT series from the 8th cohort of Geultto
Language: Jupyter Notebook 1 2 00
joyou159/SWIZT
Exploring the use of multilingual transformers, specifically mBERT and XLM-RoBERTa, for named entity recognition (NER) in the context of Switzerland’s multilingual environment.
Language: Jupyter Notebook 00
swaggy66/MSMO
Multi-Scale and Multi-Objective Optimization for Cross-Lingual Aspect-Based Sentiment Analysis
Language: Python 0 1 01