multilingual-nlp

There are 43 repositories under the multilingual-nlp topic.

  • embeddings-benchmark/mteb

    MTEB: Massive Text Embedding Benchmark

    Language: Jupyter Notebook
  • bigscience-workshop/xmtf

    Crosslingual Generalization through Multitask Finetuning

    Language: Jupyter Notebook
  • DmitryRyumin/EMNLP-2023-Papers

    EMNLP 2023 Papers: Explore cutting-edge research from EMNLP 2023, the premier conference for advancing empirical methods in natural language processing. Stay updated on the latest in machine learning, deep learning, and natural language processing with code included. :star: support NLP!

    Language: Python
  • cisnlp/Glot500

    Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages -- ACL 2023

    Language: Python
  • shijie-wu/crosslingual-nlp

    This repo supports various cross-lingual transfer learning & multilingual NLP models.

    Language: Python
  • FSoft-AI4Code/TheVault

    [EMNLP 2023] The Vault: A Comprehensive Multilingual Dataset for Advancing Code Understanding and Generation

    Language: Jupyter Notebook
  • epfl-dlab/llm-latent-language

    Repo accompanying our paper "Do Llamas Work in English? On the Latent Language of Multilingual Transformers".

    Language: Jupyter Notebook
  • csebuetnlp/CrossSum

    This repository contains the code, data, and models of the paper titled "CrossSum: Beyond English-Centric Cross-Lingual Summarization for 1,500+ Language Pairs" published in Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL’23), July 9-14, 2023.

    Language: Python
  • BatsResearch/cross-lingual-detox

    Code for "Preference Tuning For Toxicity Mitigation Generalizes Across Languages." Paper accepted at Findings of EMNLP 2024

    Language: Jupyter Notebook
  • BatsResearch/LexC-Gen

    Generate synthetic labeled data for extremely low-resource languages using bilingual lexicons.

    Language: Python
  • cambridgeltl/prompt4bli

    On Bilingual Lexicon Induction with Large Language Models (EMNLP 2023). Keywords: Bilingual Lexicon Induction, Word Translation, Large Language Models, LLMs.

    Language: Python
  • cisnlp/MEXA

    Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment

    Language: Python
  • mobassir94/Multilingual-NLP-for-Islamic-Theology

    Cross Lingual Language models for making search engines for Holy Quran and Sahih Hadiths

    Language: Jupyter Notebook
  • negar-foroutan/multiLMs-lang-neutral-subnets

    [EMNLP 2022] Discovering Language-neutral Sub-networks in Multilingual Language Models.

    Language: Python
  • longxudou/multispider

    MultiSpider: Towards Benchmarking Multilingual Text-to-SQL Semantic Parsing

    Language: Python
  • ramisa2108/Bangla-Complex-Named-Entity-Recognition-Challenge

    Winning Solution for the Bangla Complex Named Entity Recognition Challenge - BDOSN NLP Hackathon 2023

    Language: Jupyter Notebook
  • MaLA-LM/mala-500

    MaLA-500: Massive Language Adaptation of Large Language Models

    Language: Python
  • aditi184/MultilingualQA

    Chaii (Challenge in AI for India) Multilingual QnA - Google Research India

    Language: Jupyter Notebook
  • negar-foroutan/multilingual-code-switched-reasoning

    [EMNLP 2023 - Findings] Breaking the Language Barrier: Improving Cross-Lingual Reasoning with Structured Self-Attention

    Language: Python
  • e-hossam96/CMU-CS11-737

    Solutions of the CMU Multilingual Natural Language Processing Course

    Language: Shell
  • thesofakillers/CLAfICLe

    Official repository for the paper "CLAfICLe: Cross-Lingual Adaptation for In-Context Learning". Not Published.

    Language: TeX
  • ArkS0001/IIT-Bombay-Whisper-Hindi-ASR-Model-Machine-Learning-Intern

    Whisper is an automatic speech recognition (ASR) system trained on 680,000 hours of multilingual and multitask supervised data collected from the web. We show that the use of such a large and diverse dataset leads to improved robustness to accents, background noise and technical language. Moreover, it enables transcription in multiple languages, as

    Language: Jupyter Notebook
  • BatsResearch/LexC-Gen-Data-Archive

    Data Repository for LexC-Gen: Generating Data for Extremely Low-Resource Languages with Large Language Models and Bilingual Lexicons

  • cambridgeltl/sail-bli

    Self-Augmented In-Context Learning for Unsupervised Word Translation (ACL 2024). Keywords: Bilingual Lexicon Induction, Word Translation, Large Language Models, LLMs.

    Language: Python
  • deokhk/CBP

    Official Repository for Cross-lingual Back-Parsing: Utterance Synthesis from Meaning Representation for Zero-Resource Semantic Parsing (EMNLP 2024)

    Language: Python
  • dkalpakchi/quinductor

    A multilingual data-driven method for generating reading comprehension questions

    Language: Jupyter Notebook
  • faisaltareque/Multilingual-Rouge-Scorer

    This Python package is used for calculating ROUGE scores and supports over 100 languages by utilizing a multilingual BPE tokenizer. It leverages the mBERT tokenizer and was developed to support our work XL-HeadTags.

    Language: Python
  • harmonydata/harmony_original

    The Harmony project

    Language: Jupyter Notebook
  • harmonydata/harmony_r

    R library for Harmony

    Language: HTML
  • Helsinki-NLP/lm-vs-mt

    A Comparison of Language Modeling and Translation as Multilingual Pretraining Objectives

    Language: Python
  • Judy-Choi/NMT_Series

    A collection of code in an NMT series of Geultto 8th

    Language: Jupyter Notebook
  • AnanthaRajuC/AIML_NLP

    AIML Natural Language Processing - Speech, Audio

    Language: Java
  • CristianBudala/Multilingual-Sentiment-Analysis-and-Intent-Classification

    Multilingual sentiment analysis and intent classification in Romanian, Bachelor's thesis

    Language: Jupyter Notebook
  • koushik16/Naive-Bayes-on-Multi-Language-Text

    Implementation of Naive Bayes for text classification across multiple languages, focusing on natural language processing and multilingual text analysis.

    Language: Python
  • Wei-RongRong2/RojakLanguageSentimentAnalysis

    This is a machine learning project focused on analysing and classifying sentiments in code-switched and code-mixed text, specifically targeting the unique linguistic characteristics found in Malaysian conversations.

    Language: Jupyter Notebook
  • Lucas-Granucci/MULTI-NER

    Repository for research project exploring the benefits of cross-lingual transfer learning and pseudo-labeling for multilingual named entity recognition.

    Language: Python