code-mixing
There are 31 repositories under the code-mixing topic.
gentaiscool/code-switching-papers
A curated list of research papers and resources on code-switching
microsoft/CodeMixed-Text-Generator
This tool automatically generates grammatically valid synthetic code-mixed data using linguistic theories such as Equivalence Constraint Theory and Matrix Language Theory.
microsoft/LID-tool
A word-level language identification tool that identifies the language of individual words in code-mixed text, e.g. text that includes words from two languages, such as Hindi written in Roman script mixed with English.
praatibhsurana/Hinglish_Hindi_WSD
A pipeline for transliteration, spell correction, POS tagging and word sense disambiguation of Hinglish code-mixed data to Hindi Devanagari script.
aparnadutta/code-mixed-lid
Word-level language identification for Bangla-English code-mixed social media data, using a BiLSTM with subword embeddings.
cisnlp/MaskLID
💬 MaskLID: Code-Switching Language Identification through Iterative Masking -- ACL 2024
salesforce/adversarial-polyglots
Code for the paper "Code-Mixing on Sesame Street: Dawn of the Adversarial Polyglots" (NAACL-HLT 2021)
ash-shar/Code-Switching-and-Swearing-Patterns-on-Twitter
Repository containing code for abusive tweet detection, location detection and gender detection
LCS2-IIITD/HIT-ACL2021-Codemixed-Representation
This repo contains the source code of HIT: A Hierarchically Fused Deep Attention Network for Robust Code-mixed Language Representation (accepted at ACL 2021)
mmaguero/josa-corpus
Jopara (Guarani-dominant mixed with Spanish) sentiment analysis corpus
andrianllmm/tagLID
A word-level Language Identification (LID) tool for Tagalog-English (Taglish) text
gulabpatel/Code-Mixing
Discusses the evolution of code-mixing algorithms
ir-nlp-csui/id-en-code-mixed
Indonesian-English code-mixed Twitter dataset
ayanc18/PsycholinguisticCodeMixing
Psycholinguistic Analysis of Code Mixing - Speech and Natural Language Processing Term Project: CS60057. Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur
Lidan0241/language-detection
A language detection model for code-switched texts in es/en/zh
Wei-RongRong2/RojakLanguageSentimentAnalysis
This is a machine learning project focused on analysing and classifying sentiments in code-switched and code-mixed text, specifically targeting the unique linguistic characteristics found in Malaysian conversations.
Bernardbyy/BahasaRojakSentimentAnalysis
Handling out-of-vocabulary (OOV) words in Bahasa Rojak (Malaysian code-mixed language) and performing sentiment analysis using downstreamed XLM-R
carexl8/code-mixed-tweets
Tweet ids for code-mixed Russian-German and Russian-Hebrew tweets
jessicasaikia/bidirectional-long-short-term-memory-BiLSTM
This repository implements a Bidirectional Long Short Term Memory (BiLSTM) for performing Parts-of-Speech (POS) Tagging on Assamese-English code-mixed texts.
jessicasaikia/conditional-random-field-CRF
This repository implements a Conditional Random Field (CRF) for performing Parts-of-Speech (POS) Tagging on Assamese-English code-mixed texts.
jessicasaikia/hidden-markov-model-HMM
This repository implements a Hidden Markov Model (HMM) for performing Parts-of-Speech (POS) Tagging on Assamese-English code-mixed texts.
jessicasaikia/long-short-term-memory-LSTM
This repository implements a Long Short Term Memory (LSTM) for performing Parts-of-Speech (POS) Tagging on Assamese-English code-mixed texts.
jessicasaikia/multilingual-BERT-mBERT
This repository implements a Multilingual BERT (mBERT) model for performing Parts-of-Speech (POS) Tagging on Assamese-English code-mixed texts.
jessicasaikia/rule-based
This repository contains a simple Rule-Based Model for Parts-of-Speech (POS) Tagging on Assamese-English code-mixed texts.
MuhammedFahd/Depression-Detection-in-Singlish-text
A depression detection system for Sinhala-English code-mixed text published by users on social media. The frontend was developed using Bootstrap, HTML, and jQuery, and the backend was developed using Flask.
vcyrot/Frenglish-Benchmark
A Centralized Frenglish Benchmark from Naturally Occurring Code-Switching and Code-Mixing
Anwarvic/truel_bilingual_nmt
The official code for the "True Bilingual NMT" paper
Nexdata-AI/300-Person-Mandarin-Chinese-and-English-Bilingual-Spontaneous-Monologue-smartphone
300-Person Mandarin Chinese and English Bilingual Spontaneous Monologue (smartphone)