parallel-corpus
There are 74 repositories under the parallel-corpus topic.
NiuTrans/Classical-Modern
A very comprehensive Classical Chinese (ancient Chinese) to Modern Chinese parallel corpus
kirralabs/indonesian-NLP-resources
Data resources for Indonesian-language NLP
csebuetnlp/banglanmt
This repository contains the code and data of the paper titled "Not Low-Resource Anymore: Aligner Ensembling, Batch Filtering, and New Datasets for Bengali-English Machine Translation" published in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020), November 16 - November 20, 2020.
bfsujason/bertalign
Multilingual sentence alignment using sentence embeddings
Helsinki-NLP/OpusFilter
OpusFilter - Parallel corpus processing toolkit
tsuruoka-lab/BSD
The Business Scene Dialogue corpus
FerreroJeremy/Cross-Language-Dataset
A multilingual, multi-style and multi-granularity dataset for cross-language textual similarity detection
ShathaTm/LK-Hadith-Corpus
Leeds University and King Saud University (LK) Hadith Corpus
matbahasa/TALPCo
TUFS Asian Language Parallel Corpus
sharad461/nepali-translator
Neural Machine Translation on the Nepali-English language pair
asmelashteka/HornMT
Machine translation (MT) benchmark dataset for languages in the Horn of Africa.
Caucasus-Rosetta/Lingua-Corpus
Multilingual and monolingual corpora focused on the languages of the Caucasus, for Natural Language Processing (NLP)
Kartikaggarwal98/Indian_ParallelCorpus
Curated list of publicly available parallel corpora for Indian languages
BramVanroy/astred
An easy-to-use library to linguistically compare one sentence and its words to another, in the same language or a different one. Useful, for instance, for comparing a translation with the original text, for finding differences and similarities between two different translations, or for seeing how a machine translation differs from a reference translation.
cfiltnlp/IITB-English-Hindi-PC
The IIT Bombay English-Hindi Parallel Corpus
priyanshu2103/Sanskrit-Hindi-Machine-Translation
Machine Translation from Sanskrit to Hindi using Unsupervised and Supervised Learning
korenyoni/opus-api
OPUS (opus.nlpl.eu) Python3 API
farshadjafari/parallel_corpus_generator
Python application that generates parallel corpora for any language pair; can be used for training NMT (Neural Machine Translation) systems
Giuseppe-Della-Corte/IESTAC
A corpus that can be used to train English-to-Italian End-to-End Speech-to-Text Machine Translation models
YerevaNN/PARASITE
🪱 PARASITE || A parallel sentence data preprocessing toolkit. Originally developed as a part of the `en-ru` winner submission of WMT20 Biomedical Translation Task.
soumendrak/MTEnglish2Odia
Machine Translation from English to Odia language.
KurdishBLARK/InterdialectCorpus
A parallel corpus of Sorani, Kurmanji and English
michmech/irish-sentence-bank
4,500 sentences in Irish, tokenized, manually lemmatized, translated into English.
spraakbanken/swell-editor
Editor for normalising learner texts (error annotation and tagging).
tsuruoka-lab/AMI-Meeting-Parallel-Corpus
AMI Meeting Parallel Corpus
x39826/Pali_Tripitaka
Pali Buddhist scriptures from 15 countries and their parallel corpus
shashanksiripragada/pib-crawl
Code to extract multilingual parallel corpus from Press Information Bureau (PIB) website.
stibiumghost/tajik-to-persian-transliteration
Tajik-to-Persian transliteration project
tanloong/interlaced.nvim
Neovim plugin for aligning bilingual parallel texts
UUDigitalHumanitieslab/timealign
Parallel corpus annotation and visualization
x39826/Multilang_Translator_For_Pali_Tripitaka
Parallel corpus and multilingual machine translation system for the Pali Buddhist scriptures (Pali Tripitaka) of 15 countries
UUDigitalHumanitieslab/perfectextractor
Extracting present perfects (and related forms) from parallel corpora
PyThaiNLP/Thai-Lao-Parallel-Corpus
Thai-Lao parallel corpus
rggdmonk/hadal
A simple and efficient tool for mining and aligning sentences with pre-trained models.
TienZhao/suoyan.pro
Online parallel text alignment tool.
tldr-pages/tldr-translation-pairs-gen
Generates a structured dataset in various formats derived from tldr-pages.