Pinned Repositories
release2_inspection
warc2text-runner
Scripts for parallelized extraction of plain text from WARC archives, aiming at a common and reproducible extraction approach.
alembic
⚗️ A Jekyll boilerplate theme designed to be a starting point for any Jekyll website
cs-lid-harder-than-you-think
Repository accompanying "Code-Switched Language Identification is Harder Than You Think" (EACL 2024)
exploring-diversity-bt
Repository accompanying "Exploring Diversity in Back Translation for Low-Resource Machine Translation" (Burchell et al., NAACL 2022)
LASER
Language-Agnostic SEntence Representations
multi-sentence-questions
Data and code for multi-sentence question paper
open-lid-dataset
Repository accompanying "An Open Dataset and Model for Language Identification" (Burchell et al., 2023)
flores
The FLORES+ Machine Translation Benchmark
giashard
Sharding program for Paracrawl
laurieburchell's Repositories
laurieburchell/open-lid-dataset
Repository accompanying "An Open Dataset and Model for Language Identification" (Burchell et al., 2023)
laurieburchell/cs-lid-harder-than-you-think
Repository accompanying "Code-Switched Language Identification is Harder Than You Think" (EACL 2024)
laurieburchell/multi-sentence-questions
Data and code for multi-sentence question paper
laurieburchell/exploring-diversity-bt
Repository accompanying "Exploring Diversity in Back Translation for Low-Resource Machine Translation" (Burchell et al., NAACL 2022)
laurieburchell/alembic
⚗️ A Jekyll boilerplate theme designed to be a starting point for any Jekyll website
laurieburchell/LASER
Language-Agnostic SEntence Representations