Pinned Repositories
ache-multilingual
ACHE is a web crawler for domain-specific search. This fork aims to adapt it to crawl pages in specific languages for building parallel corpora.
apertium-packaging
Debian, Fedora, Windows, macOS packaging scripts for Apertium, HFST, CG-3, and related techs.
bicleaner
doommoses
Python port of Moses tokenizer, truecaser and normalizer
fasterText
Library for fast text representation and classification.
hunalign
Sentence aligner
marian-dev
Fast Neural Machine Translation in C++ - development repository
mosesdecoder
Moses, the machine translation system
tmxt
Transform TMX to text
sortiz's Repositories
sortiz/tmxt
Transform TMX to text
sortiz/bicleaner
sortiz/doommoses
Python port of Moses tokenizer, truecaser and normalizer
sortiz/ache-multilingual
ACHE is a web crawler for domain-specific search. This fork aims to adapt it to crawl pages in specific languages for building parallel corpora.
sortiz/apertium-packaging
Debian, Fedora, Windows, macOS packaging scripts for Apertium, HFST, CG-3, and related techs.
sortiz/fasterText
Library for fast text representation and classification.
sortiz/hunalign
Sentence aligner
sortiz/marian-dev
Fast Neural Machine Translation in C++ - development repository
sortiz/mosesdecoder
Moses, the machine translation system
sortiz/stop-words
List of common stop words in various languages.
sortiz/ulysses-sentence-splitter
sortiz/urlrewritefilter
A Java Web Filter with functionality like Apache's mod_rewrite
sortiz/parallel-urls-classifier
Parallel URLs Classifier (PUC) infers the parallelness of a pair of documents from their URLs
sortiz/preprocess
Corpus preprocessing
sortiz/url2lang
url2lang infers the language of a document from its URL
sortiz/warc2text
Extracts plain text, language identification and more metadata from WARC records