corpus-processing
There are 87 repositories under the corpus-processing topic.
BLKSerene/Wordless
An Integrated Corpus Tool With Multilingual Support for the Study of Language, Literature, and Translation
bitextor/bitextor
Bitextor generates translation memories from multilingual websites
hankcs/TreebankPreprocessing
Python scripts preprocessing Penn Treebank and Chinese Treebank
Helsinki-NLP/OpusFilter
OpusFilter - Parallel corpus processing toolkit
NathanDuran/Switchboard-Corpus
Utilities for Processing the Switchboard Dialogue Act Corpus
OHNLP/MedTator
A Serverless Text Annotation Tool for Corpus Development
johentsch/ms3
A parser for annotated MuseScore 3 files.
uma-pi1/OPIEC
Reading the data from OPIEC, an Open Information Extraction corpus
NathanDuran/MRDA-Corpus
Utilities for Processing the Meeting Recorder Dialogue Act Corpus
versotym/rhymetagger
Simple collocation-driven rhyme recognition. Contains pre-trained models for Czech, Dutch, English, French, German, Russian, and Spanish poetry
notesjor/corpusexplorer2.0
Corpus linguistics has never been so easy...
jaytimm/corpuslingr
A library of functions enabling complex corpus search in context (KWIC), search aggregation, bag-of-words building & keyphrase extraction.
Bibliome/alvisnlp
ALvisNLP corpus processing engine
zgornel/StringAnalysis.jl
Hard fork of JuliaText/TextAnalysis.jl
jonathandunn/corpus_similarity
Measure the similarity of text corpora for 74 languages
felipetovarhenao/exquisitecorpus
A set of corpus-based sampling and analysis Max for Live (M4L) devices
jonathandunn/common_crawl_corpus
Scripts for building a geo-located web corpus using Common Crawl data
kennedyCzar/NLP-PROJECT-BOOK-INSIGHTS-WITH-PLOTLY
Plotly Dash NLP project that measures document similarity using Latent Dirichlet Allocation and principal component analysis, followed by k-means clustering, with interactive visualizations.
Linguista/CQPweb-Instabox
Script that sets up and configures an entire CQPweb server installation
ku-nlp/kyoto-reader
A processor for KyotoCorpus, KWDLC, and AnnotatedFKCCorpus
NathanDuran/Maptask-Corpus
Utilities for Processing the HCRC Map Task Corpus
CSCfi/Kielipankki-utilities
Scripts for data conversion
StarlangSoftware/Corpus
Corpus processing library
thecsw/katya-dev
Katya, or The Liberated Corpus: a text corpus that lets you request and scrape any web resource!
CLARIAH/wp6-missieven
General Missives in Text-Fabric
ringoreality/uniblock
uniblock: scoring and filtering corpora with Unicode block information (and more).
CaterinaBi/parameters-corpus-work
Paper that Giuseppe Samo and I are working on as part of my SNSF-funded 'Focus in diachrony' research project at the University of Cambridge, UK.
keymastervn/htksupport
Minimal HTK setup for supporting Vietnamese in HTK.
levindoneto/lanGen
N-gram language model that learns n-gram probabilities from a given corpus and generates new sentences based on the conditional probabilities of the words and phrases generated so far.
StarlangSoftware/Corpus-CPP
Corpus processing library
LeviMatheus/tcc-readability-score-level
Repository providing preprocessed Wikipedia and Simple Wikipedia datasets, along with Python scripts for preprocessing and dataset generation.
Linguista/Frequency-List-Wizard
Frequency List Wizard is a command-line program that does various useful things with... frequency lists.
StarlangSoftware/Corpus-Py
Corpus processing library
apple-fritter/muffin.tin
A Bash script that exports Mozilla Firefox places.sqlite tables to XML files.
Navnedia/Building-A-Search-Engine
A basic search engine that indexes a corpus for searching and ranks the documents in the data set.