Language Machines
NLP Research group at Centre for Language Studies, Radboud University Nijmegen
Nijmegen, The Netherlands
Pinned Repositories
CLIN28_ST_spelling_correction
Scripts that were used for preparing and converting the Wikipedia documents that are part of the CLIN28 shared task on spelling correction
frog
Frog is an integration of memory-based natural language processing (NLP) modules developed for Dutch. All NLP modules are based on Timbl, the Tilburg memory-based learning software package.
LamaEvents
Lama Events is a calendar application listing events in the near future. The events are detected and selected by a fully automatic procedure in the Dutch Twitter stream.
libfolia
FoLiA library for C++
LuigiNLP
A workflow system for Natural Language Processing.
PICCL
A set of workflows for corpus building through OCR, post-correction and normalisation
ticcltools
Tools for TICCL
timbl
TiMBL implements several memory-based learning algorithms.
ucto
Unicode tokeniser. Ucto tokenises text files: it separates words from punctuation and splits sentences. It also offers several other basic preprocessing steps, such as changing case, that you can use to make your text suitable for further processing such as indexing, part-of-speech tagging, or machine translation. Ucto comes with tokenisation rules for several languages and can be easily extended to suit other languages. It has been incorporated for tokenising Dutch text in Frog, our Dutch morpho-syntactic processor. http://ilk.uvt.nl/ucto
uctodata
Data files for the ucto tokeniser.
Language Machines' Repositories
LanguageMachines/frog
Frog is an integration of memory-based natural language processing (NLP) modules developed for Dutch. All NLP modules are based on Timbl, the Tilburg memory-based learning software package.
LanguageMachines/ucto
Unicode tokeniser. Ucto tokenises text files: it separates words from punctuation and splits sentences. It also offers several other basic preprocessing steps, such as changing case, that you can use to make your text suitable for further processing such as indexing, part-of-speech tagging, or machine translation. Ucto comes with tokenisation rules for several languages and can be easily extended to suit other languages. It has been incorporated for tokenising Dutch text in Frog, our Dutch morpho-syntactic processor. http://ilk.uvt.nl/ucto
LanguageMachines/PICCL
A set of workflows for corpus building through OCR, post-correction and normalisation
LanguageMachines/timbl
TiMBL implements several memory-based learning algorithms.
LanguageMachines/libfolia
FoLiA library for C++
LanguageMachines/ticcltools
Tools for TICCL
LanguageMachines/mbt
MBT: Memory-based tagger generation and tagging. MBT is a memory-based tagger-generator and tagger in one.
LanguageMachines/uctodata
Data files for the ucto tokeniser.
LanguageMachines/ticcutils
Ticcutils, a generic utility library shared by our software.
LanguageMachines/wopr
Memory Based Word Predictor/Language Model http://ilk.uvt.nl/wopr/
LanguageMachines/foliautils
Command-line utilities for working with the Format for Linguistic Annotation (FoLiA), powered by libfolia (C++), written by Ko van der Sloot (CLST, Radboud University)
LanguageMachines/timblserver
TiMBL implements several memory-based learning algorithms. This is the server part.
LanguageMachines/dimbl
Distributed Tilburg Memory Based Learner
LanguageMachines/dialect2keywords
Web interface designed to convert words in Dutch dialects ("dialectopgaven") into standard Dutch keywords ("vernederlandste trefwoorden").
LanguageMachines/frogdata
Data for Frog (mandatory).
LanguageMachines/mbtserver
LanguageMachines/releasereport
LanguageMachines/toad
Toad: Trainer Of All Data, the Frog training collection
LanguageMachines/CLIN28-website
LanguageMachines/bioport
Scrape pages about persons ('biographies') from Wikipedia.
LanguageMachines/clariah-plus-tasks
An overview of CLARIAH-PLUS tasks at CLST, Radboud University, Nijmegen
LanguageMachines/foliatest
Test suite for libfolia
LanguageMachines/frogtests
Unit tests for Frog
LanguageMachines/JASMIN-BLISS-Negation
Documentation of a corpus sample of Dutch human-computer dialogues annotated with negation cues.
LanguageMachines/json
JSON for Modern C++
LanguageMachines/lexiconenrichment
LanguageMachines/mbttests
Unit tests for Mbt
LanguageMachines/news-pt
LanguageMachines/timbltests
Unit tests for Timbl
LanguageMachines/travistest
Small program to test Travis issues, such as OSX and Clang OpenMP support.