Language Machines
NLP Research group at Centre for Language Studies, Radboud University Nijmegen
Nijmegen, The Netherlands
Pinned Repositories
CLIN28_ST_spelling_correction
Scripts that were used for preparing and converting the Wikipedia documents that are part of the CLIN28 shared task on spelling correction
frog
Frog is an integration of memory-based natural language processing (NLP) modules developed for Dutch. All NLP modules are based on Timbl, the Tilburg memory-based learning software package.
LamaEvents
Lama Events is a calendar application listing events in the near future. The events are detected in and selected from the Dutch Twitter stream by a fully automatic procedure.
libfolia
FoLiA library for C++
LuigiNLP
A workflow system for Natural Language Processing.
PICCL
A set of workflows for corpus building through OCR, post-correction and normalisation
ticcltools
Tools for TICCL
timbl
TiMBL implements several memory-based learning algorithms.
ucto
Unicode tokeniser. Ucto tokenises text files: it separates words from punctuation and splits sentences. It also offers several other basic preprocessing steps, such as changing case, all of which you can use to make your text suited for further processing such as indexing, part-of-speech tagging, or machine translation. Ucto comes with tokenisation rules for several languages and can easily be extended to suit other languages. It has been incorporated for tokenising Dutch text in Frog, our Dutch morpho-syntactic processor. http://ilk.uvt.nl/ucto
uctodata
Datafiles for the tokenizer ucto.
Language Machines' Repositories
LanguageMachines/LuigiNLP
A workflow system for Natural Language Processing.
LanguageMachines/CLIN28_ST_spelling_correction
Scripts that were used for preparing and converting the Wikipedia documents that are part of the CLIN28 shared task on spelling correction
LanguageMachines/LamaEvents
Lama Events is a calendar application listing events in the near future. The events are detected in and selected from the Dutch Twitter stream by a fully automatic procedure.
LanguageMachines/quoll
LanguageMachines/ICDAR2017-PostOCR-Ticcl
Wrapper scripts for processing ICDAR2017 PostOCR data given a TICCL ranked input list
LanguageMachines/bp-som
BP-SOM: A hybrid of back-propagation learning in multi-layered perceptrons and self-organizing maps
LanguageMachines/homebrew-lamachine
Brew formulas for installing NLP software developed by the Language Machines research group
LanguageMachines/paramsearch
Automated parameter optimisation for Timbl
LanguageMachines/svn-timblmanual
Copy from the old ILK SVN
LanguageMachines/clin28
LanguageMachines/clst-webservices-meta
CLST webservices software metadata, only for those webservices/webapplications that are not included in LaMachine
LanguageMachines/CRoaring
Roaring bitmaps in C (and C++)
LanguageMachines/fambl
Family Memory Based Learning (original in ILK SVN)
LanguageMachines/GloVe
GloVe model for distributed word representation
LanguageMachines/knngraph
KNN graph software originally in TiCC SVN
LanguageMachines/SB-tokenizer
LanguageMachines/SoNaR
LanguageMachines/svn-mbmt
LanguageMachines/svn-sonar
Old Sonar stuff from the TiCC svn
LanguageMachines/svn-ticclopstools
Old ticclopstools from the TiCC svn
LanguageMachines/tadpole
The good old predecessor of Frog
LanguageMachines/wikinerdata
Script to collect data from Wikipedia and automatically annotate the linked named entities with Named Entity type.
LanguageMachines/word2vec
This tool provides an efficient implementation of the continuous bag-of-words and skip-gram architectures for computing vector representations of words. These representations can be subsequently used in many natural language processing applications and for further research.
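The two architectures named in the word2vec description differ in what they predict: skip-gram predicts each context word from a centre word, while continuous bag-of-words (CBOW) predicts a centre word from its surrounding context. The sketch below only illustrates how training examples could be built for each; it is not code from the word2vec tool, and the function names and the `window` parameter are hypothetical choices made for this illustration.

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Build (centre, context) pairs: one pair per context word within the window."""
    pairs = []
    for i, centre in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # the centre word is not its own context
                pairs.append((centre, tokens[j]))
    return pairs


def cbow_examples(tokens: List[str], window: int = 2) -> List[Tuple[List[str], str]]:
    """Build (context words, target) examples: the whole window predicts the centre word."""
    examples = []
    for i, target in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(start, end) if j != i]
        if context:  # skip single-word inputs that have no context
            examples.append((context, target))
    return examples


if __name__ == "__main__":
    sentence = "the frog eats flies".split()
    print(skipgram_pairs(sentence, window=1))
    print(cbow_examples(sentence, window=1))
```

A real implementation would then train a shallow network on these examples to obtain the word vectors; that step is omitted here.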