Language Machines

NLP research group at the Centre for Language Studies, Radboud University Nijmegen

Nijmegen, The Netherlands

Pinned Repositories

CLIN28_ST_spelling_correction
Scripts that were used for preparing and converting the Wikipedia documents that are part of the CLIN28 shared task on spelling correction
Language: Python 10 10 144
frog
Frog is an integration of memory-based natural language processing (NLP) modules developed for Dutch. All NLP modules are based on Timbl, the Tilburg memory-based learning software package.
Language: C++ 78 15 10311
LamaEvents
Lama Events is a calendar application listing events in the near future. The events are detected and selected by a fully automatic procedure in the Dutch Twitter stream.
Language: HTML 10 4 273
libfolia
FoLiA library for C++
Language: C++ 16 10 566
LuigiNLP
A workflow system for Natural Language Processing.
Language: Python 21 6 25
PICCL
A set of workflows for corpus building through OCR, post-correction and normalisation
Language: Python 49 7 647
ticcltools
Tools for TICCL
Language: C++ 14 7 474
timbl
TiMBL implements several memory-based learning algorithms.
Language: C++ 53 8 139
ucto
Unicode tokeniser. Ucto tokenizes text files: it separates words from punctuation and splits sentences. It offers several other basic preprocessing steps, such as changing case, all of which you can use to make your text suited for further processing such as indexing, part-of-speech tagging, or machine translation. Ucto comes with tokenisation rules for several languages and can be easily extended to suit other languages. It has been incorporated for tokenizing Dutch text in Frog, our Dutch morpho-syntactic processor. http://ilk.uvt.nl/ucto
Language: C++ 69 12 9314
uctodata
Datafiles for the tokenizer ucto.
Language: Shell 9 6 95

Language Machines's Repositories

LanguageMachines/frog
Frog is an integration of memory-based natural language processing (NLP) modules developed for Dutch. All NLP modules are based on Timbl, the Tilburg memory-based learning software package.
Language: C++ 78 15 10311
LanguageMachines/ucto
Unicode tokeniser. Ucto tokenizes text files: it separates words from punctuation and splits sentences. It offers several other basic preprocessing steps, such as changing case, all of which you can use to make your text suited for further processing such as indexing, part-of-speech tagging, or machine translation. Ucto comes with tokenisation rules for several languages and can be easily extended to suit other languages. It has been incorporated for tokenizing Dutch text in Frog, our Dutch morpho-syntactic processor. http://ilk.uvt.nl/ucto
Language: C++ 69 12 9314
LanguageMachines/timbl
TiMBL implements several memory-based learning algorithms.
Language: C++ 53 8 139
LanguageMachines/PICCL
A set of workflows for corpus building through OCR, post-correction and normalisation
Language: Python 49 7 647
LanguageMachines/libfolia
FoLiA library for C++
Language: C++ 16 10 566
LanguageMachines/ticcltools
Tools for TICCL
Language: C++ 14 7 474
LanguageMachines/mbt
MBT: Memory-based tagger generation and tagging. MBT is a memory-based tagger-generator and tagger in one.
Language: C++ 9 9 71
LanguageMachines/uctodata
Datafiles for the tokenizer ucto.
Language: Shell 9 6 95
LanguageMachines/ticcutils
Ticcutils, a generic utility library shared by our software.
Language: C++ 7 9 279
LanguageMachines/wopr
Memory Based Word Predictor/Language Model http://ilk.uvt.nl/wopr/
Language: C++ 5 6 3
LanguageMachines/foliautils
Command-line utilities for working with the Format for Linguistic Annotation (FoLiA), powered by libfolia (C++), written by Ko van der Sloot (CLST, Radboud University)
Language: C++ 4 8 723
LanguageMachines/timblserver
TiMBL implements several memory-based learning algorithms. This is the server part.
Language: C++ 3 8 3
LanguageMachines/dimbl
Distributed Tilburg Memory Based Learner
Language: C++ 2 19 02
LanguageMachines/dialect2keywords
Web interface designed to convert words in Dutch dialects ("dialectopgaven") into standard Dutch keywords ("vernederlandste trefwoorden").
Language: Python 1 6 0
LanguageMachines/frogdata
Data for Frog, mandatory
Language: Lex 1 8 65
LanguageMachines/mbtserver
Language: C++ 1 7 32
LanguageMachines/releasereport
Language: Python 1 17 02
LanguageMachines/toad
Toad: Trainer Of All Data, the Frog training collection
Language: C++ 1 19 42
LanguageMachines/bioport
Scrape pages about persons ('biographies') from Wikipedia.
Language: Python 0 6 11
LanguageMachines/CLIN28-website
Language: CSS 0 8 10
LanguageMachines/news-pt
Language: Python 0 5 00
LanguageMachines/actiontests
Small program to test Travis issues, such as OSX and Clang OpenMP support.
Language:M42 0
LanguageMachines/clariah-plus-tasks
An overview of CLARIAH-PLUS tasks at CLST, Radboud University, Nijmegen
Language: Makefile 4 0
LanguageMachines/foliatest
Test suite for libfolia
Language: C++ 7 12
LanguageMachines/frogtests
Unit tests for Frog
Language: Lex 7 1
LanguageMachines/JASMIN-BLISS-Negation
Documentation of a corpus sample of Dutch human-computer dialogues annotated with negation cues.
5 0
LanguageMachines/lexiconenrichment
Language: Shell 5 1
LanguageMachines/mbttests
Unit tests for Mbt
Language: Lex 17 0
LanguageMachines/ticcactions
Collection of GitHub Actions for use in ticc software
Language: Shell
LanguageMachines/timbltests
Unit tests for Timbl
Language: Euphoria 17 0