proycon
Research software engineer - NLP - AI - 🐧 Linux & open-source enthusiast - 🐍 Python / 🌊 C/C++ / 🦀 Rust / 🐚 Shell - 🔐 InfoSec - https://git.sr.ht/~proycon
KNAW Humanities Cluster & CLST, Radboud University
Eindhoven, the Netherlands
Pinned Repositories
stam
Stand-off Text Annotation Model (STAM) is a data model for stand-off text annotation where any information on a text is represented as an annotation. This repository contains the model's full specification, extensions, schemas, examples and documentation.
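In plain terms, stand-off annotation leaves the text untouched and lets every annotation point into it. The following Python sketch illustrates only that principle: the classes `TextResource` and `Annotation` and their fields are hypothetical simplifications (loosely echoing STAM's terminology) and are neither the full STAM data model nor the API of any STAM library.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    # Hypothetical annotation: it points into the text by offsets
    # instead of embedding markup in the text itself.
    begin: int  # inclusive character offset
    end: int  # exclusive character offset
    data: dict  # arbitrary key/value annotation data


@dataclass
class TextResource:
    text: str
    annotations: list = field(default_factory=list)

    def annotate(self, begin: int, end: int, **data) -> Annotation:
        # The text is never modified; only a new annotation is recorded.
        if not 0 <= begin <= end <= len(self.text):
            raise ValueError("offsets out of range")
        annotation = Annotation(begin, end, data)
        self.annotations.append(annotation)
        return annotation

    def target_text(self, annotation: Annotation) -> str:
        # Resolve an annotation back to the span of text it refers to.
        return self.text[annotation.begin:annotation.end]


resource = TextResource("Hello world")
greeting = resource.annotate(0, 5, pos="interjection")
noun = resource.annotate(6, 11, pos="noun")
print(resource.target_text(greeting), greeting.data)
print(resource.target_text(noun), noun.data)
```

Because annotations only hold offsets, any number of independent annotation layers can target the same text without conflicting markup.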
analiticcl
An approximate string-matching or fuzzy-matching system for spelling correction, normalisation, or post-OCR correction (mirror of https://codeberg.org/proycon/analiticcl)
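A naive baseline for this kind of fuzzy matching ranks every entry in a lexicon by its Levenshtein edit distance to the input word, as in the Python sketch below. The lexicon and the `max_distance` threshold are made up for the example, and analiticcl uses its own, more scalable matching method rather than this exhaustive comparison.

```python
def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance, keeping only one previous row.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,  # deletion
                current[j - 1] + 1,  # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def suggest(word: str, lexicon: list[str], max_distance: int = 2) -> list[tuple[str, int]]:
    # Keep every lexicon entry within max_distance edits, closest first,
    # ties broken alphabetically.
    scored = [(entry, levenshtein(word, entry)) for entry in lexicon]
    matches = [(entry, d) for entry, d in scored if d <= max_distance]
    return sorted(matches, key=lambda pair: (pair[1], pair[0]))


lexicon = ["correction", "collection", "connection", "direction"]
print(suggest("corection", lexicon))
```

Comparing against every lexicon entry costs time proportional to the lexicon size, which is exactly why dedicated systems avoid this brute-force loop.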
clam
Quickly turn command-line applications into RESTful webservices with a web-application front-end. You provide a specification of your command-line application, its input, output and parameters, and CLAM wraps around your application to form a fully-fledged RESTful webservice.
codemetapy
A Python package for generating and working with codemeta
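CodeMeta describes software as JSON-LD using schema.org-based terms. As an illustration only, the Python sketch below turns a flat dictionary of package metadata into a minimal CodeMeta record; the input keys and `FIELD_MAP` are hypothetical, only a handful of properties are mapped, and this is not how codemetapy itself is invoked.

```python
import json

# Hypothetical mapping from a few flat package-metadata keys to
# CodeMeta (schema.org-based) property names.
FIELD_MAP = {
    "name": "name",
    "version": "version",
    "summary": "description",
    "home_page": "url",
}


def to_codemeta(metadata: dict) -> dict:
    # Start from the CodeMeta context and type, then copy known, non-empty fields.
    record = {
        "@context": "https://w3id.org/codemeta/3.0",
        "@type": "SoftwareSourceCode",
    }
    for source_key, codemeta_key in FIELD_MAP.items():
        value = metadata.get(source_key)
        if value:
            record[codemeta_key] = value
    return record


record = to_codemeta({"name": "example", "version": "1.0", "summary": "An example package"})
print(json.dumps(record, indent=2))
```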
colibri-core
Colibri Core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipgrams (i.e. patterns with one or more gaps, of either fixed or dynamic size) in a quick and memory-efficient way. At its core is the tool ``colibri-patternmodeller``, which allows you to build, view, manipulate, and query pattern models.
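The two kinds of patterns can be illustrated with a small Python sketch that uses naive in-memory counting, nothing like Colibri Core's compact encoding. Here a skipgram is derived from an n-gram by replacing one or more inner tokens with a gap; the `{*}` gap marker, marking each skipped token separately, and keeping the first and last token fixed are choices made for this sketch.

```python
from collections import Counter
from itertools import combinations


def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    # All contiguous windows of n tokens.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def skipgrams(tokens: list[str], n: int, gap: str = "{*}") -> list[tuple[str, ...]]:
    # For every n-gram, replace each non-empty subset of its inner positions
    # by a gap marker; the first and last tokens are always kept.
    result = []
    inner = range(1, n - 1)
    for gram in ngrams(tokens, n):
        for size in range(1, n - 1):
            for positions in combinations(inner, size):
                result.append(tuple(gap if i in positions else tok
                                    for i, tok in enumerate(gram)))
    return result


tokens = "to be or not to be".split()
print(Counter(ngrams(tokens, 2)).most_common(1))
print(skipgrams(tokens, 3))
```

Counting skipgrams this way makes patterns such as `to {*} or` comparable across contexts that differ only in the skipped word.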
dotfiles
My dotfiles (mirror of https://git.sr.ht/~proycon/dotfiles)
flat
FoLiA Linguistic Annotation Tool -- Flat is a web-based linguistic annotation environment based around the FoLiA format (http://proycon.github.io/folia), a rich XML-based format for linguistic annotation. Flat allows users to view annotated FoLiA documents and enrich these documents with new annotations; a wide variety of linguistic annotation types is supported through the FoLiA paradigm.
folia
FoLiA: Format for Linguistic Annotation - FoLiA is a rich XML-based annotation format for the representation of language resources (including corpora) with linguistic annotations. A wide variety of linguistic annotations are supported, making FoLiA a useful format for NLP tasks and data interchange. Note that the actual Python library for processing FoLiA is implemented as part of PyNLPl; this repository contains higher-level tools that use the library, as well as the full documentation, validation schemas, and set definitions.
pynlpl
PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build simple language models. There are also more complex data types and algorithms. Moreover, there are parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL). There are also clients to interface with various NLP-specific servers. PyNLPl most notably features a very extensive library for working with FoLiA XML (Format for Linguistic Annotation).
vocage
A minimalistic spaced-repetition vocabulary trainer (flashcards) for the terminal
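Spaced repetition shows a card less often the more consistently it is answered correctly. One common scheme is the Leitner system, sketched here in Python: a correct answer promotes a card to a deck with a longer review interval, and a mistake demotes it to the first deck. The `DECK_INTERVALS` values and the `Card` class are hypothetical, and this is not necessarily vocage's exact scheduling.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical review intervals in days, one per deck; deck 0 holds new or
# forgotten cards.
DECK_INTERVALS = [1, 2, 4, 8, 16]


@dataclass
class Card:
    front: str
    back: str
    deck: int = 0
    due: date = date.min  # new cards are due immediately

    def review(self, correct: bool, today: date) -> None:
        if correct:
            # Promote to the next deck, capped at the last one.
            self.deck = min(self.deck + 1, len(DECK_INTERVALS) - 1)
        else:
            # Any mistake sends the card back to the first deck.
            self.deck = 0
        self.due = today + timedelta(days=DECK_INTERVALS[self.deck])


def due_cards(cards: list[Card], today: date) -> list[Card]:
    # Cards whose next review date has been reached.
    return [card for card in cards if card.due <= today]


today = date(2024, 1, 1)
card = Card("huis", "house")
card.review(correct=True, today=today)
print(card.deck, card.due)
```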
proycon's Repositories
proycon/clam
Quickly turn command-line applications into RESTful webservices with a web-application front-end. You provide a specification of your command-line application, its input, output and parameters, and CLAM wraps around your application to form a fully-fledged RESTful webservice.
proycon/colibri-core
Colibri Core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipgrams (i.e. patterns with one or more gaps, of either fixed or dynamic size) in a quick and memory-efficient way. At its core is the tool ``colibri-patternmodeller``, which allows you to build, view, manipulate, and query pattern models.
proycon/flat
FoLiA Linguistic Annotation Tool -- Flat is a web-based linguistic annotation environment based around the FoLiA format (http://proycon.github.io/folia), a rich XML-based format for linguistic annotation. Flat allows users to view annotated FoLiA documents and enrich these documents with new annotations; a wide variety of linguistic annotation types is supported through the FoLiA paradigm.
proycon/python-frog
Python bindings to the Dutch NLP tool Frog (PoS tagger, lemmatiser, NER tagger, morphological analysis, shallow parser, dependency parser)
proycon/analiticcl
An approximate string-matching or fuzzy-matching system for spelling correction, normalisation, or post-OCR correction (mirror of https://codeberg.org/proycon/analiticcl)
proycon/python-ucto
This is a Python binding to the tokeniser Ucto. Tokenisation is one of the first steps in almost any Natural Language Processing task, yet it is not always as trivial a task as it appears to be. This binding makes the power of the ucto tokeniser available to Python. Ucto itself is a regular-expression-based, extensible, and advanced tokeniser written in C++ (http://ilk.uvt.nl/ucto).
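To give an idea of what a regular-expression-based tokeniser does, here is a deliberately tiny one in Python that uses only the standard `re` module. The pattern is a made-up simplification covering URLs, dotted abbreviations, numbers, words and punctuation; it is not ucto's rule set and does not use the python-ucto API.

```python
import re

# Hypothetical token rules; alternatives are tried left to right,
# so more specific token types come first.
TOKEN_PATTERN = re.compile(
    r"""
    https?://\S+               # URLs
    | (?:[A-Za-z]\.){2,}       # dotted abbreviations such as e.g. or U.S.
    | \d+(?:[.,]\d+)*          # numbers, including 3.50 and 1,000
    | \w+(?:['’-]\w+)*         # words, allowing inner apostrophes and hyphens
    | [^\w\s]                  # any other single non-space symbol
    """,
    re.VERBOSE,
)


def tokenise(text: str) -> list[str]:
    # findall returns whole matches because all groups are non-capturing.
    return TOKEN_PATTERN.findall(text)


print(tokenise("In the U.S. the co-op's tea costs $3.50, e.g. today."))
```

Even this toy shows why tokenisation is less trivial than splitting on spaces: the final period must be separated from "today" while the periods in "U.S." and "3.50" must not be.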
proycon/dotfiles
My dotfiles (mirror of https://git.sr.ht/~proycon/dotfiles)
proycon/codemetapy
A Python package for generating and working with codemeta
proycon/homeassistant-config
My elaborate home automation configuration + scripts
proycon/python-timbl
python-timbl, originally developed by Sander Canisius, is a Python extension module wrapping the full TiMBL C++ programming interface. With this module, all functionality exposed through the C++ interface is also available to Python scripts. Being able to access the API from Python greatly facilitates prototyping TiMBL-based applications.
proycon/codemeta-harvester
Harvest and aggregate codemeta/schema.org software metadata from source repositories and service endpoints, automatically converting from known metadata schemes in the process
proycon/valkuil-gecco
Nederlandse Spellingscontrole / Dutch spelling correction system - powered by Gecco
proycon/foliadocserve
FoLiA Document Server - HTTP webservice backend for serving and annotating FoLiA documents using the FoLiA Query Language (FQL). Used by FLAT.
proycon/codemeta-server
Server for codemeta, with an in-memory triple store, a SPARQL endpoint, and a simple web-based visualisation for end-users
proycon/parseme-support
FoLiA & FLAT support for PARSEME
proycon/homepage
My website (mirror of https://git.sr.ht/~proycon/homepage)
proycon/hyphertool
Command-line tool for syllabification and hyphenation for multiple languages
proycon/sumservice
Summarisation service
proycon/codemeta2html
Convert software metadata descriptions in codemeta to html
proycon/codemeta2mp
codemeta to SSHOC Open Marketplace converter
proycon/globalise-tools
tools for globalise tasks
proycon/iamb
A Matrix client for Vim addicts
proycon/LibreTranslate
Free and Open Source Machine Translation API. Self-hosted, offline capable and easy to set up.
proycon/lighthome
Lightweight home automation scripts and programs, over MQTT (mirror of https://git.sr.ht/~proycon/lighthome)
proycon/nfc-daemon
nfc-daemon is a very simple event daemon that reads out the UID of NFC tags and executes scripts.
proycon/sshoc-marketplace-backend
Code for the backend
proycon/switchboard-tool-registry
The Switchboard Tool Registry
proycon/textframe
TextFrame is a low-level Rust library to access plain text files, including plain-text corpora of considerable size, without loading them into memory entirely.
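The general idea of reading only the part of a large text file that is needed, rather than the whole file, can be shown in Python with a seek followed by a bounded read. This is only an illustration under simplifying assumptions: `read_slice` is a hypothetical helper working on byte offsets, whereas TextFrame is a Rust library whose API and offset handling are not reproduced here.

```python
import os
import tempfile


def read_slice(path: str, begin: int, end: int, encoding: str = "utf-8") -> str:
    # Read only bytes [begin, end) instead of loading the whole file.
    # For multi-byte encodings such as UTF-8 the offsets must fall on
    # character boundaries, otherwise decoding raises UnicodeDecodeError.
    if not 0 <= begin <= end:
        raise ValueError("invalid byte range")
    with open(path, "rb") as f:
        f.seek(begin)
        data = f.read(end - begin)
    return data.decode(encoding)


# Build a throwaway test file; each line is 45 bytes including the newline.
line = "The quick brown fox jumps over the lazy dog.\n"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write((line * 1000).encode("utf-8"))
    path = tmp.name

word = read_slice(path, 4, 9)
second_line_start = read_slice(path, 45, 48)
print(word, second_line_start)
os.remove(path)
```

Memory use here depends only on the size of the requested slice, not on the size of the file.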
proycon/w3id.org
Website source code for w3id.org.
proycon/xpilot
Open-source, cross-platform X-Plane pilot client for VATSIM.