cococpyp

This is a Colibri-core-enhanced fork of cpyp (https://github.com/redpony/cpyp).



This repository contains the code for my PhD project "What's left in the bag for latent variable language modelling", a joint doctorate between Radboud University Nijmegen, the Netherlands, and KU Leuven, Belgium. In this project I look at the bag-of-words representation for language modelling and try to find information in it that is currently unexploited, such as skipgrams. Bayesian models are a prime example of latent variable models, and it is this intersection of language modelling and Bayesian statistics that I find interesting.

Our main model is a hierarchical Pitman-Yor language model based on skipgrams. The models generated by this toolkit are language agnostic.
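To make the notion of a skipgram concrete, here is a minimal sketch, not taken from this repository's code. It assumes a skipgram is an n-gram in which one or more inner tokens are replaced by a gap, while the first and last token are always kept. The function name `skipgrams` and the gap marker `{*}` are hypothetical choices for this illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical gap marker, chosen for this sketch only.
const std::string kGap = "{*}";

// Return every skipgram of an n-gram obtained by replacing a non-empty
// subset of its inner tokens with the gap marker. The first and last
// tokens are never skipped (an assumption of this sketch).
std::vector<std::vector<std::string>> skipgrams(
    const std::vector<std::string>& ngram) {
    std::vector<std::vector<std::string>> result;
    if (ngram.size() < 3) return result;  // no inner token to skip

    const std::size_t inner = ngram.size() - 2;
    // Each bit of mask decides whether one inner token is skipped.
    for (std::size_t mask = 1; mask < (std::size_t{1} << inner); ++mask) {
        std::vector<std::string> pattern = ngram;
        for (std::size_t i = 0; i < inner; ++i) {
            if (mask & (std::size_t{1} << i)) pattern[i + 1] = kGap;
        }
        result.push_back(pattern);
    }
    return result;
}
```

Under these assumptions, calling `skipgrams({"the", "quick", "brown", "fox"})` yields three patterns: one with the second token skipped, one with the third token skipped, and one with both skipped. The number of patterns doubles with every extra inner token, so this sketch is only practical for short n-grams.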

It is based on a fork of cpyp (https://github.com/redpony/cpyp) which I enhanced with Colibri-core (https://github.com/proycon/colibri-core).

More info and results will be added later.