Pinned Repositories
geniatagger
Part-of-speech tagging, shallow parsing, and named entity recognition for biomedical text
imgevolve
Evolve images from sets of triangles.
kaggle-stackoverflow2012
My entry to the Kaggle 2012 Stack Overflow competition. Ranked 10th on the final public leaderboard.
kaggle-stumbleupon2013
My entry to the Kaggle 2013 StumbleUpon competition. Ranked 4th on the final private leaderboard.
langid.c
Pure C natural language identifier with support for 97 languages
langid.js
An off-the-shelf client-side language identification module for JavaScript.
langid.py
Stand-alone language identification system
polyglot
Polyglot is a language identifier for detecting text documents containing text written in more than one language, and for identifying the languages therein.
updatedir
Rsync-like directory updating over multiple protocols
wikidump
Tools to manipulate and extract data from Wikipedia dumps
saffsd's Repositories
saffsd/langid.py
Stand-alone language identification system
saffsd/kaggle-stackoverflow2012
My entry to the Kaggle 2012 Stack Overflow competition. Ranked 10th on the final public leaderboard.
saffsd/wikidump
Tools to manipulate and extract data from Wikipedia dumps
saffsd/polyglot
Polyglot is a language identifier for detecting text documents containing text written in more than one language, and for identifying the languages therein.
saffsd/langid.c
Pure C natural language identifier with support for 97 languages
saffsd/geniatagger
Part-of-speech tagging, shallow parsing, and named entity recognition for biomedical text
saffsd/kaggle-stumbleupon2013
My entry to the Kaggle 2013 StumbleUpon competition. Ranked 4th on the final private leaderboard.
saffsd/langid.js
An off-the-shelf client-side language identification module for JavaScript.
saffsd/imgevolve
Evolve images from sets of triangles.
saffsd/updatedir
Rsync-like directory updating over multiple protocols
saffsd/daifugo
Simulation system for the Japanese card game Daifugo.
saffsd/ldig
Language Detection with Infinity-gram
saffsd/linguini.py
linguini.py is a pure-Python implementation of linguini, a vector-space model language identifier with support for bilingual and trilingual documents.
saffsd/assignmentprint
Pretty printer for student-submitted assignments. Helps with prettyprinting student code and generating reports.
saffsd/forum_features
Data model for manipulating forum data.
saffsd/language_data
Pythonic interface to natural language metadata
saffsd/alta2012-langidforlm
Code to build corpora from ClueWeb09
saffsd/alta2012-sharedtask
Full reference implementation of the entry that won the ALTA2012 Shared Task.
saffsd/alta2012-usim
Supporting materials for ALTA2012 publication "Unsupervised Estimation of Word Usage Similarity"
saffsd/LibSVMsharp
C# wrapper of LibSVM
saffsd/piboso
Sentence tagger for biomedical abstracts.
saffsd/python-readability
Fast Python port of arc90's readability tool, updated to match the latest readability.js.