/scode

Sphere embedding (S-CODE) is a variation of Euclidean embedding of co-occurrence data (CODE).

MIT License

SCODE    Copyright (c) 2012, Deniz Yuret

Usage: scode [OPTIONS] < file
  file should have two columns of arbitrary tokens
  -r RESTART: number of restarts (default 1)
  -i NITER: number of iterations over data (default 20)
  -d NDIM: number of dimensions (default 25)
  -z Z: partition function approximation (default 0.166)
  -p PHI0: learning rate parameter (default 50.0)
  -u NU0: learning rate parameter (default 0.2)
  -s SEED: random seed (default 0)
  -c: calculate real Z (default false)
  -m: merge vectors at output (default false)
  -v: verbose messages (default false)

Algorithm: Let X and Y be two categorical variables with finite
cardinalities |X| and |Y|. We observe a set of pairs {(x_i, y_i)}
drawn IID from the joint distribution of X and Y. The basic idea
behind CODE and related methods is to represent (embed) each value of
X and each value of Y as a point in a common low-dimensional Euclidean
space such that values that frequently co-occur lie close to each
other. S-CODE further restricts these points to lie on the unit
sphere. The following papers describe the methods in detail.

* Amir Globerson, Gal Chechik, Fernando Pereira and Naftali
Tishby. Euclidean Embedding of Co-Occurrence Data. Journal of Machine
Learning Research 8 (Oct), p.2265-2295, 2007.

* Yariv Maron, Michael Lamar and Elie Bienenstock. Sphere Embedding:
An Application to Unsupervised POS Tagging. NIPS 2010.

* Mehmet Ali Yatbaz, Enis Sert and Deniz Yuret. Learning Syntactic
Categories Using Paradigmatic Representations of Word Context.
EMNLP 2012.

Please see the file LICENSE for terms of use.  To compile, you need
the glib2 and gsl libraries.  Everything else is standard C, so
typing make should produce an executable.