SCODE Copyright (c) 2012, Deniz Yuret Usage: scode [OPTIONS] < file file should have two columns of arbitrary tokens -r RESTART: number of restarts (default 1) -i NITER: number of iterations over data (default 20) -d NDIM: number of dimensions (default 25) -z Z: partition function approximation (default 0.166) -p PHI0: learning rate parameter (default 50.0) -u NU0: learning rate parameter (default 0.2) -s SEED: random seed (default 0) -c: calculate real Z (default false) -m: merge vectors at output (default false) -v: verbose messages (default false) Algorithm: Let X and Y be two categorical variables with finite cardinalities |X| and |Y|. We observe a set of pairs {xi, yi} drawn IID from the joint distribution of X and Y. The basic idea behind CODE and related methods is to represent (embed) each value of X and each value of Y as points in a common low dimensional Euclidean space such that values that frequently co-occur lie close to each other. S-CODE further restricts these points to lie on the unit sphere. Please see the following papers for a description. * Amir Globerson, Gal Chechik, Fernando Pereira and Naftali Tishby. Euclidean Embedding of Co-Occurrence Data. Journal of Machine Learning Research 8 (Oct), p.2265-2295, 2007. * Yariv Maron, Michael Lamar, Elie Bienenstock. Sphere Embedding: An application to unsupervised POS tagging. (NIPS 2010) * Mehmet Ali Yatbaz, Enis Sert, and Deniz Yuret. 2012. Learning Syntactic Categories Using Paradigmatic Representations of Word Context. EMNLP 2012. Please see the file LICENSE for terms of use. To compile you need the glib2 and gsl libraries. Otherwise everything is standard C, so just typing make should give you an executable.