
Implementations of spectral word embedding methods

Primary language: C++

kadingir

This is an open-source implementation of the methods used in

Oshikiri, T., Fukui, K., and Shimodaira, H. (2016). Cross-Lingual Word Representations via Spectral Graph Embeddings. Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics. (To appear)

Contents

  • src/ : Source code (C++ & Rcpp version)
  • cpp/ : Source code (C++-only version)
  • experiments/ : Code used in the experiments
  • tools/

Implemented methods

  • CL-LSI [Littman+ 1998]
  • Eigenwords [Dhillon+ 2012] [Dhillon+ 2015]
    • One-Step CCA (OSCCA)
    • Two-Step CCA (TSCCA)
  • Eigendocs
  • Cross-Lingual Eigenwords (CL-Eigenwords) [Oshikiri+ 2016]
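The Eigenwords methods listed above derive word vectors from a truncated SVD of a CCA-normalised word-context matrix [Dhillon+ 2012]. The following is a minimal dense C++ sketch of the One-Step CCA idea. It is not kadingir's implementation. Several choices here are assumptions made for illustration:

  • Context features are position-aware one-hot vectors over a symmetric window.
  • C^T C is approximated by its diagonal.
  • The top-k left singular vectors are found by plain subspace iteration rather than a sparse or randomized SVD.

The function names (oscca_matrix, top_left_singular_vectors, gram_schmidt) are hypothetical. The sketch returns the left singular vectors U. The CCA projection for the word view would additionally rescale rows by D_w^{-1/2}, which is omitted here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Dense matrix stored as a vector of rows (sketch only; real corpora need sparse storage).
using Matrix = std::vector<std::vector<double>>;

// Hypothetical helper: builds A = D_w^{-1/2} (W^T C) D_c^{-1/2} for a corpus of word ids
// in [0, vocab). Context feature index = slot * vocab + neighbour id, where the slots
// enumerate offsets -window..-1, +1..+window. D_w = diag(W^T W) holds word counts;
// D_c is only the DIAGONAL of C^T C (an assumed simplification).
Matrix oscca_matrix(const std::vector<int>& corpus, int vocab, int window) {
  assert(vocab > 0 && window > 0);
  for (int id : corpus) assert(id >= 0 && id < vocab);
  const int slots = 2 * window;
  const int dim = slots * vocab;
  Matrix m(vocab, std::vector<double>(dim, 0.0));
  std::vector<double> dw(vocab, 0.0), dc(dim, 0.0);
  const int n = static_cast<int>(corpus.size());
  for (int t = 0; t < n; ++t) {
    const int w = corpus[t];
    dw[w] += 1.0;  // diagonal entry of W^T W (word frequency)
    int slot = 0;
    for (int off = -window; off <= window; ++off) {
      if (off == 0) continue;
      const int s = t + off;
      if (s >= 0 && s < n) {
        const int c = slot * vocab + corpus[s];
        m[w][c] += 1.0;  // entry of W^T C
        dc[c] += 1.0;    // diagonal entry of C^T C
      }
      ++slot;
    }
  }
  for (int i = 0; i < vocab; ++i)
    for (int j = 0; j < dim; ++j)
      if (m[i][j] != 0.0) m[i][j] /= std::sqrt(dw[i] * dc[j]);
  return m;
}

// Orthonormalises the vectors in `vecs` in place (modified Gram-Schmidt).
void gram_schmidt(Matrix& vecs) {
  for (std::size_t j = 0; j < vecs.size(); ++j) {
    for (std::size_t i = 0; i < j; ++i) {
      double dot = 0.0;
      for (std::size_t r = 0; r < vecs[j].size(); ++r) dot += vecs[i][r] * vecs[j][r];
      for (std::size_t r = 0; r < vecs[j].size(); ++r) vecs[j][r] -= dot * vecs[i][r];
    }
    double norm = 0.0;
    for (double x : vecs[j]) norm += x * x;
    norm = std::sqrt(norm);
    if (norm > 0.0)
      for (double& x : vecs[j]) x /= norm;
  }
}

// Approximates the top-k left singular vectors of `a` by subspace iteration on A A^T.
// Returns k vectors of length a.size(); the singular values are written to `sigma`.
Matrix top_left_singular_vectors(const Matrix& a, int k, int iters, std::vector<double>& sigma) {
  const std::size_t rows = a.size(), cols = a.empty() ? 0 : a[0].size();
  Matrix q(k, std::vector<double>(rows));
  std::uint32_t seed = 12345u;  // deterministic LCG start vectors
  for (int j = 0; j < k; ++j)
    for (std::size_t r = 0; r < rows; ++r) {
      seed = seed * 1103515245u + 12345u;
      q[j][r] = static_cast<double>(seed >> 16) / 65536.0 - 0.5;
    }
  gram_schmidt(q);
  for (int it = 0; it < iters; ++it) {
    for (int j = 0; j < k; ++j) {
      std::vector<double> v(cols, 0.0);  // v = A^T q_j
      for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c) v[c] += a[r][c] * q[j][r];
      for (std::size_t r = 0; r < rows; ++r) {  // q_j = A v
        double s = 0.0;
        for (std::size_t c = 0; c < cols; ++c) s += a[r][c] * v[c];
        q[j][r] = s;
      }
    }
    gram_schmidt(q);
  }
  sigma.assign(k, 0.0);
  for (int j = 0; j < k; ++j) {  // sigma_j = ||A^T q_j||
    double s2 = 0.0;
    for (std::size_t c = 0; c < cols; ++c) {
      double v = 0.0;
      for (std::size_t r = 0; r < rows; ++r) v += a[r][c] * q[j][r];
      s2 += v * v;
    }
    sigma[j] = std::sqrt(s2);
  }
  return q;
}
```

For example, oscca_matrix({0, 1, 2, 3, 0, 1}, 4, 1) builds a 4 x 8 matrix. Passing that matrix to top_left_singular_vectors with k = 2 returns two orthonormal vectors of length 4. The 2-dimensional vector for word id r is then (u[0][r], u[1][r]). The dense loops are only meant to show the algebra. A realistic vocabulary requires sparse matrices and a truncated or randomized SVD.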

Required datasets for experiments

Submodules

References

  • Oshikiri, T., Fukui, K., and Shimodaira, H. (2016). Cross-Lingual Word Representations via Spectral Graph Embeddings. Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics. (To appear)
  • Dhillon, P., Rodu, J., Foster, D., and Ungar, L. (2012). Two Step CCA: A new spectral method for estimating vector models of words. In Langford, J. and Pineau, J., editors, Proceedings of the 29th International Conference on Machine Learning (ICML-12), ICML ’12, pages 1551–1558, New York, NY, USA. Omnipress.
  • Dhillon, P. S., Foster, D. P., and Ungar, L. H. (2015). Eigenwords: Spectral word embeddings. Journal of Machine Learning Research, 16:3035–3078.
  • Littman, M. L., Dumais, S. T., and Landauer, T. K. (1998). Automatic cross-language information retrieval using latent semantic indexing. In Cross-Language Information Retrieval, pages 51–62.

License

GPL v3