embedding-benchmark

Word embedding benchmark project by the Shahid Beheshti University NLP Lab

GNU General Public License v3.0 (GPL-3.0)

Please read our wiki page for more information.

Folder structure:

  • data/corpus: Must be left empty; the code downloads the corpus into this folder from an external repository.
  • data/analogy: Contains the analogy dataset(s).
  • data/wordsim: Contains the word similarity dataset(s).
  • data/categories: Contains the categories dataset(s).
  • code: Contains the code that runs all evaluation-related tasks, plus utilities to download the corpus files.
  • scripts: Contains cleansing and crawling scripts and any other one-off tasks.
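
The layout above can be checked before running anything. The following is a minimal sketch and not part of the project: the function name `check_layout` is hypothetical, and it assumes that hidden placeholder files (such as `.gitkeep`) in data/corpus do not count as corpus content. Only the folder names come from the list above.

```python
from pathlib import Path

# Folder names taken from the folder structure list; the rest is illustrative.
EXPECTED_DIRS = [
    "data/corpus",
    "data/analogy",
    "data/wordsim",
    "data/categories",
    "code",
    "scripts",
]


def check_layout(root):
    """Return a list of human-readable problems found under `root`.

    Hypothetical helper: reports missing folders and a non-empty data/corpus.
    """
    root = Path(root)
    problems = []

    # Every folder from the list should exist as a directory.
    for rel in EXPECTED_DIRS:
        if not (root / rel).is_dir():
            problems.append(f"missing folder: {rel}")

    # data/corpus is expected to start out empty.
    # Assumption: hidden files (names starting with ".") are ignored.
    corpus = root / "data/corpus"
    if corpus.is_dir():
        visible = [p for p in corpus.iterdir() if not p.name.startswith(".")]
        if visible:
            problems.append("data/corpus is not empty")

    return problems
```

Calling `check_layout(".")` from the repository root returns an empty list when the layout matches the description, and a list of messages otherwise.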