Interpretability Of Word Embeddings

Requirements

Tested with Python 3.7.4.

Every dependency can be found in the Requirements file.

Pip:

pip install -r requirements.txt

Usage

glove_cli.py <embedding_path> <semcat_dir> [-h] [-dense_file] [--lines_to_read LINES_TO_READ] [--mcrae_dir MCRAE_DIR] [-mcrae_words_only] [--weights_dir WEIGHTS_DIR] [-save_weights] [-load_weights] [--calculate CALCULATE] [--calculation_args [CALCULATION_ARGS [CALCULATION_ARGS ...]]]

Required parameters
  • embedding_path - Path to the Glove embedding file (f.e. "glove/glove.6B.300d.txt")
  • semcat_dir - Path to the SemCat categories directory (f.e. "semcat/Categories")
Embedding related parameters
  • dense_file - If embedding_path points to a dense embedding file, mark it with this parameter
  • lines_to_read - Maximum vectors to read. Default -1 (All vector)
  • mcrae_dir - McRae directory
  • mcrae_words_only - Use McRae words only
Model related parameters
  • weights_dir - A path where the weights going to be saved to or read from (f.e. "weights/"), Default "out/"
  • save_weights - Save weights to weights_dir
  • load_weights - Load weights from weights_dir
validation related parameters
  • calculate - Calculation method [score|decomp]
  • calculation_args - List of arguments for calculation:
    • score:
      • [int ] <- Lamda value which > 0. Optional: Default 1.
    • decomp:
      • [str ] <- The word to decompose
      • [int ] <- The top X category. Optional: Default 20.
      • [bool ] <- Save result into file. Optional: Default False.

Example:
python interpret_cli.py "data/glove/glove.6B.300d.txt" "data/semcat/Categories" -dense_file --lines_to_read=50000 -load_weights --calculate=decomp --calculation_args barrel 10 True

Related papers:

Semantic Structure and Interpretability of Word Embeddings

Related links