material_embedding: A Jupyter Notebook repository from HFooladi

Description

This repo trains a word2vec model on domain-specific data (here, chemistry and materials science).

First, install all the required libraries:

pip install -r requirements.txt

The goal is to learn word embeddings tailored to domain-specific text. I use Optuna for hyperparameter tuning: it automatically searches the space of possible hyperparameters and finds the best ones for the embedding.

There are three main parts in this repo:

Preprocessing data (to transform the data to the appropriate format for gensim)
Word embedding from scratch
Fine-tuning a pre-existing model
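As a rough illustration of the first part, gensim's Word2Vec accepts a corpus as a list of sentences, each of which is a list of string tokens. The sketch below produces that format using only the standard library. The stop-word list, the regular expressions, and the `preprocess` helper are hypothetical and are not taken from the repo.

```python
import re

# Hypothetical stop-word list; a real pipeline would likely use a larger one.
STOP_WORDS = {"the", "a", "an", "of", "is", "and", "in", "to"}

# Alphanumeric tokens with optional internal hyphens, so that
# "LiCoO2" and "lithium-ion" each stay a single token.
TOKEN_RE = re.compile(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*")

# Naive sentence splitter: break on whitespace that follows ., ! or ?
SENTENCE_RE = re.compile(r"(?<=[.!?])\s+")


def preprocess(text):
    """Turn raw text into a list of token lists, one per sentence."""
    sentences = []
    for raw in SENTENCE_RE.split(text.strip()):
        tokens = [t.lower() for t in TOKEN_RE.findall(raw)]
        tokens = [t for t in tokens if t not in STOP_WORDS]
        if tokens:  # skip sentences that end up empty
            sentences.append(tokens)
    return sentences


if __name__ == "__main__":
    sample = "LiCoO2 is a cathode material. Graphite is the anode in lithium-ion cells."
    print(preprocess(sample))
```

Lowercasing is a simplification here: it merges case-sensitive chemical symbols (for example, Co and CO), so a chemistry-aware tokenizer may be a better choice for real data.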

To learn how to run the code and reproduce the results, go through run.ipynb. Please let me know if you have any questions.
