This project uses Poetry as its package manager. Refer to the Poetry installation docs, or run the following command to install it:
curl -sSL https://install.python-poetry.org | python3 -
Create a pandas DataFrame with the following columns:
- word_label_text
- word_description_text
- word_concept_id
- word_label_id
- word_description_id
- score
Example for a word list stored in a text file, one word per line:
import pandas as pd
# Build one row per word; the line index doubles as concept, label and description id.
word_list = []
with open('words') as wordlist_file:
    for i, word in enumerate(wordlist_file):
        word_list.append([word.strip().lower(), '', i, i, i])
words_dataframe = pd.DataFrame(data=word_list,
                               columns=['word_label_text',
                                        'word_description_text',
                                        'word_concept_id',
                                        'word_label_id',
                                        'word_description_id'])
# Pickle protocol 5 requires Python 3.8 or newer.
words_dataframe.to_pickle('wordlist.pickle.gzip', compression='gzip', protocol=5)
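Before pickling, it can help to check that the dataframe carries the columns listed above. The following is a minimal sketch, not part of the project: the helper `missing_columns` and the two constant names are hypothetical. The example above does not fill `score`, and this README does not say whether that column is mandatory, so the sketch treats it as optional (an assumption).

```python
# Column names are taken from the list in this README.
# Treating `score` as optional is an assumption, not documented behaviour.
REQUIRED_COLUMNS = [
    "word_label_text",
    "word_description_text",
    "word_concept_id",
    "word_label_id",
    "word_description_id",
]
OPTIONAL_COLUMNS = ["score"]


def missing_columns(columns, include_optional=False):
    """Return the expected column names absent from `columns`, in listed order."""
    present = set(columns)
    expected = REQUIRED_COLUMNS + (OPTIONAL_COLUMNS if include_optional else [])
    return [name for name in expected if name not in present]


# The columns produced by the text-file example above.
example_columns = [
    "word_label_text",
    "word_description_text",
    "word_concept_id",
    "word_label_id",
    "word_description_id",
]
print(missing_columns(example_columns))                         # []
print(missing_columns(example_columns, include_optional=True))  # ['score']
```

With a real dataframe, the same check would be `missing_columns(words_dataframe.columns)`.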
poetry run python3 ./run.py
Branch ab/experiment-memory-usage
- EMPTY data structure, English, 540k words: 3.3 MiB
- FILLED data structure, English, 540k words: 273.63 MiB (about 530 B per word)
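The per-word figure can be reproduced with a few lines of arithmetic. This is only a sketch: it assumes "540k" means exactly 540,000 words and that 1 MiB is 1024 * 1024 bytes. The helper name `bytes_per_word` is hypothetical.

```python
MIB = 1024 * 1024  # bytes in one MiB


def bytes_per_word(total_mib, word_count):
    """Convert a total size in MiB into average bytes per word."""
    return total_mib * MIB / word_count


WORD_COUNT = 540_000  # assumption: "540k" taken as an exact count

# Filled structure as measured above.
filled = bytes_per_word(273.63, WORD_COUNT)
# Same figure with the 3.3 MiB empty-structure baseline subtracted.
net = bytes_per_word(273.63 - 3.3, WORD_COUNT)

print(round(filled), round(net))  # 531 525
```

Both values round to roughly 530 B per word, so subtracting the empty-structure baseline barely changes the estimate.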