word2id Vocabulary
yxtay opened this issue · 3 comments
Thank you for sharing your work.
Is there any way to get the word2id or id2word for the vocabulary? In jLDADMM, you had the script write out a .vocabulary file. However, there is no corresponding output for this project.
I need the word2id because I am using the topic-word distribution to do some custom keyword scoring. Without the right vocabulary order, I have no idea which word each column in the matrix refers to.
How can I get the word2id in this case? After a quick exploration, I can tell it definitely does not follow the word2id of the word embeddings file.
EDIT:
I realised that you have a function writeDictionary() for writing the word2id to a file, but it is not called in the write() function. I think it would be a good idea to include it.
The same goes for the writeTopicVectors() function. I believe users will benefit from having access to those outputs. I have recompiled the jar file after making changes to include both calls, and they are giving me the expected outputs.
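For anyone with the same need, here is a minimal sketch of how a written vocabulary could be joined back to the columns of the topic-word matrix. It assumes each vocabulary line has the form "word id" separated by a single space, and that one row of the topic-word distribution has already been parsed into a double[] indexed by word id. Both are assumptions about the output format and should be checked against your own files. The class and method names (VocabularyLookup, parseVocabulary, topWords) are hypothetical and not part of the project.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the project: maps matrix columns back to words.
final class VocabularyLookup {

    // Builds id2word from vocabulary lines.
    // Assumption: each non-empty line is "<word> <id>" separated by a single space.
    static Map<Integer, String> parseVocabulary(List<String> lines) {
        Map<Integer, String> id2word = new HashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            int split = trimmed.lastIndexOf(' ');
            if (split < 0) {
                throw new IllegalArgumentException("Malformed vocabulary line: " + line);
            }
            String word = trimmed.substring(0, split);
            int id = Integer.parseInt(trimmed.substring(split + 1));
            id2word.put(id, word);
        }
        return id2word;
    }

    // Returns the k words with the highest probability in one topic row.
    // Assumption: topicRow[i] is the probability of the word whose id is i.
    static List<String> topWords(double[] topicRow, Map<Integer, String> id2word, int k) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < topicRow.length; i++) {
            ids.add(i);
        }
        // Sort column indices by descending probability.
        ids.sort((a, b) -> Double.compare(topicRow[b], topicRow[a]));

        List<String> words = new ArrayList<>();
        for (int i = 0; i < Math.min(k, ids.size()); i++) {
            words.add(id2word.get(ids.get(i)));
        }
        return words;
    }
}
```

The vocabulary lines could be loaded with java.nio.file.Files.readAllLines before being passed to parseVocabulary. If a word comes back as null, the id order assumed here does not match the file and the format should be rechecked.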
Hi, thanks for the comments. I will close this issue.
Hi @yxtay, years later I'm in the same position you were in. Thanks for showing the way out. However, I cannot easily recompile the project because I lack Java knowledge. Could you please share your source or jar with me? Best
Hi again, I think I managed to compile a version that writes the vocabulary as well. It's here if you'd like to check it out. I am not committing to the original repository, as my Java coding experience is only one day long so far :)