How can I create the word set (f) of Figure 5 in the paper?
Hi there,
I was deeply impressed after reading your excellent paper.
I have a question regarding the creation of the wordset in Figure 5.
To generate the wordset, I used word_embedding (shape (32000, 4096)) and source_embedding (shape (num_tokens, 4096)). For each source token, I used FAISS to compare its embedding (shape (1, 4096)) against every row of word_embedding and kept only the 10 most similar words. I repeated this for all num_tokens tokens.
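A minimal NumPy sketch of this top-10 search is below, for reference. It assumes cosine similarity is the intended measure. The FAISS equivalent would be `faiss.normalize_L2` on both float32 arrays, then `faiss.IndexFlatIP(4096)`, `index.add(...)` and `index.search(sources, 10)`. The function name `top_k_words` is hypothetical. Normalizing matters because a raw inner product favors vocabulary rows with large norms.

```python
import numpy as np


def top_k_words(word_embedding: np.ndarray, source_embedding: np.ndarray, k: int = 10) -> np.ndarray:
    """Hypothetical helper: for each source row, return the indices of the
    k most cosine-similar vocabulary rows, most similar first.

    word_embedding:   (vocab_size, dim), e.g. (32000, 4096)
    source_embedding: (num_tokens, dim)
    returns:          int array of shape (num_tokens, k)
    """
    # L2-normalize rows so the inner product equals cosine similarity
    # (same effect as faiss.normalize_L2 followed by an IndexFlatIP search).
    # The small floor avoids dividing by zero for all-zero rows.
    words = word_embedding / np.maximum(np.linalg.norm(word_embedding, axis=1, keepdims=True), 1e-12)
    sources = source_embedding / np.maximum(np.linalg.norm(source_embedding, axis=1, keepdims=True), 1e-12)

    # (num_tokens, vocab_size) similarity matrix: all source tokens in one matmul.
    sims = sources @ words.T

    # argpartition moves the k largest similarities of each row to the front
    # without a full sort; then only those k are sorted in descending order.
    top = np.argpartition(-sims, k - 1, axis=1)[:, :k]
    order = np.argsort(-np.take_along_axis(sims, top, axis=1), axis=1)
    return np.take_along_axis(top, order, axis=1)
```

The returned indices are token ids. They can be mapped back to strings with the model's tokenizer, for example with Hugging Face `tokenizer.convert_ids_to_tokens`.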
After training for more than 100 epochs, I compared the wordsets from every epoch and found that they consist mostly of meaningless special characters and words. Could you kindly explain how the wordset was extracted in your work?
Thank you.