Fail to use it in Google Colab
ajason08 opened this issue · 3 comments
Hello,
Thank you for your effort in creating this Python version.
I am struggling to run the first example.
My code (just 3 lines of code) can be reproduced with this notebook
Can you please help me to understand what is wrong?
Thank you!
Hi
Line 19 of example.py only works in Python 2.
Try replacing yield line.lower().translate(None, delchars).split(' ')
with yield line.lower().translate({ord(x): None for x in delchars}).split(' ')
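For reference, here is a minimal sketch of that replacement on Python 3. The `delchars` value and the sample `line` are made up for illustration; the real `delchars` is whatever example.py defines.

```python
import string

# Hypothetical stand-ins: the real delchars is defined in example.py.
delchars = string.punctuation
line = "Hello, World! It's GloVe."

# In Python 3, str.translate takes a single mapping of code points.
# Mapping a code point to None deletes that character.
table = {ord(ch): None for ch in delchars}
tokens = line.lower().translate(table).split(' ')

# An equivalent table built with str.maketrans
# (the third argument lists the characters to delete).
tokens_alt = line.lower().translate(str.maketrans('', '', delchars)).split(' ')

print(tokens)
```

Either form should work; the dict comprehension is just the most direct translation of the Python 2 call.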
You will probably run into more issues down the line, though, as this code was written for Python 2 and no longer appears to be maintained.
Cheers
(The issue further down in your code is because model.fit() expects a list of lists, not a list of strings. Each document should be represented as a list of words.)
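To illustrate the shape described above, here is a small sketch that turns a list of strings into a list of word lists. The sample documents are invented, and plain whitespace splitting is an assumption; the real tokenization may need to be smarter.

```python
# Hypothetical input: one string per document.
docs_as_strings = [
    "the quick brown fox",
    "jumps over the lazy dog",
]

# Split each document into words so the result is a list of lists.
docs_as_tokens = [doc.lower().split() for doc in docs_as_strings]

print(docs_as_tokens)
```

The resulting `docs_as_tokens` is in the list-of-lists form that the fit call expects.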
Now working as expected!
Thank you
I am pasting my working code here as a reference for future readers.
!pip install glove_python
!curl -o my_corpus.txt https://norvig.com/big.txt
from glove import Corpus, Glove
#Creating a corpus object
corpus = Corpus()
""" The learner "model.fit()" expects a list of (list of string),
not one big string and not a flat list of strings.
Each document should be represented as a list of words: [[doc1], [doc2], ...]
The code below turns a txt file into this format.
There are probably more efficient alternatives. """
with open("my_corpus.txt", 'r') as f:
    words = f.read().split()  # split() with no argument yields words, not lines
num_docs = 10
doc_size = len(words) // num_docs
doc_list = []
for i in range(num_docs):
    start = i * doc_size
    # the last document takes the remaining words so none are dropped
    end = (i + 1) * doc_size if i < num_docs - 1 else len(words)
    doc_list.append(words[start:end])
print("number of docs in doc_list:", len(doc_list))
print("first doc fragment:", doc_list[0][0:11])
# Fit the corpus to build the co-occurrence matrix that GloVe is trained on
corpus.fit(doc_list, window=10)
glove = Glove(no_components=5, learning_rate=0.05)
glove.fit(corpus.matrix, epochs=30, no_threads=1, verbose=True)
glove.add_dictionary(corpus.dictionary)
glove.save('glove.model')
glove = Glove.load('glove.model')
x = glove.most_similar("Sherlock", number=10)
x
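As one possible alternative to cutting the file into a fixed number of chunks, here is a rough sketch that treats each non-empty line as its own document. The in-memory sample text stands in for my_corpus.txt and is made up. Whether per-line documents suit a given corpus is an assumption to check, since co-occurrence windows will then stop at line boundaries.

```python
import io

# Stand-in for open("my_corpus.txt"); the sample text is invented.
sample = io.StringIO("The first line of text.\n\nA second line, after a blank one.\n")

doc_list = []
for line in sample:
    words = line.split()
    if words:  # skip blank lines
        doc_list.append(words)

print(len(doc_list), doc_list[0][:3])
```

This reads the file once, line by line, so it avoids holding a second full copy of the word list in memory.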