dlab-berkeley/Data-Science-Social-Justice-2022

Kernel dying module 2 notebook 3

emilygrabowski opened this issue · 1 comments

I'm currently testing that all of the code runs on the DataHub with 8 GB of RAM.

The kernel dies after running out of memory when you run this cell:
from sklearn.metrics.pairwise import cosine_similarity
similarities = cosine_similarity(tfidf)
similarities.shape

However, the biggest memory drain comes a few cells earlier, from the .todense() call:

df = pd.DataFrame(tfidf.todense(), columns=tfidf_vectorizer.get_feature_names_out().ravel())

This is the line taking up the most memory, since .todense() allocates a cell for every document-term pair, zeros included.