imsanjoykb/Natural-Language-Processing

Word Frequencies with TfidfVectorizer code


```python
from sklearn.feature_extraction.text import TfidfVectorizer

# list of text documents
text = ["The quick brown fox jumped over the lazy dog.",
        "The dog.",
        "The fox"]

# create the transform
vectorizer = TfidfVectorizer()

# tokenize and build vocab
vectorizer.fit(text)

# summarize
print(vectorizer.vocabulary_)
print(vectorizer.idf_)

# encode document
vector = vectorizer.transform([text[0]])

# summarize encoded vector
print(vector.shape)
print(vector.toarray())
```
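To see what the numbers printed above actually are, here is a minimal pure-Python sketch (no scikit-learn needed) of the same computation. It assumes the library defaults: the token pattern `\b\w\w+\b` (words of two or more characters, lowercased), the smoothed idf `log((1 + n) / (1 + df)) + 1`, and l2 normalization of each row. The function name `tfidf` is just for illustration.

```python
import math
import re
from collections import Counter

def tokenize(doc):
    # mirrors scikit-learn's default token pattern: 2+ word chars, lowercased
    return re.findall(r"\b\w\w+\b", doc.lower())

def tfidf(docs):
    tokenized = [tokenize(d) for d in docs]
    # vocabulary indices are assigned in sorted term order, as in CountVectorizer
    vocab = {w: i for i, w in enumerate(sorted({w for t in tokenized for w in t}))}
    n = len(docs)
    # document frequency: number of documents containing each term
    df = Counter(w for toks in tokenized for w in set(toks))
    # smoothed idf, matching TfidfVectorizer's default smooth_idf=True
    idf = {w: math.log((1 + n) / (1 + df[w])) + 1 for w in vocab}
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        row = [counts[w] * idf[w] for w in vocab]  # raw count * idf
        norm = math.sqrt(sum(x * x for x in row)) or 1.0
        rows.append([x / norm for x in row])       # l2-normalize (default norm='l2')
    return vocab, idf, rows

text = ["The quick brown fox jumped over the lazy dog.",
        "The dog.",
        "The fox"]
vocab, idf, rows = tfidf(text)
print(vocab)
print(idf)
print(rows[0])
```

With these three documents, "the" appears in all of them, so its idf is exactly 1.0 (the smoothing keeps it from being down-weighted to zero), while terms that appear in a single document get the largest idf, `log(2) + 1`.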