/Relevance-Ranking-using-Latent-Semantic-Indexing--from-scratch-

Latent Semantic Analysis

Introduction: Latent Semantic Analysis (LSA) is an information retrieval technique patented in 1988. When applied to information retrieval, it is sometimes called Latent Semantic Indexing (LSI). LSI lets a search engine determine what a page is about without relying on exact matches with the search query text. It looks at “themes” instead of “keywords”.

Linear algebra techniques used in the project: Singular Value Decomposition, cosine similarity, matrix properties.

Dataset: the “sci.space” newsgroup from the 20 Newsgroups dataset, available through the Scikit-Learn library. It contains 400 news articles related to space.

SVD (Singular Value Decomposition): SVD is a matrix decomposition that factors a matrix A into three matrices, A = U S V^T, each of which represents a transformation. U and V are orthogonal matrices, and S is a diagonal matrix of non-negative singular values ordered from largest to smallest. Keeping only the k largest singular values gives the best rank-k approximation of A in the least-squares sense. In this decomposition we look for an orthonormal basis v1, v2, ... of the row space that A maps onto an orthogonal basis of the column space: each vi is sent to σi times a unit vector ui, and the ui are themselves orthonormal.

Av1 = σ1u1  Av2 = σ2u2
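The pieces described above (term-document matrix, SVD, cosine similarity) can be sketched on a toy example. This is a minimal illustration and not the notebook's actual code: the four-document corpus, the function names `term_document_matrix`, `cosine` and `lsi_scores`, and the choice k=2 are all assumptions made for the example. It uses only NumPy's `np.linalg.svd`, and ranks documents by projecting both the documents and the query onto the top-k left singular vectors.

```python
import numpy as np

# Hypothetical toy corpus standing in for the sci.space articles.
DOCS = [
    "rocket launch orbit",
    "satellite launch orbit",
    "moon lunar crater",
    "lunar crater surface",
]


def term_document_matrix(docs):
    """Return (vocab, A), where A[i, j] counts term i in document j."""
    vocab = sorted({w for d in docs for w in d.split()})
    row = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.split():
            A[row[w], j] += 1.0
    return vocab, A


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 1e-12 else 0.0


def lsi_scores(docs, query, k=2):
    """Score each document against the query in a k-dimensional latent space."""
    vocab, A = term_document_matrix(docs)
    # Thin SVD: A = U @ diag(s) @ Vt, singular values sorted in descending order.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_k = U[:, :k]
    # Project documents (columns of A) onto the top-k left singular vectors.
    doc_vecs = U_k.T @ A  # shape (k, n_docs)
    # Build the query's term-count vector; words outside the vocabulary are ignored.
    q = np.array([float(query.split().count(w)) for w in vocab])
    q_k = U_k.T @ q
    return [cosine(q_k, doc_vecs[:, j]) for j in range(len(docs))]


scores = lsi_scores(DOCS, "rocket")
# Document indices sorted from most to least relevant.
ranking = sorted(range(len(DOCS)), key=lambda j: scores[j], reverse=True)
```

In this toy corpus the second document never mentions "rocket", yet it should score about as high as the first because both share "launch" and "orbit" and so fall on the same latent theme. The two lunar documents should score close to zero.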

Primary Language: Jupyter Notebook
