This exploration was motivated by the curiosity to gain a bird's eye understanding of the fundamental working of the Google search Engine's background via visualizations from a mathematical pov, some interesting facts were learned along the way. The reverse engineering approach taken by Rebbeca S.Wills served as the foundation for this understanding. Each of the functional attributes is cleverly achieved through elegant Linear Algebra manipulations of the base forms. The following are the highlights,
- Logical and Mathematical Analysis of the PageRank algorithm
- 0-vector fix, hopping fix, and PageRank score generation
- A simple code implementation to substantiate the observations
Graph representation is the conventional approach taken for web explorational tasks as they best capture their properties and serve as an interpretable intermediatory between the math and the actual web. I use the following simple graph as a basis for my exploration,
A dangling node represents a webpage that references no other webpage or that there are no links to continue to other webpages from it (in this case D is a dangling node). This can be identified from the mathematical notion as a row of zeros. The speculated solution to this issue is,
Hop refers to the action of a user who accesses any other random website not present in the at-time present chain of referenced websites. This notion is not very mathematically intuitive at first but the following solution clarifies it,
Overall, this exploration was extremely interesting to undertake and gave me new perspectives to the mathematical side of search engines and web applications in general.
The Documentation of this project can be found Here
The MATLAB code file of the corresponding matrices in report can be found Here
Prerequisites
- MATLAB (code was written in MATLAB 9.9 / R2020b) download here