Semantic-Graph-of-Wiki-Pages

Create a program to collect at least 500 Wikipedia pages and the links from these pages to other Wikipedia pages. Collect similarity data for each page using word frequencies and/or word2vec. Report the number of spanning trees (for any arbitrary node as the initial starting point) as a connectivity check. Utilize caching to avoid unnecessary reconstruction. Then write a UI to read the graph, allowing the user to select any two pages (by title) and display graphically the shortest path between them (weighted by any similarity metric), if one exists, as well as the most similar node for each.
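
The word-frequency similarity step could look roughly like the sketch below. It assumes cosine similarity over raw word counts, which is one common choice and not necessarily the metric this project uses. The class and method names (`WordFrequencySimilarity`, `wordFrequencies`, `cosineSimilarity`) are hypothetical.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not taken from the project source.
final class WordFrequencySimilarity {

    // Counts lower-cased words; splitting on non-letters is a simplification
    // that ignores digits, accents, and stop words.
    static Map<String, Integer> wordFrequencies(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Cosine similarity of two count vectors: close to 1.0 for identical
    // word distributions, 0.0 when no words are shared or a page is empty.
    static double cosineSimilarity(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            dot += e.getValue() * (double) b.getOrDefault(e.getKey(), 0);
        }
        double norm = Math.sqrt(sumOfSquares(a)) * Math.sqrt(sumOfSquares(b));
        return norm == 0.0 ? 0.0 : dot / norm;
    }

    private static double sumOfSquares(Map<String, Integer> counts) {
        double sum = 0.0;
        for (int c : counts.values()) {
            sum += (double) c * c;
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Integer> first = wordFrequencies("Graphs have nodes and edges");
        Map<String, Integer> second = wordFrequencies("Trees are graphs without cycles");
        System.out.println(cosineSimilarity(first, second));
    }
}
```

A similarity in [0, 1] can be turned into an edge weight such as `1 - similarity`, so that more similar pages are "closer" when searching for a shortest path.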

Primary Language: Java
License: MIT

Instructions After Downloading

---> gradlew run (on Linux or macOS: ./gradlew run)

Project Decomposition

| Task Number | Task Title | Completed |
| --- | --- | --- |
| 1 | Collect Data From Wiki-Pages | X |
| 2 | Collect Similarity Data (Word Frequencies And/Or Word2vec) | X |
| 3 | Construct Weighted Graph | X |
| 4 | Persist The Graph | X |
| 5 | Visualize The Graph | X |
| 6 | Report The Number Of Spanning Trees | X |
| 7 | Perform Dijkstra's Shortest Path Algorithm W/ A Priority Queue | X |
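
As an illustration of task 7, here is a minimal sketch of Dijkstra's algorithm driven by `java.util.PriorityQueue`. It is not the project's actual implementation. The adjacency-map graph representation, the class name `ShortestPathSketch`, and the method name `shortestPath` are assumptions made for the example, and edge weights are assumed to be non-negative (for instance `1 - similarity`).

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.Comparator;

// Hypothetical sketch, not taken from the project source.
final class ShortestPathSketch {

    // A node together with the tentative distance at which it was enqueued.
    private static final class QueueItem {
        final String node;
        final double dist;

        QueueItem(String node, double dist) {
            this.node = node;
            this.dist = dist;
        }
    }

    // graph maps a page title to its outgoing links and their non-negative weights.
    // Returns the titles along the cheapest path, or an empty list if none exists.
    static List<String> shortestPath(Map<String, Map<String, Double>> graph,
                                     String source, String target) {
        Map<String, Double> dist = new HashMap<>();
        Map<String, String> prev = new HashMap<>();
        PriorityQueue<QueueItem> queue =
                new PriorityQueue<>(Comparator.comparingDouble((QueueItem q) -> q.dist));
        dist.put(source, 0.0);
        queue.add(new QueueItem(source, 0.0));

        while (!queue.isEmpty()) {
            QueueItem cur = queue.poll();
            // Skip stale entries left behind when a shorter distance was found later.
            if (cur.dist > dist.getOrDefault(cur.node, Double.POSITIVE_INFINITY)) {
                continue;
            }
            if (cur.node.equals(target)) {
                break;
            }
            Map<String, Double> edges = graph.getOrDefault(cur.node, Collections.emptyMap());
            for (Map.Entry<String, Double> edge : edges.entrySet()) {
                double candidate = cur.dist + edge.getValue();
                if (candidate < dist.getOrDefault(edge.getKey(), Double.POSITIVE_INFINITY)) {
                    dist.put(edge.getKey(), candidate);
                    prev.put(edge.getKey(), cur.node);
                    queue.add(new QueueItem(edge.getKey(), candidate));
                }
            }
        }

        if (!dist.containsKey(target)) {
            return Collections.emptyList();
        }
        // Walk the predecessor links back from the target to rebuild the path.
        LinkedList<String> path = new LinkedList<>();
        for (String at = target; at != null; at = prev.get(at)) {
            path.addFirst(at);
        }
        return path;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Double>> graph = new HashMap<>();
        graph.put("A", Map.of("B", 0.2, "C", 0.9));
        graph.put("B", Map.of("C", 0.2));
        System.out.println(shortestPath(graph, "A", "C")); // prints [A, B, C]
    }
}
```

Because `PriorityQueue` has no decrease-key operation, the sketch re-inserts a node whenever its distance improves and discards outdated entries when they are polled.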

Latest Stable Copy

Class Project Zip