Originally submitted May 8, 2018, to Professor Shreya Kumar.
Click Here for the Project Presentation
This project has been slightly modified from the original; I have updated the formatting of this readme, renamed a few files, and made other small changes.
Dependencies used (Make sure you have these installed!):
- IPython
- Jupyter Notebook
- NetworkX
- Matplotlib
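If any of these are missing, they can typically be installed from PyPI, e.g. `pip install ipython jupyter networkx matplotlib`.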
Thanks for checking out this project!
-Nick (nmarcopo)
- Nick Marcopoli (nmarcopo)
- Andy Shin (ashin1)
- Austin Sura (asura)
- Tina Wu (ywu6)
Nick and Andy mainly worked on the visualization part of the project: creating the graph and adding user-interaction features. Tina and Austin mainly worked on the Python code that gets the web links from each Wikipedia page.
Everyone contributed equally to the completion of this project.
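For readers curious how the link-gathering backend might work, here is a minimal sketch using only the Python standard library. This is not the project's actual code; the function name `get_article_links`, the regex, the user-agent string, and the link limit are all illustrative assumptions.

```python
# Minimal sketch (not the project's actual code) of pulling article links
# from a Wikipedia page with only the standard library.
import re
import urllib.request

def get_article_links(url, limit=5):
    """Return up to `limit` unique /wiki/ article links found on the page."""
    # A descriptive user agent is polite when fetching Wikipedia pages.
    req = urllib.request.Request(url, headers={"User-Agent": "wiki-crawler-sketch"})
    with urllib.request.urlopen(req) as response:
        html = response.read().decode("utf-8", errors="replace")
    links = []
    # Match internal article paths; excluding ':' skips non-article
    # namespaces such as File: and Special:, and '#' skips fragments.
    for match in re.finditer(r'href="(/wiki/[^":#]+)"', html):
        link = "https://en.wikipedia.org" + match.group(1)
        if link not in links:
            links.append(link)
        if len(links) == limit:
            break
    return links

print(get_article_links("https://en.wikipedia.org/wiki/Fortnite"))
```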
Project video: https://www.youtube.com/watch?v=-WMY3JJ0s_I
To run the backend: `./wikiCrawler.py` (or `./wikiCrawler.py -h` for usage information)

To run the visualization, open the program in Jupyter Notebook:

`jupyter notebook --ip XXXXX --port XXXX --no-browser`
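As a rough illustration of what the notebook draws, here is a minimal sketch of a one-layer link graph rendered with NetworkX and Matplotlib. The node names and layout parameters are illustrative assumptions, not the notebook's exact code.

```python
# Minimal sketch of the kind of graph the visualization draws:
# articles as nodes, links between them as edges.
import networkx as nx
import matplotlib.pyplot as plt

G = nx.Graph()
root = "Fortnite"
# One layer of links out from the root article (example neighbors).
for neighbor in ["Epic Games", "Battle royale game", "Unreal Engine"]:
    G.add_edge(root, neighbor)

pos = nx.spring_layout(G, seed=42)  # fixed seed for a repeatable layout
nx.draw(G, pos, with_labels=True, node_color="lightblue", font_size=8)
plt.show()
```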
We made a test script (timememory.py) to benchmark the runtime and memory usage of varying numbers of links and layers. A sample of the output is shown below:
Wikipedia Article URL: https://en.wikipedia.org/wiki/Fortnite
#Links | #Layers | Time (s) | Memory (MB) |
---|---|---|---|
1 | 1 | 1.008846 | 36.476562 |
2 | 1 | 1.318799 | 38.101562 |
3 | 1 | 1.509769 | 38.398438 |
4 | 1 | 1.751732 | 40.304688 |
5 | 1 | 2.036690 | 42.351562 |
1 | 2 | 1.308800 | 38.660156 |
2 | 2 | 2.553611 | 48.914062 |
3 | 2 | 3.889408 | 52.621094 |
4 | 2 | 5.636142 | 54.152344 |
5 | 2 | 7.903798 | 54.449219 |
1 | 3 | 1.542764 | 40.953125 |
2 | 3 | 4.384332 | 54.925781 |
3 | 3 | 11.133306 | 60.109375 |
4 | 3 | 24.119331 | 61.691406 |
5 | 3 | 45.060150 | 76.214844 |
1 | 4 | 1.743734 | 42.371094 |
2 | 4 | 9.182603 | 63.675781 |
3 | 4 | 39.776951 | 66.445312 |
4 | 4 | 112.903831 | 72.882812 |
5 | 4 | 259.600525 | 107.906250 |
To run the test script: `./timememory.py`
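For reference, here is a minimal sketch of how time and peak memory might be measured in the spirit of timememory.py. The `benchmark` helper and the placeholder `crawl` function are illustrative assumptions, not the script's actual code.

```python
# Minimal sketch of timing a function and reading the process's peak memory.
import resource
import time

def benchmark(fn, *args):
    start = time.perf_counter()
    fn(*args)
    elapsed = time.perf_counter() - start
    # ru_maxrss is the peak resident set size: kilobytes on Linux, bytes on macOS.
    peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return elapsed, peak_kb / 1024  # seconds, megabytes (assuming Linux)

def crawl(links, layers):
    time.sleep(0.1 * links * layers)  # placeholder for the real crawling work

for links in range(1, 3):
    for layers in range(1, 3):
        t, mem = benchmark(crawl, links, layers)
        print(f"{links} | {layers} | {t:.6f} | {mem:.6f}")
```

Note that the `resource` module is Unix-only, which matches running the scripts as `./timememory.py`.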
As mentioned in our video, it is not possible to test the output automatically every time because Wikipedia pages are constantly changing. Therefore, we manually tested the outputs by clicking through to the web pages and checking which links are the most relevant.
Since we used Python and did not manually allocate memory, we did not have a memory issue (we checked by running Valgrind, but making a separate script for that did not seem necessary).