Gather real-world application benchmark suite results from SPEC HPG 2020 using a web scraper. Store these results in our database on the university server (yoda) so that the website can retrieve them and provide a sorting/filtering system for users to view the metrics.
The current process for gathering data:
- Run the ListLinks program to create a text file containing download links to all of the .csv files for a given test suite.
- Run "wget -i" on the links file from the directory where every raw .csv file should be placed.
- Run the FoldAllCreator program to create a .sh file (foldAll.sh) that contains only "java -jar csvscraper1.4.jar" followed by all of the raw file names, separated by spaces.
- Run the foldAll.sh file that we created, which runs the "FileFolder" program on all raw input.
- Every .csv file has now been scraped into a new output file and is ready to be loaded into the database.
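The ListLinks and FoldAllCreator steps can be approximated in Python. This is a minimal sketch, not the team's actual programs: it assumes the suite's results page is plain HTML whose .csv links appear in <a href> tags, and the function names write_links_file and write_fold_all are hypothetical.

```python
"""Hypothetical Python equivalents of the ListLinks and FoldAllCreator steps."""
import os
import shlex
from html.parser import HTMLParser
from urllib.parse import urljoin


class CsvLinkParser(HTMLParser):
    """Collects the href of every <a> tag that points at a .csv file."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.lower().endswith(".csv"):
            # Resolve relative links so "wget -i" can download them directly.
            self.links.append(urljoin(self.base_url, href))


def write_links_file(html, base_url, links_path):
    """Like ListLinks: write one .csv download URL per line for "wget -i"."""
    parser = CsvLinkParser(base_url)
    parser.feed(html)
    with open(links_path, "w") as f:
        f.write("".join(link + "\n" for link in parser.links))
    return parser.links


def write_fold_all(csv_dir, script_path, jar="csvscraper1.4.jar"):
    """Like FoldAllCreator: a single java -jar command listing every raw .csv."""
    names = sorted(n for n in os.listdir(csv_dir) if n.endswith(".csv"))
    # shlex.quote protects unusual file names; ordinary names pass unchanged.
    command = " ".join(shlex.quote(part) for part in ["java", "-jar", jar] + names)
    with open(script_path, "w") as f:
        f.write(command + "\n")
    return command
```

The HTML could be fetched with urllib.request.urlopen(url).read().decode(). Because foldAll.sh lists bare file names, it should be run with "sh foldAll.sh" from inside the directory where wget placed the raw .csv files.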
NOTE: In the future, these programs will run as a cron job with built-in directory organization, so that they can continuously pull new results and feed them into the database. Currently, all of the scripts are executed manually.
MongoDB is installed on yoda. Python scripts move the scraped .csv files into the database:
- Pull the vip GitHub repository, since the web-scraped data is stored on GitHub.
- Get the path to the new output .csv files and update the path in each benchmark's script in the importDB folder.
- Run each script once using "python <ScriptName.py>".
- Everything is now in the database.
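A minimal sketch of what one importDB script might look like, assuming the pymongo driver, a header row in each scraped output .csv, and one MongoDB document per row. The database name "spec", the collection name "results", the local connection URI, and the numeric conversion are hypothetical choices, not the actual scripts' behavior.

```python
"""Hypothetical importDB-style script: load one scraped .csv into MongoDB."""
import csv


def convert(value):
    """Store numeric strings as numbers so the website can sort on them."""
    for cast in (int, float):
        try:
            return cast(value)
        except (TypeError, ValueError):
            pass
    return value


def read_documents(csv_path):
    """Turn each CSV row into a dict keyed by the header row."""
    with open(csv_path, newline="") as f:
        return [
            {key: convert(value) for key, value in row.items()}
            for row in csv.DictReader(f)
        ]


def import_csv(csv_path, collection):
    """Insert every row into a pymongo Collection (anything with insert_many)."""
    docs = read_documents(csv_path)
    if docs:  # pymongo's insert_many rejects an empty list
        collection.insert_many(docs)
    return len(docs)


def main(csv_path, uri="mongodb://localhost:27017/"):
    """Connect to MongoDB (assumed to run locally on yoda) and import one file."""
    from pymongo import MongoClient  # imported here so the helpers need only stdlib

    client = MongoClient(uri)
    # "spec" and "results" are placeholder database/collection names.
    count = import_csv(csv_path, client["spec"]["results"])
    print(f"inserted {count} documents from {csv_path}")
```

Calling main() with a benchmark's output path, for example from an if __name__ == "__main__": guard, would fill one collection. Converting numbers at import time is a design choice so that MongoDB sorts numeric fields by value rather than as strings.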
Useful mongo commands
- mongo - access the mongo shell to work with databases
- show dbs - show all databases
- use <databaseName> - switch to a specific database
- show collections - show all collections within the current database
- db.collectionName.find().pretty() - display all documents in a specific collection
- exit - exit the mongo shell
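The same inspection can be done from Python with pymongo, which may be handier inside the import scripts. A minimal sketch, assuming a pymongo MongoClient is passed in; the helper names are hypothetical and each mirrors one shell command.

```python
"""Hypothetical pymongo counterparts to the mongo shell commands above."""
from pprint import pformat


def show_dbs(client):
    """Equivalent of: show dbs"""
    return client.list_database_names()


def show_collections(client, database_name):
    """Equivalent of: use <databaseName>, then show collections"""
    return client[database_name].list_collection_names()


def find_pretty(client, database_name, collection_name):
    """Equivalent of: db.collectionName.find().pretty()"""
    collection = client[database_name][collection_name]
    return [pformat(doc) for doc in collection.find()]
```

For example, with client = MongoClient("mongodb://localhost:27017/"), print("\n".join(find_pretty(client, "spec", "results"))) would print every document, where "spec" and "results" are placeholder names.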
NOTE: In the future, these scripts will also run as a cron job with built-in directory organization, so that they can continuously update the database. Currently, all of the scripts are executed manually.
The hosting setup below is subject to change:
Yoda currently hosts both the Frontend and the Backend. The Frontend is, roughly, the React website; the Backend is the website server on port 3000, which is currently run with Nodemon (we are looking to change this).
Advisors: Sunita Chandrasekaran, Rudolf Eigenmann, Mayara Gimenes
Web-Scraping Team: Derek Baum, Ryan Emenheiser, Matt Benvenuto
Database Team: Max Luu, Jake Wise
User-Interface Team: Matthew Stack, Chris Munley