A web crawler to scrapping machine information data from Vultr Bare Metal and HostGator VPS Hosting sites
It is recommended to use a virtual environment to run the project (optional).
- Inside the project's root folder, install the project dependencies:
$ pip install -r requirements.txt
To execute this project, you need run the run.py file with at least one of the following arguments:
-h
or --help
: show the help message and exit
--print
: crawler the data and print to screen
--save_json
: crawler the data and save the results into json file (machine-webcrawler.json)
--save_csv
: crawler the data and save the results into csv file (machine-webcrawler.csv)
-
If you want crawler the data and print, run:
$ python run.py --print
-
If you want crawler the data and save into json file, run:
$ python run.py --save_json
-
If you want crawler the data and save into csv file, run:
$ python run.py --save_csv
Note: You can use two or more arguments together
-
If you want crawler the data, print and save into json file, run:
$ python run.py --print --save_json
-
If you want crawler the data, print and save into csv file, run:
$ python run.py --print --save_csv
-
If you want crawler the data, print, save into json file and save into csv file, run:
$ python run.py --print --save_json --save_csv
-
If you want crawler the data, save into json file and save into csv file, run:
$ python run.py --save_json --save_csv
To execute the tests, run:
$ python -m unittest -v
If you want to see the test coverage level, run:
$ coverage run -m unittest discover -s tests/ -v
$ coverage report
Note: Currently, this project has 99% code coverage
This project was developed in an environment with Ubuntu 20.04 and Python 3.8.10. But, you can run in any system with Python 3.8.10+