Crawling in Python
Crawl the site, e.g. asokolsky.github.io, in order to identify imperfections, such as broken links.
Working with Virtual Environment
From primer:
- create it, if it is not there yet:
python3 -m venv .venv
or justmake venv
- activate it
source venv/bin/activate
- install requirements
python3 -m pip install -r requirements.txt
- install new packages if needed...
- freeze it
python3 -m pip freeze > requirements.txt
Scrapy
"fast high-level web crawling and web scraping framework"
Typing Verification
(venv) alex@L07A97UF:/mnt/c/Users/asoko/Projects/pycrawl$ mypy .
Usage
TLDR:
make run SITE=asokolsky.github.io
OR:
- create the venv:
make venv
- activate the venv:
source .venv/bin/activate
- use it:
python3 main.py -h
or:
python3 main.py -vv asokolsky.github.io