/pycrawl

Web crawling using Python

Crawling in Python

Crawl a site, e.g. asokolsky.github.io, to identify imperfections such as broken links.
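
As a minimal sketch of the link-discovery step that such a crawl needs, the following uses only the Python standard library to collect the links on a page and keep those that stay on the same host. The names LinkExtractor and internal_links are hypothetical and not taken from this project, which may well delegate this work to Scrapy.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collect the absolute URL of every <a href=...> on a page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the URL of the page itself.
                self.links.append(urljoin(self.base_url, value))


def internal_links(html: str, page_url: str) -> list[str]:
    """Return the links found in `html` that point at the host of `page_url`."""
    parser = LinkExtractor(page_url)
    parser.feed(html)
    host = urlparse(page_url).netloc
    return [url for url in parser.links if urlparse(url).netloc == host]
```

Each URL returned this way could then be requested in turn, and any that fails to load reported as a broken link.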

Working with a Virtual Environment

From primer:

  • create it, if it does not exist yet: python3 -m venv .venv, or just make venv
  • activate it: source .venv/bin/activate
  • install the requirements: python3 -m pip install -r requirements.txt
  • install new packages if needed
  • freeze the installed packages: python3 -m pip freeze > requirements.txt

Scrapy

"fast high-level web crawling and web scraping framework"

Typing Verification

Run mypy from the project root, with the virtual environment active:

(venv) alex@L07A97UF:/mnt/c/Users/asoko/Projects/pycrawl$ mypy .
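
To illustrate the kind of mistake this check catches, here is a small annotated function. It is a sketch only: site_host is a hypothetical helper and not part of this project.

```python
from typing import Optional
from urllib.parse import urlparse


def site_host(site: str) -> Optional[str]:
    """Return the host of a site given as a bare host name or a full URL."""
    # urlparse only recognizes the host when the "//" separator is present.
    parsed = urlparse(site if "//" in site else f"//{site}")
    return parsed.hostname


host = site_host("asokolsky.github.io")
# Calling host.upper() directly would be flagged by mypy, because the
# declared return type allows None. Narrowing first satisfies the checker.
if host is not None:
    print(host.upper())
```

The annotations have no effect at run time; they only give mypy enough information to report the unguarded call before the code is ever run.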

Usage

TL;DR:

make run SITE=asokolsky.github.io

Or, step by step:

  • create the venv: make venv
  • activate the venv: source .venv/bin/activate
  • use it:
python3 main.py -h

or:

python3 main.py -vv asokolsky.github.io
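
The command lines above suggest an interface with one positional site argument, a repeatable -v flag, and the usual -h help. A sketch of how such an interface could be declared with argparse follows. It is an assumption about the shape of the interface, not the actual contents of main.py, and build_parser is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare a CLI that accepts a site and a repeatable verbosity flag."""
    parser = argparse.ArgumentParser(
        prog="main.py",
        description="Crawl a site and report imperfections such as broken links.",
    )
    parser.add_argument("site", help="host to crawl, e.g. asokolsky.github.io")
    # action="count" turns -v into 1, -vv into 2, and so on.
    parser.add_argument(
        "-v", "--verbose", action="count", default=0,
        help="increase verbosity; repeat for more detail",
    )
    return parser


# Parse an explicit argument list instead of sys.argv for illustration.
args = build_parser().parse_args(["-vv", "asokolsky.github.io"])
print(args.site, args.verbose)
```

With this declaration argparse generates the -h output automatically, so no separate help handling is required.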