Task description
- Implement a recursive web crawler for a site.
- The crawler is a command-line tool that accepts a starting URL and a destination directory.
- The crawler downloads the initial URL and follows the links inside the downloaded document, recursively.
- The crawler does not follow links outside the initial URL: if the starting link is https://start.url/abc, it visits https://start.url/abc/123 and https://start.url/abc/456, but skips https://another.domain/ and https://start.url/def (see the sketch after this list).
- The crawler should handle the Ctrl+C hotkey correctly.
- The crawler should download pages in parallel.
- The crawler should be able to resume a download if the destination directory already contains previously loaded data (e.g. when the download was cancelled and then restarted).
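
The scoping rule above can be stated as a small predicate. Below is a minimal sketch, assuming a helper named in_scope (the name and signature are illustrative, not part of the task): a candidate URL is in scope only if it shares the starting URL's scheme and host and its path lies under the starting path.

    from urllib.parse import urlsplit

    def in_scope(start_url: str, candidate: str) -> bool:
        """Sketch of the scoping rule: candidate must live under start_url."""
        start, cand = urlsplit(start_url), urlsplit(candidate)
        if (start.scheme, start.netloc) != (cand.scheme, cand.netloc):
            return False  # different host, e.g. https://another.domain/
        base = start.path.rstrip("/") + "/"
        return cand.path == start.path or cand.path.startswith(base)

    assert in_scope("https://start.url/abc", "https://start.url/abc/123")
    assert in_scope("https://start.url/abc", "https://start.url/abc/456")
    assert not in_scope("https://start.url/abc", "https://start.url/def")
    assert not in_scope("https://start.url/abc", "https://another.domain/")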
Timeframe for the task
4 hours.
Note
In general, the task is very similar to "wget --mirror" (with a few extra options). It is not necessary to implement everything listed above; implement whichever parts you consider most important, but structure the program so that each of the listed features can be added later by extending it, without rewriting from scratch.
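
Taking the extensibility hint literally, one shape that keeps each listed feature addable is a URL queue drained by a worker pool: the resume check becomes a file-existence test, and Ctrl+C simply stops new work from being scheduled. The sketch below (standard library only; Python 3.9+ for cancel_futures) is an assumption about structure, not the actual krawl implementation: the names crawl, fetch, and url_to_path and the URL-to-file mapping are illustrative, and link extraction plus the in_scope check above are omitted.

    import concurrent.futures
    import urllib.request
    from pathlib import Path
    from urllib.parse import urlsplit

    def url_to_path(outdir: Path, url: str) -> Path:
        # Illustrative mapping of a URL to a file under the destination directory.
        parts = urlsplit(url)
        local = parts.path.lstrip("/") or "index.html"
        return outdir / parts.netloc / local

    def fetch(url: str, outdir: Path) -> None:
        target = url_to_path(outdir, url)
        if target.exists():
            return  # resume support: the page was downloaded on a previous run
        target.parent.mkdir(parents=True, exist_ok=True)
        with urllib.request.urlopen(url, timeout=10) as resp:
            target.write_bytes(resp.read())

    def crawl(start_url: str, outdir: str, workers: int = 8) -> None:
        queue = [start_url]
        with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
            try:
                while queue:
                    batch = [pool.submit(fetch, u, Path(outdir)) for u in queue]
                    queue = []  # a link extractor would refill this with in-scope URLs
                    for fut in concurrent.futures.as_completed(batch):
                        fut.result()  # surface download errors
            except KeyboardInterrupt:
                # Ctrl+C: cancel queued downloads, let in-flight ones finish cleanly
                pool.shutdown(cancel_futures=True)
                raise

Because fetching, scoping, and persistence live in separate functions, each remaining requirement can slot in without touching the others.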
How to run
- Install Python 3.
- Open the source code directory in a terminal.
- Create a virtual environment:
    python3 -m venv virtualenv
- Activate the virtual environment:
    source ./virtualenv/bin/activate
- Install the app in editable mode:
    pip install -e .
- Run the app (https://crawler-test.com/ can be used as a test target):
    krawl https://crawler-test.com outdir