This is a simple crawler that crawls webpages matching the provided regex, starting from the given URL, up to the given maximum depth. It uses the async/await coroutine syntax introduced in PEP 492.
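The core idea can be sketched as a breadth-first crawl in which each level of the frontier is fetched concurrently with async/await. The snippet below is a minimal, self-contained illustration: `PAGES` and `fetch_links` are hypothetical stand-ins for real HTTP fetching (a real crawler would use an async HTTP client such as aiohttp), and the names are not from this project's code.

```python
import asyncio
import re

# Hypothetical in-memory "site": maps a URL to the links found on that page.
# A real crawler would fetch and parse pages over HTTP instead.
PAGES = {
    "http://example.com/": ["http://example.com/a", "http://example.com/b"],
    "http://example.com/a": ["http://example.com/a/1"],
    "http://example.com/b": [],
    "http://example.com/a/1": [],
}

async def fetch_links(url):
    # Stand-in for an async HTTP request; yields control like a real fetch.
    await asyncio.sleep(0)
    return PAGES.get(url, [])

async def crawl(start_url, pattern, max_depth):
    """Breadth-first crawl: follow links matching `pattern` up to `max_depth`."""
    regex = re.compile(pattern)
    seen = {start_url}
    frontier = [start_url]
    for depth in range(max_depth):
        # Fetch the whole frontier concurrently, one coroutine per URL.
        results = await asyncio.gather(*(fetch_links(u) for u in frontier))
        frontier = []
        for links in results:
            for link in links:
                if regex.match(link) and link not in seen:
                    seen.add(link)
                    frontier.append(link)
    return seen

visited = asyncio.run(crawl("http://example.com/", r"http://example\.com/", 2))
```

With `max_depth=2`, the crawl visits the start page, its two direct links, and one link two levels deep. The `asyncio.gather` call is what lets many slow network requests overlap instead of running one after another.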
TODO:
- create a network visualization with the saved data
- convert the MongoDB operations to bulk updates
These tests were run on a free-tier AWS EC2 instance with this starting URL.
Current results:
- Time taken for 494 requests (recursion level 1): 5.48 sec
- Time taken for 36997 requests (recursion level 2): 415.46 sec

Both runs sustain roughly 90 requests per second.