==========================================================
Version 1.3.3 released
- Minor changes for better exception handling and graceful failure
Version 1.3.2 released
- HTML5 video and audio tags are now supported
- Relative URLs in the dictionary returned by pages_graph are now translated into absolute ones
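
As an illustration of the relative-to-absolute translation mentioned above, here is a minimal sketch using the standard library's `urljoin`. It is not PyCrawler's actual code: the function name `absolutize_links` and the shape of the dictionary (page URL mapped to a list of link strings) are assumptions made for the example.

```python
from urllib.parse import urljoin


def absolutize_links(graph):
    """Resolve every link against the URL of the page it was found on.

    `graph` is assumed (hypothetically) to map an absolute page URL
    to the list of links found on that page.
    """
    return {
        page: [urljoin(page, link) for link in links]
        for page, links in graph.items()
    }


graph = {
    "http://example.com/docs/index.html": [
        "intro.html",              # relative to the current directory
        "/about.html",             # relative to the site root
        "../img/logo.png",         # relative to the parent directory
        "http://other.example/x",  # already absolute, left unchanged
    ]
}
resolved = absolutize_links(graph)
```

Resolving each link against its own page, rather than against the site root, keeps directory-relative links such as `intro.html` pointing at the right location.
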
Version 1.3 released
- Duplicate pages are no longer parsed a second time
- PageParser constructor has changed: it now requires a second parameter, a reference to the crawler handler
- check_page_by_content method added to CrawlerHandler: it checks whether a page has already been visited by examining its content
- pages_graph now returns a dictionary with page URLs as keys
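
One common way to detect an already-visited page by its content is to store a digest of each page body and compare new pages against it. The sketch below shows that idea only; the class `SeenContent` and its method `check_and_add` are hypothetical names, and nothing here is taken from the real `check_page_by_content` implementation.

```python
import hashlib


class SeenContent:
    """Remembers a SHA-256 digest for every page body seen so far."""

    def __init__(self):
        self._digests = set()

    def check_and_add(self, content):
        """Return True if identical content was seen before, else record it."""
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if digest in self._digests:
            return True
        self._digests.add(digest)
        return False


seen = SeenContent()
first = seen.check_and_add("<html><body>Hello</body></html>")
second = seen.check_and_add("<html><body>Hello</body></html>")
other = seen.check_and_add("<html><body>Bye</body></html>")
```

A digest comparison only catches byte-identical pages; two URLs serving the same page with a different timestamp or session token would still count as distinct.
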
Version 1.2 released
- Introduces the concept of depth in the crawling process: a maximum depth for the pages to crawl can now be specified, either instead of or together with a limit on the number of pages retrieved. As a consequence, the interface of the start_crawling method has changed slightly (see the documentation)
- pages_graph method added to CrawlerHandler: it builds a graph-like summary object starting from any page
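
To make the interaction between a depth limit and a page limit concrete, here is a small breadth-first sketch over an in-memory link graph. It does not use PyCrawler or its `start_crawling` signature; the function `crawl` and its parameters `max_depth` and `max_pages` are assumptions chosen for the example.

```python
from collections import deque


def crawl(links, start, max_depth=None, max_pages=None):
    """Breadth-first visit of `links`, a dict mapping a page to its outgoing links.

    Stops following links below `max_depth` and stops entirely once
    `max_pages` pages have been visited. Either limit may be None.
    """
    visited = []
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        if max_pages is not None and len(visited) >= max_pages:
            break
        page, depth = queue.popleft()
        visited.append(page)
        if max_depth is not None and depth >= max_depth:
            continue  # page is visited, but its links are not followed
        for target in links.get(page, []):
            if target not in seen:  # never queue the same page twice
                seen.add(target)
                queue.append((target, depth + 1))
    return visited


links = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"]}
by_depth = crawl(links, "a", max_depth=1)
by_pages = crawl(links, "a", max_pages=2)
unlimited = crawl(links, "a")
```

With these inputs, `by_depth` contains the start page and its direct links, `by_pages` stops after two pages, and `unlimited` reaches every page exactly once.
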
http://mlarocca.github.io/PyCrawler/
https://github.com/mlarocca/PyCrawler/tree/master/coverage_testing