propublica/upton
A batteries-included framework for easy web-scraping. Just add CSS! (Or do more.)
Language: HTML | License: MIT
Issues
New version?
#40 opened by nofxx - 0
make scrape method return an enumerator
#38 opened by jeremybmerrill - 1
problem scraping index page (Scraping 0 instances)
#36 opened by okliv - 3
Pagination always double-downloads first page
#37 opened by jaypinho - 1
Make Scraper instances additive
#35 opened by jeremybmerrill - 1
Create ScrapedPage object
#32 opened by jeremybmerrill - 3
HTML Comment on stashed pages with info
#33 opened by jeremybmerrill - 20
Refactor API
#5 opened by adelevie - 2
The example in README.md does not work
#29 opened by paos - 2
Switch from concatenating HTML to putting it in an array when paginating
#25 opened by jeremybmerrill - 2
Handle pagination out-of-the-box
#17 opened by bxjx - 7
pagination doesn't respect sleep time
#28 opened by jeremybmerrill - 5
Recursive function causing a stack overflow
#23 opened by esagara - 0
Warn users of slug collisions
#27 opened by jeremybmerrill - 4
Use content-type to skip non-HTML instance pages
#22 opened by swapab - 7
Improving url_to_filename
#20 opened by dannguyen - 7
Downloading and Caching part
#10 opened by kgrz - 5
find by xpath
#18 opened by abacha - 15
relative url edge cases
#16 opened by jeremybmerrill - 2
relative URLs
#8 opened by jeremybmerrill