Jabba-Webkit

Jabba's headless webkit browser for scraping AJAX-powered webpages.

Primary language: Python

Usage:

jabba_webkit.py <url> [<time>]

url: the address of the page whose source you want to fetch

time: optional; the application quits after this many seconds

If the webpage is AJAX-powered and keeps updating itself after it loads, pass a time to make the browser wait that many seconds. It then fetches the generated HTML source.
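If you want to drive the command-line form from another Python program, the following is a minimal sketch. The helper names `build_command` and `fetch_source` are hypothetical and not part of Jabba-Webkit. The sketch also assumes that the script runs under the current interpreter, that it sits in the working directory, and that it writes the generated HTML to standard output.

```python
import subprocess
import sys


def build_command(url, wait_seconds=None, script="jabba_webkit.py"):
    """Build the argument list for: jabba_webkit.py <url> [<time>]."""
    command = [sys.executable, script, url]
    if wait_seconds is not None:
        # The optional second argument is the wait time in seconds.
        command.append(str(wait_seconds))
    return command


def fetch_source(url, wait_seconds=None, script="jabba_webkit.py"):
    """Run the script and return whatever it writes to standard output.

    Assumption: the generated HTML source is printed to stdout.
    """
    result = subprocess.run(
        build_command(url, wait_seconds, script),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout
```

For example, `fetch_source("http://example.com", 5)` would run the script with a five-second wait, provided the assumptions above hold.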

You can also use it as a library:

>>> import jabba_webkit as jw
>>> html1 = jw.get_page(url1, time1)
>>> html2 = jw.get_page(url2)    # yes, you can call it several times
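Since `get_page` can be called several times, fetching a batch of pages might look like the sketch below. `fetch_all` is a hypothetical helper, not part of the library. It assumes only the two call forms shown above, `get_page(url, time)` and `get_page(url)`, and that each call returns the page source.

```python
def fetch_all(pages, get_page):
    """Fetch each (url, wait_seconds) pair and return a dict of url -> source.

    get_page is passed in as an argument, so jw.get_page can be replaced
    by a stub when testing without a browser.
    """
    results = {}
    for url, wait_seconds in pages:
        if wait_seconds is None:
            # No wait time given: use the one-argument form.
            results[url] = get_page(url)
        else:
            results[url] = get_page(url, wait_seconds)
    return results
```

With the library imported as `jw`, a call such as `fetch_all([(url1, 5), (url2, None)], jw.get_page)` would fetch the first page after a five-second wait and the second one without a wait argument.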