Scrape a Tumblr blog for personal posts

Tumblr Personal Post Scraper

Scrape user-uploaded content from a Tumblr blog. Tumblr's website provides no built-in way to view only the content a blog's owner uploaded.

Example scrape of https://support.tumblr.com: https://gyazo.com/9eb0825ddca040f8467838ca519029e9

New In V1.2.0

  • Request Throttling
  • Single-file executables, including installers (DMG, NSIS)
  • Faster UI (React PureComponent and Redux Integration)
  • New Icon
  • Smaller JavaScript bundle

Download V1.2

Contribution

Clone the repo and install dependencies:

git clone https://github.com/lluisrojass/tumblr-scraper.git
cd tumblr-scraper
npm install

Run npm run watch to start a development watchify script that monitors the source files and rebuilds the bundle file whenever one changes. The bundle file is not included in the repository, so it must be generated after cloning in any case. The other way to generate the bundle is npm run min, which outputs a minified, production-ready bundle. During development, use npm run simulate to launch the app with development add-ons (Chrome DevTools and electron-reload), which are useful for logging and debugging.

Tools/Libraries to be aware of:

What is Request Throttling?

When scraping blogs with a high frequency and density of original posts, the application could become unresponsive or place a significant burden on the CPU. Throttling was introduced to reduce this risk. When turned on (the default), the application keeps track of the pending image loads it has yet to fulfill and may temporarily delay the next iteration of the request loop. This leaves breathing room between page requests and prevents a potentially overwhelming rush of front-end work. While all other application state (blog name, post types) must be set before a scrape can begin, throttling can be turned on or off at any time.
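
The idea can be illustrated with a minimal sketch. This is not the application's actual code: the class name RequestThrottle, its methods, the maxPending threshold, and the pollMs interval are all hypothetical, chosen only to show how a request loop might pause while too many image loads are still pending.

```javascript
// Hypothetical sketch of request throttling; none of these names come from
// the tumblr-scraper source.
class RequestThrottle {
  constructor({ maxPending = 10, pollMs = 50, enabled = true } = {}) {
    this.maxPending = maxPending; // assumed threshold of unfulfilled image loads
    this.pollMs = pollMs;         // assumed delay between capacity checks
    this.enabled = enabled;       // throttling is on by default
    this.pending = 0;             // image loads the front end has yet to fulfill
  }

  // Record images that were handed to the front end for loading.
  imageQueued(count = 1) {
    this.pending += count;
  }

  // Record images that finished loading (or failed).
  imageLoaded(count = 1) {
    this.pending = Math.max(0, this.pending - count);
  }

  // Throttling can be toggled at any time, even mid-scrape.
  setEnabled(enabled) {
    this.enabled = enabled;
  }

  // True when the request loop should hold off on the next page request.
  shouldWait() {
    return this.enabled && this.pending >= this.maxPending;
  }

  // Resolve once there is capacity again, or throttling has been turned off.
  async waitForCapacity() {
    while (this.shouldWait()) {
      await new Promise((resolve) => setTimeout(resolve, this.pollMs));
    }
  }

  // Request loop: fetchPage(page) is a caller-supplied async function that
  // resolves to the number of images found on that page, or null when the
  // blog has no more pages.
  async run(fetchPage) {
    let pagesFetched = 0;
    for (let page = 0; ; page++) {
      await this.waitForCapacity();
      const imageCount = await fetchPage(page);
      if (imageCount === null) break;
      this.imageQueued(imageCount);
      pagesFetched++;
    }
    return pagesFetched;
  }
}
```

In this sketch the front end would call imageLoaded() from each image's load or error handler, and a settings toggle would call setEnabled(). Because shouldWait() checks the flag on every poll, turning throttling off releases a paused loop on its next check.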

Like what you see? Consider favoriting or following the project :)

License

MIT