It is often necessary to collect the images from a page on the Internet. This gem solves exactly that task.
sudo gem install image_downloader
- ruby 1.8 or 1.9
- gem nokogiri
Image Downloader is a rather simple library which does the following:
- gets the web page (with Net::HTTP)
- parses the HTML page (using regexp or nokogiri)
- downloads the images (in one or multiple threads)
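The three steps above can be sketched with the standard library alone. This is a minimal illustration written for a current Ruby, not the gem's actual internals: the helper names `fetch_page`, `find_image_urls` and `download_all`, the `IMAGE_URL` pattern and the list of extensions are all assumptions made for the example.

```ruby
require 'net/http'
require 'uri'

# Hypothetical pattern: anything that ends in a common image extension.
IMAGE_URL = /[^'"\s()<>]+\.(?:jpe?g|png|gif)/i

# Step 1: get the web page body with Net::HTTP.
def fetch_page(page_url)
  Net::HTTP.get(URI.parse(page_url))
end

# Step 2: find image URLs by regexp and resolve them against the page URL.
def find_image_urls(html, page_url)
  html.scan(IMAGE_URL).uniq.map { |path| URI.join(page_url, path).to_s }
end

# Step 3: download every image into target_dir, one thread per image.
def download_all(urls, target_dir)
  threads = urls.map do |url|
    Thread.new do
      uri = URI.parse(url)
      File.binwrite(File.join(target_dir, File.basename(uri.path)), Net::HTTP.get(uri))
    end
  end
  threads.each(&:join)
end
```

Only `fetch_page` and `download_all` touch the network; `find_image_urls` works on any HTML string, so it can be tried without a connection.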
After installation, you can use the following code as an example:
require 'rubygems'
require 'image_downloader'

page_url = 'www.test.com'
target_path = 'img_dir/'
downloader = ImageDownloader::Process.new(page_url, target_path)

#####
# download all images found anywhere on the page (by regexp: anything that looks like an image URL)
downloader.parse(:any_looks_like_image => true)
##### or
# download images from all elements where images are usually placed (<img...>, <a...>, ...)
downloader.parse()
##### or
# download images from exact places in the page
downloader.parse(:collect => {:link_icon => true})
##### or
# download images by regexp
downloader.parse(:regexp => /[^'"]+\.jpg/i)

downloader.download()
The “parse” method accepts the following options:
# find all URLs that contain an image extension
:any_looks_like_image => true

# find images in the specified locations
:collect => {
  :all => true,                                   # all image locations
  :(img_src|a_href|style_url|link_icon) => true   # a specific location
}

# find by regexp
:regexp => /['"]([^'"]+\.jpg)[^'"]*['"]/i   # for ruby 1.8 (in 1.9, () is not allowed for the scan method)
:regexp => /[^'"]+\.jpg/i                   # the same, but shorter
:regexp => /[^'"]+\.css/                    # other files can also be downloaded

# ignore image URLs according to the given parameters
:ignore_without => {:(extension|image_extension) => true}

# set your preferred User-Agent (very important for avoiding 403, 404... responses from the server)
:user_agent => "ruby"   # Mozilla/5.0 by default
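The note about `()` and the scan method most likely refers to how Ruby's `String#scan` treats capture groups: without groups it returns the matched strings, with groups it returns one array of captures per match. The snippet below only demonstrates that standard behaviour on a made-up HTML fragment; how the gem consumes the result is not shown here.

```ruby
# Made-up HTML fragment for illustration only.
html = %q{<img src="a.jpg?v=1"> <img src='b.jpg'>}

# No capture group: scan returns a flat array of matched substrings.
plain = html.scan(/[^'"]+\.jpg/i)

# One capture group: scan returns an array of arrays, one per match.
grouped = html.scan(/['"]([^'"]+\.jpg)[^'"]*['"]/i)

# Flattening the grouped result gives the same list as the group-free pattern.
same = (grouped.flatten == plain)
```

If a pattern needs grouping but should still return flat strings, a non-capturing group `(?:...)` keeps `scan` in its first mode.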
Detailed location description
- img_src - tag: img, attribute: src="url"
- a_href - tag: a, attribute: href="url"
- style_url - tag: any, attribute: style="(background|background-image): url('url')"
- link_icon - tag: link, attribute: rel="shortcut icon" href="url"
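To make the four locations concrete, here is a rough sketch that maps each location name to a regular expression. The patterns and the `collect_from` helper are assumptions written for this example: they are deliberately simplified (for instance, `link_icon` expects `rel` to come before `href`) and are not taken from the gem's source.

```ruby
# Hypothetical, simplified patterns: one capture group per location.
LOCATIONS = {
  :img_src   => /<img\b[^>]*\bsrc=["']([^"']+)["']/i,
  :a_href    => /<a\b[^>]*\bhref=["']([^"']+)["']/i,
  :style_url => /style=["'][^"']*background(?:-image)?:\s*url\(['"]?([^'")]+)['"]?\)/i,
  :link_icon => /<link\b[^>]*\brel=["']shortcut icon["'][^>]*\bhref=["']([^"']+)["']/i
}

# Collect the URLs found in the requested locations, in the order given.
def collect_from(html, *names)
  names.flat_map { |name| html.scan(LOCATIONS.fetch(name)).flatten }
end
```

For real pages an HTML parser such as nokogiri is more robust than regular expressions, since attribute order and quoting vary widely.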
The “download” method accepts the following options:
:parallel => true          # multi-threaded downloading (the default if no options are given)
:consequentially => true   # sequential downloading in a single thread
:user_agent => "ruby"      # Mozilla/5.0 by default
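The difference between the two download modes can be illustrated with plain Ruby threads. This is only a sketch of the general idea, not the gem's implementation: `download_bodies` and `HTTP_FETCH` are hypothetical names, and the fetcher is passed in as a callable so the logic can be tried without network access.

```ruby
require 'net/http'
require 'uri'

# Hypothetical fetcher: takes a URL string, returns the response body.
HTTP_FETCH = lambda { |url| Net::HTTP.get(URI.parse(url)) }

# Fetch every URL, either with one thread per URL or one after another.
# Results come back in the same order as the input URLs in both modes.
def download_bodies(urls, fetch, parallel = true)
  if parallel
    # Thread#value waits for the thread to finish and returns its result.
    urls.map { |url| Thread.new { fetch.call(url) } }.map(&:value)
  else
    urls.map { |url| fetch.call(url) }
  end
end
```

Calling `download_bodies(urls, HTTP_FETCH)` would fetch in parallel, and passing `false` as the third argument switches to sequential fetching.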
You can also simply use the bundled shell commands:
To download anything that looks like an image:
download_any_images url dir/
To download only the favicon:
download_icon url dir/
To download everything located in the usual places for images:
download_images url dir/
To download by regexp:
download_by_regexp url dir/ "[^'\"]+\\.js"
"-d", "--debug"
To monitor the download process, pass the -d flag. In some cases a URI::InvalidURIError may occur.
download_images url dir/ -d
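`URI::InvalidURIError` is the exception Ruby's standard `URI.parse` raises for strings it cannot parse, for example a URL containing an unescaped space. The following sketch shows one way such candidates could be filtered out before downloading; the helper name `parsable_urls` is hypothetical and this is not necessarily how the gem handles the error.

```ruby
require 'uri'

# Keep only the candidates that URI.parse accepts.
def parsable_urls(candidates)
  candidates.select do |candidate|
    begin
      URI.parse(candidate)
      true
    rescue URI::InvalidURIError
      # For example, an unescaped space in the path ends up here.
      false
    end
  end
end
```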
Copyright © 2011 Malykh Oleg. See LICENSE.txt for further details.
The MIT License
Author's personal blog: Malykh Oleg (blog in Russian)