find by xpath

Question

abacha opened this issue 11 years ago · 5 comments

Is it possible to do something like this:

page = Upton::Scraper.new(url)
page.find_by_xpath("//body/div/a").value

Answer 1 · 2013-08-12T21:24:45.000Z

Hi @abacha,

Yes, Upton supports searching by XPath.

If you have an index page (a page with links to the pages you want to scrape), you could do something like this:

scraper = Upton::Scraper.new(url, "//body/div/a")
scraper.scrape do |instance_html, instance_url, instance_index|
  puts "The title of the page at #{instance_url} is #{Nokogiri::HTML(instance_html).title}"
end

Thanks to #11, you can use XPath or CSS selectors interchangeably.

Answer 2 · 2013-08-12T21:28:12.000Z

I wish I could do it as simply as in my example. I need to run lots of searches with different XPath expressions against the same URL.

Answer 3 · 2013-08-12T21:31:39.000Z

Is the value of the content specified by the XPath expression another link to be scraped? Or just data you want to access?

And do you have lots of pages, or just one page to be scraped?

Answer 4 · 2013-08-12T22:05:21.000Z

If you just want to scrape lots of data from one page, just use Nokogiri. (Upton uses Nokogiri for HTML parsing.)

Nokogiri::HTML(Net::HTTP.get(URI(url))).xpath("//body/div/a").text

Answer 5 · 2013-08-17T22:55:40.000Z

Were you able to find a solution, @abacha?