How to parse element attributes

Question

Closed this issue 11 years ago · 4 comments

I want this script to return an array of hashes, each with a name and a url. But since the iterator yields what is INSIDE the a tag, I can't figure out how to read the href attribute of the a tag itself.

    require 'wombat'

    video_url = 'https://vimeo.com/26594942'
    result = Wombat.crawl do
      base_url video_url + "/likes"
      path "/"


      likers "css=.browse_people li a", :iterator do
        name "css=p.title"
        url "[href]", :html do |link|
          link
        end
      end

    end

    puts result

Answer 1 · 2013-12-03T20:19:27.000Z

You have to use XPath for that. Something like url({ xpath: './@href' }) should work (the parentheses are needed so Ruby reads the braces as a hash rather than a block).
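To see what './@href' selects, here is a small sketch that evaluates the same XPath outside of Wombat. It uses REXML (shipped with Ruby) as a stand-in for the parser Wombat uses, and the markup is a made-up sample, not the real Vimeo page. The point is only that '.' is the current node (the a tag the iterator is on) and '@href' is that node's attribute.

```ruby
require 'rexml/document'

# Hypothetical sample markup, shaped like the list the question iterates over.
sample = <<~XML
  <ul class="browse_people">
    <li><a href="/user1"><p class="title">Alice</p></a></li>
    <li><a href="/user2"><p class="title">Bob</p></a></li>
  </ul>
XML

doc = REXML::Document.new(sample)

# For each a tag, read a child element and an attribute of the tag itself.
likers = REXML::XPath.match(doc, '//li/a').map do |a|
  {
    'name' => REXML::XPath.first(a, './p[@class="title"]').text,
    # './@href' is evaluated relative to the a element, so it returns
    # the href attribute of that element, not something inside it.
    'url'  => REXML::XPath.first(a, './@href').value
  }
end

puts likers.inspect
```

Inside a Wombat iterator the selector is evaluated relative to each matched node in the same way, which is why an attribute of the matched tag is reachable through XPath.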

Answer 2 · 2013-12-03T20:20:44.000Z

Here is an example of something I did in the past:

    products "css=.list-view>li", :iterator do
      thumb({ xpath: ".//img/@src" })
      url({ xpath: ".//a[1]/@href" })
      details({ css: "h3.list-view-item-title a:first-child" }, :follow) do
      end
    end

Answer 3 · 2013-12-03T20:20:53.000Z

Thanks, guess I will have to learn about xpath now!

Answer 4 · 2014-04-07T23:45:17.000Z

#29 attempts to add support for Nokogiri nodes for people like me, who prefer to work directly on the nodes rather than the combination of iterator and xpath. Might be useful for some other person, too.