felipecsl/wombat

How to parse element attributes


I want this script to return an array of hashes, each with a name and a url. But since the iterator yields what is inside the a tag, I can't figure out how to read the tag's own href attribute.

    require 'wombat'

    video_url = 'https://vimeo.com/26594942'
    result = Wombat.crawl do
      base_url video_url + "/likes"
      path "/"

      likers "css=.browse_people li a", :iterator do
        name "css=p.title"
        url "[href]", :html do |link|
          link
        end
      end

    end

    puts result

You have to use XPath for that; something like url({ xpath: './@href' }) should work (the parentheses keep Ruby from parsing the braces as a block).

Here is an example of something I did in the past:

    products "css=.list-view>li", :iterator do
      thumb({ xpath: ".//img/@src" })
      url({ xpath: ".//a[1]/@href" })
      details({ css: "h3.list-view-item-title a:first-child" }, :follow) do
      end
    end
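
For anyone new to XPath, here is a rough sketch of what a relative expression like ./@href selects when it is evaluated against each matched a element. It deliberately uses Ruby's bundled REXML library instead of Wombat so it runs offline, and the markup, names, and paths are made up for illustration; the real page structure is an assumption.

```ruby
require 'rexml/document'

# Hypothetical markup standing in for the likes page (not the real HTML).
html = <<~HTML
  <ul class="browse_people">
    <li><a href="/user1"><p class="title">Alice</p></a></li>
    <li><a href="/user2"><p class="title">Bob</p></a></li>
  </ul>
HTML

doc = REXML::Document.new(html)

# Select every <a> inside the list, then evaluate further XPath
# expressions relative to each one (the leading "." is the current node).
likers = REXML::XPath.match(doc, '//ul[@class="browse_people"]/li/a').map do |a|
  {
    # "./p[...]" looks at children of the <a> element.
    name: REXML::XPath.first(a, './p[@class="title"]').text,
    # "./@href" selects the href attribute of the <a> element itself.
    url: REXML::XPath.first(a, './@href').value
  }
end

puts likers.inspect
```

The same idea applies inside a Wombat iterator: the iterator's selector picks the context node, and ./@href reads an attribute of that node rather than of something nested inside it.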

Thanks, guess I will have to learn about xpath now!

#29 attempts to add support for Nokogiri nodes, for people like me who prefer to work directly on the nodes rather than combining an iterator with XPath. It might be useful to others, too.