Is XPath working properly?
Closed this issue · 3 comments
danieldocki commented
My test:
some_text xpath: '//*[@id="Content_Regs"]/table[1]/tbody/tr/td[2]/table/tbody/tr[4]/td[2]/text()[2]'
Returns:
{"some_text"=>nil}
Google Chrome console:
$x('//*[@id="Content_Regs"]/table[1]/tbody/tr/td[2]/table/tbody/tr[4]/td[2]/text()[2]')
["
São Paulo - SP"
]
What might be happening? Did I forget something?
felipecsl commented
Can you tell me which page you are trying to scrape, so I can dig into the problem?
danieldocki commented
Ok, no problem
require 'wombat'

class TelelistaScraper
  include Wombat::Crawler

  # Page to crawl: base_url + path
  base_url "http://www.telelistas.net/br/restaurantes"
  path "/?pagina=9"

  # Property to extract, selected by XPath
  some_text xpath: '//*[@id="Content_Regs"]/table[1]/tbody/tr/td[2]/table/tbody/tr[4]/td[2]/text()[2]'
end

puts TelelistaScraper.new.crawl
felipecsl commented
I suppose they are using some kind of JavaScript hack to avoid being scraped:
1.9.3-p194 :018 > html = Nokogiri::HTML("http://www.telelistas.net/br/restaurantes/?pagina=9")
=> #<Nokogiri::HTML::Document:0x3ffeed8eb83c name="document" children=[#<Nokogiri::XML::DTD:0x3ffeed8f04b8 name="html">, #<Nokogiri::XML::Element:0x3ffeed8ef16c name="html" children=[#<Nokogiri::XML::Element:0x3ffeed8f2b3c name="body" children=[#<Nokogiri::XML::Element:0x3ffeed8f26dc name="p" children=[#<Nokogiri::XML::Text:0x3ffeed8f1f5c "http://www.telelistas.net/br/restaurantes/?pagina=9">]>]>]>]>
1.9.3-p194 :019 > html.inner_html
=> "<html><body><p>http://www.telelistas.net/br/restaurantes/?pagina=9</p></body></html>"