"doc = pq(driver.page_source)" cannot find elements
iamhmx commented
When I use Selenium to get "page_source" and then look up elements with pyquery, nothing is found; but when I use "doc = pq(url='https://xxxxx')" directly, it works well. Code below:
Part one:
from pyquery import PyQuery as pq
doc = pq(url='https://search.jd.com/Search?keyword=%E7%A9%BA%E6%B0%94%E5%87%80%E5%8C%96%E5%99%A8&enc=utf-8&suggest=1.def.0.V18&wq=kongqijingh&pvid=60c4120a5787482e8337c64c2fd4184d')
for item in doc('.gl-i-wrap').items():
    price = item('.p-price strong i').text()
    print('price:', price)
Works well!
Part two:
html = self.driver.page_source
doc = pq(html)
for item in doc('.gl-i-wrap').items():
    price = item('.p-price strong i').text()
    print('price:', price)
Does not work!
Saren-Arterius commented
This issue affects me too. Try printing the first 200 characters of page_source, then remove the xmlns attribute from the <html> tag. In my case, I had to do this for CSS selectors to work while scraping Facebook WAP.
html = b.page_source.replace('<html xmlns="http://www.w3.org/1999/xhtml">', '<html>')
doc = pq(html)
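The exact-string replace above only matches when the <html> tag looks precisely like that one. A slightly more tolerant variant is sketched below, using only the standard library. The helper name strip_html_xmlns is hypothetical and not part of pyquery or Selenium. A likely cause of the original problem (an assumption, not confirmed in this thread) is that a document carrying an xmlns attribute gets parsed as namespaced XML, so plain CSS selectors no longer match the element names.

```python
import re

# Matches one xmlns="..." or xmlns:prefix="..." attribute, including
# the whitespace in front of it.
_XMLNS_ATTR = re.compile(r'\s+xmlns(?::[\w.-]+)?\s*=\s*(?:"[^"]*"|\'[^\']*\')')

# Matches the opening <html ...> tag.
_HTML_OPEN_TAG = re.compile(r'<html\b[^>]*>', re.IGNORECASE)


def strip_html_xmlns(page_source: str) -> str:
    """Remove xmlns attributes from the first <html ...> opening tag only.

    Hypothetical helper: it generalizes the str.replace workaround so that
    other attributes on <html> (lang, class, ...) are kept as they are.
    """
    def _clean(match):
        return _XMLNS_ATTR.sub('', match.group(0))

    return _HTML_OPEN_TAG.sub(_clean, page_source, count=1)


# Intended usage with the names from this thread (requires selenium and pyquery):
#   doc = pq(strip_html_xmlns(driver.page_source))
```

pyquery also accepts a parser argument, so pq(html, parser='html') may avoid the problem without touching the markup; I have not verified that against this particular page.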