Text Node Element support
phpdude opened this issue · 7 comments
Hi again,
The library doesn't fully support text nodes. Text nodes need to be supported because they are standard in page layouts: page authors often output data without any HTML markup around it. You can select these entries with this example code:
Prefix(css='.user', children=[
    Text(name="position", xpath='text()')
])
This example code produces a list of XPathExtractor objects whose _root values are of type lxml.etree._ElementUnicodeResult. That type has no xpath attribute, so it fails the validation check in xextract.extractors.lxml_extractor.XPathExtractor#select:
if not hasattr(self._root, 'xpath'):
    return XPathExtractorList([])
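The failure can be reproduced without lxml or xextract by using stand-ins. lxml's _ElementUnicodeResult is a str subclass, so a plain str subclass without an xpath method behaves the same way under the hasattr guard quoted above. Both classes and the select function below are hypothetical stand-ins, not xextract's real code.

```python
class ElementUnicodeResult(str):
    """Hypothetical stand-in for lxml.etree._ElementUnicodeResult:
    a str subclass with string methods but no .xpath() method."""


class XPathExtractorList(list):
    """Hypothetical stand-in for xextract's XPathExtractorList."""


def select(root, xpath):
    # Mirrors the guard from XPathExtractor#select: any root that
    # lacks an .xpath() method silently yields an empty result.
    if not hasattr(root, 'xpath'):
        return XPathExtractorList([])
    return XPathExtractorList(root.xpath(xpath))


text_root = ElementUnicodeResult("Dyrektor Handlowy - właściciel")
print(select(text_root, 'text()'))
```

Because the text result is a string rather than an element, the guard returns an empty list instead of the text itself.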
We must fix it :)
I'm ready to help if you know a good way to support it :)
Can you please provide an HTML example and an output that you try to extract from it?
Of course :)
<p class="user">
<span> <span>English</span>, <span>Polish</span> </span><br>
Management,
Accountancy, invoices, Logistics, Marketing, Domestic forwarder, International forwarder, Sales, Company owner, Supplies, Management or governing body <br>Dyrektor Handlowy - właściciel
</p>
I want to extract Dyrektor Handlowy - właściciel
Try:
Element(xpath='//p[@class="user"]/text()')
The Element parser returns the lxml element itself, which in the case of text extraction is a unicode string.
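To see why //p[@class="user"]/text() picks out the wanted string, it helps to look at which text nodes are direct children of the <p>: text inside the nested <span> elements is excluded, and the <br> tags split the remaining text into separate nodes. The sketch below swaps in the standard library's html.parser instead of lxml, so it only approximates what the XPath selects; the DirectTextCollector class and the void-element list are assumptions made for illustration.

```python
from html.parser import HTMLParser

# Void elements never get an end tag, so they must not change nesting depth.
VOID_ELEMENTS = {"br", "img", "hr", "input", "meta", "link"}


class DirectTextCollector(HTMLParser):
    """Hypothetical helper: collect text nodes that are direct children of
    <p class="user">, approximating //p[@class="user"]/text()."""

    def __init__(self):
        super().__init__()
        self.depth = 0  # nesting depth inside the target <p>; 0 = outside it
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if self.depth == 0:
            if tag == "p" and dict(attrs).get("class") == "user":
                self.depth = 1
        elif tag not in VOID_ELEMENTS:
            self.depth += 1

    def handle_endtag(self, tag):
        if self.depth > 0 and tag not in VOID_ELEMENTS:
            self.depth -= 1

    def handle_data(self, data):
        # depth 1 means directly inside <p>, not inside a nested <span>
        if self.depth == 1:
            self.texts.append(data)


html = (
    '<p class="user">\n'
    '<span> <span>English</span>, <span>Polish</span> </span><br>\n'
    'Management,\n'
    'Accountancy, invoices, Logistics, Marketing, Domestic forwarder, '
    'International forwarder, Sales, Company owner, Supplies, '
    'Management or governing body <br>Dyrektor Handlowy - właściciel\n'
    '</p>\n'
)

collector = DirectTextCollector()
collector.feed(html)
collector.close()
# Drop whitespace-only nodes and strip the rest.
texts = [t.strip() for t in collector.texts if t.strip()]
print(texts[-1])  # Dyrektor Handlowy - właściciel
```

The last non-blank text node is the one after the final <br>, which is the value the thread is after.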
Oh, lol. I missed it, I am sorry :)
You should add this to the README :)
Yeah it works, I tested it! Thanks :)
No problem :) I have updated README.