Text Node Element support

Question

phpdude opened this issue 8 years ago · 7 comments

Hi again,

The library doesn't fully support text nodes. They need to be supported because they are common in real-world layouts: page authors often output data without wrapping it in any HTML markup. You would expect to select such entries with this example code:

Prefix(css='.user', children=[
    Text(name="position", xpath='text()')
])

This example produces an XPathExtractor list whose "_root" values are of type lxml.etree._ElementUnicodeResult. That type has no xpath attribute, so the selection fails on the validation check in xextract.extractors.lxml_extractor.XPathExtractor#select:

        if not hasattr(self._root, 'xpath'):
            return XPathExtractorList([])
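The behaviour behind that check can be reproduced with lxml alone. The following is a minimal sketch, not xextract code: the HTML snippet and variable names are made up for illustration. It shows that an element returned by an XPath query has an xpath method, while a result of text() is a string subclass without one.

```python
from lxml import html

# Hypothetical, shortened document used only for this illustration.
doc = html.fromstring(
    '<html><body><p class="user"><span>English</span><br>Dyrektor Handlowy</p></body></html>'
)

# An element result: supports further XPath queries.
element = doc.xpath('//p[@class="user"]')[0]

# A text() result: a smart string (str subclass), not an element.
text_node = doc.xpath('//p[@class="user"]/text()')[0]

element_has_xpath = hasattr(element, "xpath")
text_has_xpath = hasattr(text_node, "xpath")

print(type(text_node).__name__, element_has_xpath, text_has_xpath)
```

Any code that calls xpath on every intermediate result therefore needs to treat text results as leaf values.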

We must fix it :)

I'm ready to help if you know a good way to support it :)

Answer 1 · 2016-07-16T15:05:40.000Z

Can you please provide an HTML example and the output you are trying to extract from it?

Answer 2 · 2016-07-16T15:06:56.000Z

Of course :)

<p class="user">
                        <span>                <span>English</span>,                            <span>Polish</span>            </span><br>
                                                                                        Management,
                            Accountancy, invoices,                            Logistics,                            Marketing,                            Domestic forwarder,                            International forwarder,                            Sales,                            Company owner,                            Supplies,                            Management or governing body                                        <br>Dyrektor Handlowy - właściciel
                        </p>

Answer 3 · 2016-07-16T15:07:32.000Z

I want to extract Dyrektor Handlowy - właściciel

Answer 4 · 2016-07-16T15:17:44.000Z

Try:

Element(xpath='//p[@class="user"]/text()')

The Element parser returns the lxml element, which in the case of text extraction is a unicode string.
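To see what that XPath expression matches, here is a rough sketch that runs it with plain lxml rather than through xextract. The SAMPLE markup is a shortened version of the HTML posted above, and taking the last non-empty text node as the position is an assumption about this particular layout.

```python
from lxml import html

# Shortened stand-in for the markup posted earlier in the thread.
SAMPLE = """
<html><body>
<p class="user">
    <span><span>English</span>, <span>Polish</span></span><br>
    Management, Sales, Company owner
    <br>Dyrektor Handlowy - właściciel
</p>
</body></html>
"""

doc = html.fromstring(SAMPLE)

# text() returns only the text nodes that are direct children of <p>,
# including whitespace-only nodes between tags.
raw = doc.xpath('//p[@class="user"]/text()')

# Drop whitespace-only nodes and collapse runs of whitespace.
cleaned = [" ".join(t.split()) for t in raw if t.strip()]

# Assumption: the position is the last non-empty text node.
position = cleaned[-1]
print(position)
```

Note that the query returns several text nodes for this markup, so the caller still has to pick the one it needs and strip the surrounding whitespace.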

Answer 5 · 2016-07-16T15:19:23.000Z

Oh, lol. I missed it, I am sorry :)

You should add this to the README :)

Answer 6 · 2016-07-16T15:20:03.000Z

Yeah it works, I tested it! Thanks :)

Answer 7 · 2016-07-16T15:27:52.000Z

No problem :) I have updated the README.