gawel/pyquery

Is it OK to pass bytes in as the first argument to PyQuery()?

Closed this issue · 3 comments

Our code sometimes calls pyquery.PyQuery("a string (str)") and at other times calls pyquery.PyQuery(b"a bytes string (bytes)"). Is this acceptable, or should the first argument always be a str rather than bytes?

The question comes up because pyquery defines basestring as (str, bytes), which differs from the usual definition of basestring: (str, unicode) in Python 2 and usually (str, ) in Python 3.

>>> from pyquery import PyQuery as pq
>>> pq("a").encoding, pq(b"a").encoding
('ISO-8859-1', 'UTF-8')

gawel commented

As I remember, lxml deals with both cases.

Agreed. Beyond the .encoding difference mentioned above, is there any other reason to prefer bytes over str?

gawel commented

As I remember, lxml detects the encoding based on the charset specified in the XML/HTML when you provide bytes, so you don't have to deal with this yourself when you're scraping a large number of web pages. I guess that's the only reason.
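
To make the point about charset detection concrete, here is a rough, stdlib-only sketch of the work you take on if you decode scraped bytes yourself before parsing. The helper sniff_and_decode and its regex are hypothetical and written only for illustration; this is not lxml's actual detection logic, which is handled by the underlying parser.

```python
import re

# Hypothetical, simplified pattern: finds charset=... inside a <meta> tag.
# Real parsers apply more elaborate rules (BOMs, HTTP headers, XML declarations).
_META_CHARSET = re.compile(
    rb"""<meta[^>]*?charset\s*=\s*["']?([\w.:-]+)""", re.IGNORECASE
)


def sniff_and_decode(raw: bytes, fallback: str = "utf-8") -> str:
    """Decode HTML bytes using the charset declared in a <meta> tag, if any."""
    match = _META_CHARSET.search(raw[:1024])  # only inspect the start of the page
    encoding = match.group(1).decode("ascii") if match else fallback
    try:
        return raw.decode(encoding)
    except (LookupError, UnicodeDecodeError):
        # Unknown or wrong declared charset: fall back rather than crash.
        return raw.decode(fallback, errors="replace")


# A page that declares, and is actually encoded as, ISO-8859-1.
page = (
    '<html><head><meta charset="iso-8859-1"></head>'
    "<body><p>café</p></body></html>"
).encode("iso-8859-1")

print("café" in sniff_and_decode(page))                   # True
print("café" in page.decode("utf-8", errors="replace"))   # False
```

Blindly decoding with one fixed encoding mangles the text, while honoring the declared charset recovers it. Passing the raw bytes to PyQuery() leaves that decision to the parser, which is the convenience described above.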