Is it OK to pass bytes in as the first argument to PyQuery()?
We have code that sometimes calls pyquery.PyQuery("a string (str)") and at other times calls pyquery.PyQuery(b"a bytes string (bytes)"). Is this acceptable, or should the first argument always be a str instead of bytes?
The question comes up because basestring is defined in pyquery as (str, bytes), which differs from the usual definition of basestring: in Python 2 it covered str and unicode, and in Python 3 it is usually defined as (str,).
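To make the difference between the two tuple definitions concrete, here is a minimal sketch of how each behaves in an isinstance check. The names `basestring`, `py3_usual`, and `accepted` are local stand-ins defined for illustration; nothing here is imported from pyquery.

```python
# Local stand-ins for the two definitions discussed above.
# These are assumptions for illustration, not imported from pyquery.
basestring = (str, bytes)   # the definition described in this issue
py3_usual = (str,)          # the common Python 3 compatibility definition


def accepted(value, string_types):
    """Return True if value passes an isinstance check against string_types."""
    return isinstance(value, string_types)


# With (str, bytes), both text and bytes pass the check.
print(accepted("a", basestring), accepted(b"a", basestring))

# With (str,), only text passes; bytes would be rejected.
print(accepted("a", py3_usual), accepted(b"a", py3_usual))
```

So any code path guarded by the (str, bytes) tuple treats bytes input as a string-like value, which is why both call styles get through.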
>>> from pyquery import PyQuery as pq
>>> pq("a").encoding, pq(b"a").encoding
('ISO-8859-1', 'UTF-8')
As I remember, lxml deals with both cases.
Agreed. Beyond the .encoding difference mentioned above, is there any other reason to prefer bytes over str?
As I remember, lxml detects the encoding based on the charset specified in the XML/HTML when you provide bytes, so you don't have to deal with this yourself when you're scraping a large number of web pages. I guess that's the only reason.
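A rough stdlib-only sketch of why raw bytes can be the safer input: the bytes still carry the page's declared charset, whereas a str has already been decoded by someone, possibly with the wrong codec. The helper `sniff_charset` below is hypothetical and is not how lxml does its detection; it only illustrates the idea.

```python
import re

# Hypothetical helper, NOT lxml's implementation: look for a charset
# declaration near the start of the raw markup.
_CHARSET_RE = re.compile(rb'charset\s*=\s*["\']?\s*([A-Za-z0-9_.:-]+)', re.IGNORECASE)


def sniff_charset(raw, default="utf-8"):
    """Return the charset declared in the first 1024 bytes, or the default."""
    match = _CHARSET_RE.search(raw[:1024])
    return match.group(1).decode("ascii") if match else default


# A Latin-1 encoded page that declares its own charset.
page = b'<html><head><meta charset="iso-8859-1"></head><body><p>caf\xe9</p></body></html>'

# Decoding with the declared charset recovers the text correctly.
declared = sniff_charset(page)
text_from_bytes = page.decode(declared)

# Blindly assuming UTF-8 before parsing mangles the non-ASCII character.
text_guessed = page.decode("utf-8", errors="replace")

print(declared)
print("café" in text_from_bytes, "café" in text_guessed)
```

If the caller decodes to str with a guessed codec, that damage is done before the parser ever sees the document; handing over bytes leaves the decision to the parser.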