Selectolax hangs because of bad CSS selector
Thewildweb opened this issue · 12 comments
Hi,
First, thanks for selectolax; it helps me greatly.
If I give selectolax a bad CSS selector like "span[itemprop='example", it hangs indefinitely. It would be great if it raised an exception instead.
If you need some help with the project, I could see if I can free up some time.
Hello @Thewildweb !
Just curious. I tried to reproduce the issue you had by doing:
#!/usr/bin/python3
# -*- coding: utf-8 -*-
import requests
from selectolax.parser import HTMLParser
response = requests.get("https://edition.cnn.com/")
bs4 = HTMLParser(response.text)
test = bs4.css_first("span[itemprop='example")
print(test)
and the return was:
File "selectolax\parser.pyx", line 101, in selectolax.parser.HTMLParser.css_first
File "selectolax\node.pxi", line 458, in selectolax.parser.Node.css_first
File "selectolax\node.pxi", line 441, in selectolax.parser.Node.css
File "selectolax\selector.pxi", line 16, in selectolax.parser.Selector.__init__
File "selectolax\selector.pxi", line 57, in selectolax.parser.Selector._prepare_selector
ValueError: Bad CSS Selectors: span[itemprop='example
Also, if you do close the quote in the selector:
bs4.css_first("span[itemprop='example'")
None
I'm not quite sure how you were able to produce this bug, but I'd be very interested to know, as I might have had a similar issue. I wasn't sure whether it's related to the same thing, however.
Hi, sorry for the late reply. Here is an example.
import requests
from selectolax.parser import HTMLParser
resp = requests.get("https://www.python.org/")
tree = HTMLParser(resp.text)
# bad CSS selector: the attribute name 'href' is wrapped in quotes
a_hrefs = tree.css("a['href']")
# now it hangs indefinitely
I see! That's probably due to the incorrect CSS selector. I agree that this is not good and that it should trigger an exception. There is a way to avoid it by doing
a_hrefs = tree.css('a[href^=""')
even though I'm no expert with CSS selectors :D
We need to fix this in Modest.
It hangs when parsing a CSS selector.
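Until a fix lands upstream, a caller-side pre-check could reject the two malformed patterns reported in this thread before they ever reach the parser. The sketch below is only a rough heuristic, not a CSS parser; `check_selector` is a hypothetical helper name and is not part of selectolax or Modest.

```python
def check_selector(selector: str) -> None:
    """Raise ValueError for obviously malformed selectors (heuristic only)."""
    quote = None  # the quote character currently open, if any
    depth = 0     # nesting depth of [ ... ]
    i = 0
    while i < len(selector):
        ch = selector[i]
        if quote:
            if ch == "\\":
                i += 1  # skip the escaped character inside a quoted string
            elif ch == quote:
                quote = None
        elif ch in ("'", '"'):
            quote = ch
        elif ch == "[":
            depth += 1
            # An attribute selector should start with a name, not a quote.
            rest = selector[i + 1:].lstrip()
            if rest[:1] in ("'", '"'):
                raise ValueError(f"Quoted attribute name in selector: {selector}")
        elif ch == "]":
            depth -= 1
            if depth < 0:
                raise ValueError(f"Unexpected ']' in selector: {selector}")
        i += 1
    if quote or depth != 0:
        raise ValueError(f"Unbalanced quote or bracket in selector: {selector}")
```

Calling `check_selector(sel)` right before `tree.css(sel)` would turn both `span[itemprop='example` and `a['href']` into a `ValueError`. Other malformed selectors may still get through, since the check only looks at quotes and square brackets.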
https://github.com/lexbor/lexbor - I assume Modest won't be a priority anymore? It seems the dev is working on something new. Have you seen it?
It's a faster engine, but it does not support everything that we need yet. For example, its CSS engine can only parse queries; it can't execute them yet.
Does it only happen when using a CSS selector?
It hangs on some malformed examples, but I don't know why this happens. I'm not very familiar with the source code of Modest.
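Since the exact inputs that trigger the hang are unknown, one defensive option is to run the query in a child process with a timeout. A hang inside native code generally cannot be interrupted from within the same Python process, so the sketch below isolates the work instead. `run_isolated` is a hypothetical helper name, and the snippet passed to it is an assumption of this example (in practice it would parse the HTML, call `tree.css(...)`, and print the results).

```python
import subprocess
import sys


def run_isolated(snippet: str, timeout: float = 5.0) -> str:
    """Run a Python snippet in a child process and return its stdout.

    Raises TimeoutError if the child does not finish within `timeout` seconds.
    """
    try:
        done = subprocess.run(
            [sys.executable, "-c", snippet],
            capture_output=True,
            text=True,
            timeout=timeout,  # the child is killed when this expires
            check=True,       # a non-zero exit raises CalledProcessError
        )
    except subprocess.TimeoutExpired as exc:
        raise TimeoutError(f"snippet did not finish within {timeout}s") from exc
    return done.stdout
```

The trade-off is the cost of starting a new interpreter per call, so this probably only makes sense for selectors that come from untrusted or user-supplied input.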
Thank you for the information :) I see. Maybe it will support that soon. Even so, this is already pretty fast, so I'm excited to see whether it can get even faster.
It is not a huge issue. I thought it would be nice to throw an exception if it was an easy job.
An even faster parser would be awesome. I'm coming from bs4, so selectolax feels instant...
Hi,
I can deal with this tomorrow: lexborisov/Modest#84.
It seems to have been fixed in Modest.
@lexborisov Thanks!
Hi man! I just got a comment about a new update!
Is that something you plan to add to selectolax?