TUM-Dev/eat-api

Black recomends replacing lxml with defusedxml

CommanderStorm opened this issue · 4 comments

event though we kind of trust the stuwerk, it is better to be save than sorry

Issue: [B410:blacklist] Using html to parse untrusted XML data is known to be vulnerable to XML attacks. Replace html with the equivalent defusedxml package.
   Severity: Low   Confidence: High
   Location: src/menu_parser.py:19
   More Info: https://bandit.readthedocs.io/en/latest/blacklists/blacklist_imports.html#b410-import-lxml
18      import requests
19      from lxml import html

Is this still relevant?
I looked into it a bit, and noticed, that the lxml package of defusedxml (as recommended in the comment) is deprecated. https://pypi.org/project/defusedxml/#defusedxml-lxml

So as far as I see it, a replacement would require to rewrite the parser?
Or is there another option?

tiran/defusedxml#38 states, that the package is no longer nessesary, but will be availibele until a real fix is found, so probably still worth to keep subscribed on this issue.

the lxml-FAQ has some recomendations, on how to avert the size/compute-blowup vulnerabilities, but does not seem to specific on how to secure the html parsing ability we use from this library.
https://lxml.de/FAQ.html#how-do-i-use-lxml-safely-as-a-web-service-endpoint

I have not researched, if the html parsing of lxml is also vulnerable to these kinds of attacks.