Inline DTD allows XML bomb attack
Closed this issue · 15 comments
This is Wiki page for the vulnerability, it is a very well known XML parser vulnerability: https://en.wikipedia.org/wiki/Billion_laughs_attack.
To replicate this issue in SweetXml, you can do the following in an iex session and watch in the observer:
4 hours later, and it's still running! Memory usage is slowly climbing past 900MB, scheduler 1 utilization hovers 70%.
I was looking into xmerl a bit to see if there's a way to disable inline DTD when using xpath before opening this issue, but I'm not familiar enough with it yet. Hoping someone else may be able to chime in. The closest thing I could find was in the release notes for xmerl 1.2.3 there's an option to turn off external DTD parsing. That sounds like that wouldn't solve this issue though, because internal DTD is the problem.
Maybe something can be done by changing the arguments passed to xmerl_xpath.string/3?
One suggestion to make this safer from @ellispritchard is to set the max heap size for the process calling xmerl functions: http://erlang.org/doc/man/erlang.html#process_flag_max_heap_size
Using 'xmerl_scan' on XML from an untrusted source is generally a bad idea anyway: it creates atoms from XML tag and attribute names, which can bring down the VM by exhausting atom table space.
Anyway, FWIW, it is possible to block DTD entity definitions altogether by using a custom set of rules, and raising on creation of an entity:
def parse([c | _] = doc, options) when is_integer(c) do
ets_table = :ets.new(:sweet_xml_rules, [:set, :public])
try do
{parsed_doc, _} = :xmerl_scan.string(doc, [no_dtd_entities_rules(ets_table) | options])
parsed_doc
after
:ets.delete(ets_table)
end
end
defp no_dtd_entities_rules(ets_table) do
{:rules, &no_dtd_entities_rules_read/3, &no_dtd_entities_rules_write/4, ets_table}
end
defp no_dtd_entities_rules_write(:entity, _name, _value, _state) do
raise "DTD entities not allowed"
end
defp no_dtd_entities_rules_write(context, name, value, state) do
ets_table = :xmerl_scan.rules_state(state)
case :ets.lookup(ets_table, {context, name}) do
[] ->
:ets.insert(ets_table, {{context, name}, value})
_ ->
:ok
end
state
end
defp no_dtd_entities_rules_read(context, name, state) do
ets_table = :xmerl_scan.rules_state(state)
case :ets.lookup(ets_table, {context, name}) do
[] ->
:undefined
[{_, value}] ->
value
end
end
Are there any mitigations for this?
hey, any update on this?
Has there been any progress on this?
Hi benbarber and everyone, we are discussing this internally. We'll come back to you next week.
Hi everyone, here are a few vectors that I gathered to help make a decision:
- SweetXml is first and foremost a small DSL intending to translate DOM based data into Elixir's structures, via xpath and its nesting.
- This means we want to keep it minimal.
- This means we don't want to do the parsing ourselves.
- Because of this, we are limited in the scope of what we can or are willing to change.
- This means the API is not tied to xmerl, altough the implementation currently is.
- Thanks to this, we could in the future support a different parser, and allow the user to choose wich one to use. But I personally don't know yet how to make the extra dependencies work with the goal of keeping the lib minimal.
- We want to keep backward compatibility as much as possible.
- This means we won't make the current API restrictive by default.
From all that, I'm deciding that:
- We will add options to
parse/2
,xpath/2
, so that the user can restrict the behavior of the parser. - We will update the doc so that it recommends heavily to set the options for more safety.
- We won't work on the atom exhaustion problem in the near future.
- I personally haven't investiguated
stream
andstream_tags
, but since it is based of:xmerl_scan
, the same changes will probably be applied to them.
On a last note, it is still possible to open an issue directly for xmerl to bring changes to it, on https://bugs.erlang.org/secure/Dashboard.jspa
I did not have much available time to work on it yet, but it's still present in my mind. I'll give an update next week, then try to give updates every two weeks at worst.
Hi everyone,
I've made a first implementation in issue-71, but it still need to be documented, real-world tested, and possibly iterated upon before I'm merging this in master
Hi everyone, I haven't had time to work on the doc yet, but I've set up a date to do it, I'll update you next week
Hi everyone, I'm practically done with the documentation. I haven't published a release candidate, but you can test it with:
{:sweet_xml, github: "kbrw/sweet_xml", tag: "v0.7.0-rc.1"},
I'll get a few teams to test it on our side as well
Hi, any news from testing?
Hi oliver-kriska, I haven't had much time to allocate on this recently, so unfortunately no. That made me think about it again though. I'll get in contact with a specific team and give an update here, by next friday, on whether I could bring the new version to their project
Hello everyone, the update is currently running on a forked xml flux with the option dtd: :none
. This is parsing files that don't use the advanced features of xml so all I can ensure is that the new option won't cause trouble when the parsed files are already safe. I'll try to see if one of our projects has some relevant flux that could make use of the other possible arguments to :dtd
, but that is quite unlikely.
In the meantime, I invite you all to try parsing some file with the different possible options if you have some ready at hand :)
Hey @jc00ke, the version 0.7.0 is indeed published to hex.pm. I didn't close the issue because I wanted it to act as a reminder for Kbrw to update our project to the new version, but I suppose an automated reminder would be a better way.
So yes, the issue should be resolved, and obviously if there are problems with the update feedbacks are welcome.