Inline DTD allows XML bomb attack

Question

Inline DTD allows XML bomb attack

Closed this issue 3 years ago · 15 comments

This is Wiki page for the vulnerability, it is a very well known XML parser vulnerability: https://en.wikipedia.org/wiki/Billion_laughs_attack.

To replicate this issue in SweetXml, you can do the following in an iex session and watch in the observer:

4 hours later, and it's still running! Memory usage is slowly climbing past 900MB, scheduler 1 utilization hovers 70%.

I was looking into xmerl a bit to see if there's a way to disable inline DTD when using xpath before opening this issue, but I'm not familiar enough with it yet. Hoping someone else may be able to chime in. The closest thing I could find was in the release notes for xmerl 1.2.3 there's an option to turn off external DTD parsing. That sounds like that wouldn't solve this issue though, because internal DTD is the problem.

Maybe something can be done by changing the arguments passed to xmerl_xpath.string/3?

One suggestion to make this safer from @ellispritchard is to set the max heap size for the process calling xmerl functions: http://erlang.org/doc/man/erlang.html#process_flag_max_heap_size

Answer 1 · 2019-11-10T10:57:05.000Z

Using 'xmerl_scan' on XML from an untrusted source is generally a bad idea anyway: it creates atoms from XML tag and attribute names, which can bring down the VM by exhausting atom table space.

Anyway, FWIW, it is possible to block DTD entity definitions altogether by using a custom set of rules, and raising on creation of an entity:

  def parse([c | _] = doc, options) when is_integer(c) do
    ets_table = :ets.new(:sweet_xml_rules, [:set, :public])

    try do
      {parsed_doc, _} = :xmerl_scan.string(doc, [no_dtd_entities_rules(ets_table) | options])
      parsed_doc
    after
      :ets.delete(ets_table)
    end
  end

  defp no_dtd_entities_rules(ets_table) do
    {:rules, &no_dtd_entities_rules_read/3, &no_dtd_entities_rules_write/4, ets_table}
  end

  defp no_dtd_entities_rules_write(:entity, _name, _value, _state) do
    raise "DTD entities not allowed"
  end

  defp no_dtd_entities_rules_write(context, name, value, state) do
    ets_table = :xmerl_scan.rules_state(state)

    case :ets.lookup(ets_table, {context, name}) do
      [] ->
        :ets.insert(ets_table, {{context, name}, value})

      _ ->
        :ok
    end

    state
  end

  defp no_dtd_entities_rules_read(context, name, state) do
    ets_table = :xmerl_scan.rules_state(state)

    case :ets.lookup(ets_table, {context, name}) do
      [] ->
        :undefined

      [{_, value}] ->
        value
    end
  end

Answer 2 · 2020-04-01T21:56:59.000Z

Are there any mitigations for this?

Answer 3 · 2020-11-14T00:30:07.000Z

hey, any update on this?

Answer 4 · 2021-01-21T15:23:43.000Z

Has there been any progress on this?

Answer 5 · 2021-01-22T10:42:06.000Z

Hi benbarber and everyone, we are discussing this internally. We'll come back to you next week.

Answer 6 · 2021-01-29T11:33:37.000Z

Hi everyone, here are a few vectors that I gathered to help make a decision:

SweetXml is first and foremost a small DSL intending to translate DOM based data into Elixir's structures, via xpath and its nesting.
- This means we want to keep it minimal.
- This means we don't want to do the parsing ourselves.
  - Because of this, we are limited in the scope of what we can or are willing to change.
- This means the API is not tied to xmerl, altough the implementation currently is.
  - Thanks to this, we could in the future support a different parser, and allow the user to choose wich one to use. But I personally don't know yet how to make the extra dependencies work with the goal of keeping the lib minimal.
We want to keep backward compatibility as much as possible.
- This means we won't make the current API restrictive by default.

From all that, I'm deciding that:

We will add options to parse/2, xpath/2, so that the user can restrict the behavior of the parser.
We will update the doc so that it recommends heavily to set the options for more safety.
We won't work on the atom exhaustion problem in the near future.
I personally haven't investiguated stream and stream_tags, but since it is based of :xmerl_scan, the same changes will probably be applied to them.

On a last note, it is still possible to open an issue directly for xmerl to bring changes to it, on https://bugs.erlang.org/secure/Dashboard.jspa

Answer 7 · 2021-02-26T10:43:19.000Z

I did not have much available time to work on it yet, but it's still present in my mind. I'll give an update next week, then try to give updates every two weeks at worst.

Answer 8 · 2021-03-05T11:09:17.000Z

Hi everyone,

I've made a first implementation in issue-71, but it still need to be documented, real-world tested, and possibly iterated upon before I'm merging this in master

Answer 9 · 2021-03-19T10:18:01.000Z

Hi everyone, I haven't had time to work on the doc yet, but I've set up a date to do it, I'll update you next week

Answer 10 · 2021-03-25T15:58:30.000Z

Hi everyone, I'm practically done with the documentation. I haven't published a release candidate, but you can test it with:

{:sweet_xml, github: "kbrw/sweet_xml", tag: "v0.7.0-rc.1"},

I'll get a few teams to test it on our side as well

Answer 11 · 2021-04-29T07:14:57.000Z

Hi, any news from testing?

Answer 12 · 2021-04-30T08:12:20.000Z

Hi oliver-kriska, I haven't had much time to allocate on this recently, so unfortunately no. That made me think about it again though. I'll get in contact with a specific team and give an update here, by next friday, on whether I could bring the new version to their project

Answer 13 · 2021-05-04T13:18:21.000Z

Hello everyone, the update is currently running on a forked xml flux with the option dtd: :none. This is parsing files that don't use the advanced features of xml so all I can ensure is that the new option won't cause trouble when the parsed files are already safe. I'll try to see if one of our projects has some relevant flux that could make use of the other possible arguments to :dtd, but that is quite unlikely.
In the meantime, I invite you all to try parsing some file with the different possible options if you have some ready at hand :)

Answer 14 · 2021-08-19T00:02:36.000Z

I'm unclear if this should be closed by 088b3be?

Answer 15 · 2021-08-19T09:45:51.000Z

Hey @jc00ke, the version 0.7.0 is indeed published to hex.pm. I didn't close the issue because I wanted it to act as a reminder for Kbrw to update our project to the new version, but I suppose an automated reminder would be a better way.

So yes, the issue should be resolved, and obviously if there are problems with the update feedbacks are welcome.