logstash-plugins/logstash-filter-xml

xml filter xpath silent failure

jordansissel opened this issue · 2 comments

(originally posted in elastic/logstash#1688 by @SleeperSmith)

The offending code starts at line 93:

begin
  doc = Nokogiri::XML(value)
rescue => e
  event.tag("_xmlparsefailure")
  @logger.warn("Trouble parsing xml", :source => @source, :value => value,
               :exception => e, :backtrace => e.backtrace)
  return
end

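When Nokogiri fails to parse the input, it does not raise an exception. Instead it records the problem in the document's "errors" array. As a result, the rescue above does not catch malformed XML, and the filter needs to check doc.errors to detect parse failures.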

This is especially problematic when Nokogiri fails to parse the input but XmlSimple succeeds: Logstash emits the event with the structure expanded, but, unsurprisingly, none of the xpath expressions produce anything (the exact problem I encountered). The offending character in my case was etc with error message "#".

P.S. I don't write Ruby, so I can't really submit a fix and pull request myself.

I believe I bumped into this issue (it was quite frustrating).
While working on https://discuss.elastic.co/t/xml-filter-help-required/1387 I tried the following:
Event:

{"format":"xml_xpath","message":"<stats><stats xmlns='jcs:stats:jsm'><current-online-user-count>1730</current-online-user-count><login-rate>0</login-rate><successful_logins>93645</successful_logins><failed_logins>84583</failed_logins><uptime>1900999</uptime></stats>\n<stats xmlns='jcs:stats:delivery'><total-message-packets>5428196</total-message-packets><total-presence-packets>288328380</total-presence-packets><total-iq-packets>4977074</total-iq-packets><messages-in-last-time-slice>0</messages-in-last-time-slice><average-message-size>0</average-message-size></stats></stats>"}

Filters:

filter {
  if [format] == "xml_xpath" {
    xml {
      source => "message"
      target => "message_parsed"
      add_tag => ["xml_parsed"]
      xpath => [
        "/stats/stats/failed_logins/text()", "x_failed_logins"
      ]
    }
  }
}

Result: no error, and no x_failed_logins field.
When I removed the xmlns=... attributes, the x_failed_logins field appeared.

I was able to test it here:
https://github.com/rafaltrojniak/logstash_rules/tree/xml_xpath

See the example inputs and outputs here. The first one (without xmlns) works; the second one (with xmlns) does not:
https://github.com/rafaltrojniak/logstash_rules/blob/xml_xpath/doc.md#example-sources

@rafaltrojniak your issue is different: because you declare a default namespace on the inner element, you need to configure the filter to either

  1. remove all namespaces before evaluating the xpath:
xml {
  source => "xmldata"
  target => "data"
  xpath => [ "/stats/stats/failed_logins/text()", "x_failed_logins" ]
  remove_namespaces => true
}
  2. register your namespace and use its prefix in your xpath expression:
xml {
  source => "xmldata"
  target => "data"
  namespaces => { "a" => "jcs:stats:jsm"}
  xpath => [ "/stats/a:stats/a:failed_logins/text()", "x_failed_logins" ]
}