Modifying previous version
Hi,
I'm trying to modify version 1.1.8 of this gem and am running into some issues with the test suite. I suspect I'm not using the expected versions of the dependency gems.
One of the errors I'm most concerned about is:
- Failure:
test_attribute_hash_access(TestOfNokogiriDriver) [tests/xml_query_front_test.rb:183]:
<"bl\xC3\xA5b\xC3\xA6rgr\xC3\xB8d"> expected but was
<"blåbærgrød">.
Note that I have modified only one line of this gem: the line that controls how it handles parsing errors from Nokogiri. Any advice on how to set up the test environment would be greatly appreciated. Thanks!
- Steven
Hi Steven,
That's some sort of encoding error. Does this happen for all XML drivers or only for the Nokogiri one?
It's been a while since I used this, but I just ran the test case now and I get a similar (albeit slightly different) result:
33) Failure:
test_attribute_hash_access(TestOfREXMLDriver) [tests/xml_query_front_test.rb:183]:
<"bl\xC3\xA5b\xC3\xA6rgr\xC3\xB8d"> expected but was
<"bl\u00E5b\u00E6rgr\u00F8d">.
And when I look at the actual test case, I'm not sure why I would expect the parser to return raw UTF-8 byte strings rather than proper Unicode strings. Maybe the test was written before Ruby got proper Unicode support (I think that arrived in 1.9.x?). In any case, I think this is safe to ignore. The behaviour you are seeing appears to be correct; the test case is the broken part here.
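To illustrate the mismatch, here is a minimal sketch in plain Ruby (1.9 or later). It assumes the expected string ends up tagged as binary (ASCII-8BIT) while the driver returns a UTF-8 string; I haven't confirmed that this is exactly what happens inside the test suite, and the variable names are just for illustration.

```ruby
# encoding: utf-8

# Assumption: the test's expected value carries the right bytes but is
# tagged as binary (ASCII-8BIT) instead of UTF-8.
expected = "bl\xC3\xA5b\xC3\xA6rgr\xC3\xB8d".dup.force_encoding(Encoding::ASCII_8BIT)

# What a driver would plausibly return: the same text as a UTF-8 string.
actual = "bl\u00E5b\u00E6rgr\u00F8d"

# The underlying bytes are identical ...
same_bytes = expected.bytes.to_a == actual.bytes.to_a

# ... but String#== treats non-ASCII strings with different encodings as unequal.
equal_as_is = (expected == actual)

# Re-tagging the expected value as UTF-8 (no bytes change) makes them compare equal.
retagged = expected.dup.force_encoding(Encoding::UTF_8)
equal_after_retag = (retagged == actual)

puts "same bytes:        #{same_bytes}"
puts "equal as-is:       #{equal_as_is}"
puts "equal after retag: #{equal_after_retag}"
```

If that is what is going on, the parser output is fine and only the encoding tag on the expected string in the test needs adjusting.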
What are you changing and why an older version?
Thanks for the quick reply!
So it looks like your results may differ because that's the REXML driver test. There are errors with all drivers, but I only really care about Nokogiri. When I use Nokogiri to parse real XML data, the results I get just strip out the bad UTF-8 characters (unparsable XML entities).
I'm changing the line in xml_query_front::parse_string to ignore errors on the parsed doc:
raise ParseError.new unless (doc && doc.root)
I'm doing this so that the parser will just throw out the bad chars and return the sanitized doc. I'm using an old version because it's the version jiraSOAP lists as its dependency.
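For comparison, here is a rough sketch of what "throwing out the bad chars" looks like at the string level. This is not what the one-line change above does and it does not touch Nokogiri at all; it is a different technique that uses only Ruby's built-in String#scrub (Ruby 2.1 or later) to clean the input before it reaches any parser. The sample input is made up.

```ruby
# encoding: utf-8

# Hypothetical input: valid UTF-8 text with one stray byte (\xFF),
# which is never legal in UTF-8.
raw = "bl\xC3\xA5b\xFF\xC3\xA6r".dup.force_encoding(Encoding::UTF_8)

# The stray byte makes the whole string invalid as UTF-8.
valid_before = raw.valid_encoding?

# String#scrub replaces each invalid byte sequence with the given string;
# passing "" simply drops it.
clean = raw.scrub("")
valid_after = clean.valid_encoding?

puts "valid before: #{valid_before}"
puts "valid after:  #{valid_after}"
puts "cleaned:      #{clean}"
```

This only removes bytes that are invalid in UTF-8. It would not help with markup that is well-encoded but malformed as XML, so it may or may not cover the same cases as relaxing the check in parse_string.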