floydpink/lzwCompress.js

Can't handle strings

Jacob-Christian-Munch-Andersen opened this issue ยท 13 comments

Numerous test cases fail, either by a hard error, or by returning something that isn't the original string. Simplest failing test case:

var compressed = lzwCompress.pack('""');
var original = lzwCompress.unpack(compressed);

I'd love to use this in a project I'm working on, and it seemed to work OK in my testing, but this issues scares the bejeezus out of me. I see there have been quite a few commits since then, but from the comments I can't tell whether they address the issue. Is there any update?

Thanks.

@fds-github: Pull requests are welcome if you see this as a scary issue and could fix it. ๐Ÿ˜„

It states:

Numerous test cases fail, either by a hard error, or by returning something that isn't the original string.

But has given only one example which is an empty string.

In my reading, this is a non-issue and I do not plan to fix it.

Pull requests are welcome if you see this as a scary issue and could fix it. ๐Ÿ˜„

Ah, if I had a clue, I'd offer it. ๐Ÿ˜„

The problem is that users can't pass the code an arbitrary object and expect that it encodes and decodes correctly. Minimally they have to know the rules for which things break it. If it was only this example that caused problems, it wouldn't be a big deal (and would be trivial to fix). But without knowing the root cause, one can't tell what other inputs might be a problem, and so can't use the code and be assured that it will work. The "" test case would be a good place to start debug, though.

I thought I was seeing examples of the "silent failure", but these turned out to be false alarms:

  • As a shortcut for object comparison I was comparing the output of JSON.stringify on the original and packed/unpacked object, and was seeing differences. Investigation showed that the properties were in different orders in the two JSON strings, which is not an issue.
  • I then used the deepCompare function from here: http://stackoverflow.com/questions/1068834/object-comparison-in-javascript This also showed differences, but this time the problem was that lzwCompress.unpack adds the properties inArray and pushNew to the object. IMO this isn't great, but also isn't significantly changing the object.

I will try to contact the OP and ask him to revisit this issue to let us know whether he remembers whether what he was seeing could be explained by the mistakes I was originally making, or whether he has other test cases that could point out the problems.

I understand if this isn't a priority for you. If it becomes one for me, I'll try to "get a clue" and let you know if I find anything.

But has given only one example which is an empty string.

It is not an empty string, it is a string containing two double quotes. As best I can tell you can put anything between those quotes and it still fails.

In my reading, this is a non-issue and I do not plan to fix it.

That is not an attitude I can work with as a professional who has obligations to my own customers. I don't know how many cases fail, you seemingly don't either.

It is not an empty string, it is a string containing two double quotes. As best I can tell you can put anything between those quotes and it still fails.

Thanks for pointing out other failure cases. But do you have any examples of failures "returning something that isn't the original string"? These are more insidious, and it would be good to know whether a hypothetical fix to the double-quote problem fixes the other problems you were seeing.

That is not an attitude I can work with as a professional who has obligations to my own customers. I don't know how many cases fail, you seemingly don't either.

While I appreciate what you're saying and agree with your point (see my original comment), it comes across as a bit harsh, considering that the author is publishing the code with an MIT license.

I had to dig a bit, but here is a string that changes content: '"{\"abnabnabn\":\"asd\"}"'

As for the harsh comment, do you consider open source code to be inferior quality? Do you automatically deserve a buggy product just because you didn't pay for it? If so then you are confirming every piece of fud about open source code. If we don't demand the same from open source software as we do from closed source, then we are making the myth that open source software is inferior true. When I looked for an open source JavaScript compression library I found around 10 different offerings, all of them had flaws that made them useless to me, and most had flaws that in my opinion made them useless to everyone, but it took a good deal of time to confirm those issues. I put up this report mostly to help others not waste their time on a broken library, I didn't have a lot of faith that the issue would actually be resolved, and guess what the status is 1ยฝ years later.

Thanks very much for the digging to find that example. But for me on Chrome it seems to work:

image

(Wasn't sure whether the outer single-quotes were part of the example or not, so I tried it both ways.)

I looked into the double-quote issue a very little and I believe that the problem is related to performing an unnecessary JSON.parse on the result. It seems suspicious to me that the code tries to JSON.parse the result of unpacking; I'd think that we'd know whether this needs to be done or not, but to be honest, I got a bit lost in following the call chain. As an experiment I tried commenting out that call and it fixes the double-quote problem and still works with the meager set of test cases I had, but fails with the example you give here, and probably many more as it was just a shot in the dark. If I get more time I'll try to take a look, but I'm sure that floydpink would be more qualified to investigate.

As for the quality of open source: I agree, it is very often as good as or better than software you pay for, and software needs to be good quality to be useful. But if I've asked someone to fix my car and it didn't work, I'd be more reserved in telling my buddy who did it for free about the problem than I would in telling the garage I paid. ๐Ÿ˜„

In case anyone stumbles upon this thread and can't accept the limitations of the existing code, Jacob tells me that he tweaked another library he found and it has passed his testing: https://github.com/Jacob-Christian-Munch-Andersen/Easy-Deflate.

I agree that most of the searching I've done for Javascript compression libraries turns up not just code, but people with problems with the code. An exception seems to be zlib.js.

@fds-github My initial comment unfortunately was mangled by Github, the problematic string contain backslashes (and backslashes to escape them).

Yes, I too see the failure with this test case. Thanks for posting it.

@fds-github Thanks for taking the time and looking at this issue.

I am sorry I overlooked it earlier. But there definitely was an issue where any string that does not cause an exception on JSON.parse (even if they aren't previously stringified JSONs), were being changed by lzwCompress.unpack() - that explains why '""' was returned as "" and '"{\\"abnabnabn\\":\\"asd\\"}"' as '{"abnabnabn":"asd"}'.

This has been fixed now and I have also added two new tests that tests these cases.

Excellent! Thank you for tracking this down and taking care of it.

That takes care of the issue with arbitrary strings, of course I can still engineer a string that results in a compressed object identical to that of an object: '{"__k":["a"],"__v":{"0":"as"}}'.