infinitered/thesis-phoenix

Remove or replace lz_string

jamonholmgren opened this issue · 5 comments

@aaronrenner let me know that using lz_string is really slow. We should consider changing it (he had a suggestion -- maybe he can repeat it here, as I don't remember what it was). Or simply remove it. I'm not sure it's all that necessary.

One problem is that when we have production data in the wild, compression helpers can't change very easily. We need to plan the migration carefully so we don't lose data.

I believe we're only using it for backups (right @yulolimum ?)

At my day job, we were having serious cpu and memory usage issues on the machines that were handling lz string decompression. @koudelka (my co-worker and the maintainer of https://github.com/koudelka/elixir-lz-string) recently removed lz_string and replaced it with erlang's built in :zlib library on these servers. As soon we deployed it, we saw both memory and cpu demands on these machines decrease significantly.

I didn't work close enough to the problem be able to share all of the technical reasons of why :zlib is faster, but maybe @koudelka could shed some light on this. When I was looking over thesis, it looked like these backups were being decompressed pretty often and seemed like it could put some unexpected load on your servers.

Hey @aaronrenner - thank you for the input man. I'm thinking to scrap compression altogether unless zlib does a much better job at compression (not only speed-wise, but size too).

Right now, we only decompress when a revision is selected (to restore past content) - so not as often as one may think. However, compression happens every single Publish. Although it may not take a long time to compress in most use cases, I did see that page would halt for a second when I used lots of SVGs on a page as raw_html. So yes, for performance reasons, I think we should probably just save the json string into the DB.

@jamonholmgren - we can probably handle conversions in a migration. Backups feature just got released so the only sites we have that on are a handful of IR clients that probably haven't had many page edits. Also, not sure that we updated Thesis on those sites since release so that's even less.

Hey y'all.

I'm afraid that lz-string really should be replaced. At the time when I was looking around for a client-side compression library, it was the best that I could find. My understanding is that lz-string is really just a partial implementation of the deflate algorithm, but it's not block-based and doesn't do the huffman bit reduction stuff either. It's also fairly unmaintained, as far as I can tell.

The JS zlib library, pako, combined with :zlib has been brilliant, I absolutely recommend it. Smaller compression sizes and much shorter (de)compression times all with drastically reduced cpu/ram usage. And the best part is… I don't have to maintain the BEAM version. ;)

@aaronrenner @koudelka Thanks so much for your input, definitely appreciated!

@yulolimum Let's just remove compression for now. While pako and :zlib sound really great, I don't want to have this same issue down the road if something changes. The amount we're storing is fairly negligible (famous last words...).

However, when we do remove compression, let's make sure to add a migration that uncompresses any existing backups that were compressed. We can recommend that 0.2 users upgrade and migrate before upgrading to the future 1.0 version. The worst that could happen is malformed prior revisions, which isn't that big of an issue to me.