Netflix/aegisthus

Handling very large sstables

bvanberg opened this issue · 5 comments

I am attempting to run aegisthus against a Priam backup containing some very large sstables (~200GB in size). Being that these files are snappy compressed (in addition to the Cassandra snappy compression) and we are using Cassandra 1.2 I'm currently using the code on the coursera fork. I'm running into problems with this as it takes a very long time (3+ hours) to handle these very large files. Aegisthus appears to expect sstables which haven't been double compressed by Priam, and doesn't appear to split the files if the -CompressionInfo.db files are available.

Questions:

  1. Do you guys use Aegisthus with similarly large sstables?
  2. Do you get any splitting on sstables that have the internal Cassandra snappy data? (i.e. -CompressionInfo.db files are available)

Hi @bvanberg, I've been prototyping the code on the Coursera fork =)

Unfortunately, the way Priam currently writes out the compressed SSTable makes it non-splittable, as far as I know (which is why Cassandra needed the CompressionInfo metadata in the first place). Maybe @jasobrown can comment on this.

CompressionInfo.db files should help make the SSTable splittable, but I haven't really looked into it as I'm blocked on the first issue. In theory, you could stage an initial Map phase the decompresses the Priam files into HDFS, and then run the splitter on that.

Btw, if you're reading compressed files you might want to be aware of #10

@bvanberg
We cannot split Priam based files. What my job does (the external part to this job) is to do a distributed copy and uncompress of the data onto hdfs. It isn't ideal, but it has to be done. We do have large files (I believe 100GB is about our max currently). And for the clusters that do have that data we do incremental backups.

Basically we bootstrap by taking the Priam snapshot files on day one. Usually a pretty hefty job. Then from day to day we are only consuming the incremental SSTables as Priam archives them out. For the most part this means that we have much smaller snappy compressed files to deal with, but it does mean that finding and moving all the correct files is much more work.

To answer your second question, we could indeed split the compressed data, but I haven't actually had to consume the data, so the code was only a proof of concept (and apparently buggy).

@charsmith
Thanks for the feedback. I suspected this to be the case. We will have to do something similar to your distcp handling the decompression as our file size is quite large, and I haven't had good success decompressing them inline.

@danchia
We've been looking at the coursera fork as well. Nice work so far. I'd definitely be interested in the progress you guys make. Sounds like we are working towards a common use case that hasn't been fleshed out just yet.

We unfortunately are not running incremental Priam backups as of yet so our jobs are fairly intensive.

This would still be a nice enhancement. I am closing the issue and adding a reference to this in the Enhancements section of the README.