Paper reading Jan 2019 [3]

  • Serverless computing: one step forward, two steps back
    • And we are back to CS papers! \o/
    • Current serverless benefits:
      • good for "embarrassingly parallel functions"; eg. image resizing, object recognition (with trained weights)
      • autoscaling (unlimited computing power), pay-as-you-go
    • Downsides:
      • not good for data-intensive computing (it ships data to code instead of code to data), eg. batch training.
      • No efficient message passing between functions; they must rely on slow storage/queues.
      • Limited function lifetime + hardware choices
    • Improvements:
      • Different hardware offerings
      • "code-to-data" approach
      • Long-lived Actor-as-a-service? One can have dreams =p
  • Learned index structures
    • Argues that we can replace B-trees, hashmaps, and Bloom filters with "learned indexes". Core insight is we should take advantage of the data distribution.
    • Main challenges are 1) how to guarantee error bounds 2) computation cost.
    • B-tree and hashmap lookups can be seen as a CDF approximation problem: position = F(key) * M, where F is the CDF of the keys and M is the total number of keys. For hashmaps, a CDF-based hash function spreads keys more evenly, giving more "filled" maps with fewer collisions. (See the RMI sketch after this list.)
    • A Bloom filter becomes a binary classification problem (see the learned Bloom filter sketch after this list). Note the experiments assume negative training data is available from historical queries.
    • "Recursive model index": a hierarchy of models, each upstream model's prediction is used to pick which model to use in the layer below. This essentially shrinks the data into subsets for models at lower layers. This is similar to greedy pretraining. This is also inspired by how B-tree traversal works: each node has a 1/page_size precision gain.
    • [figure: recursive model index, a layered hierarchy of models]
    • Experiment results show faster lookups and lower memory usage. With string keys, though, performance is only on par with traditional indices, although the authors only used a naive feature vector for strings.
    • There are 2 interesting blog posts arguing the learned index may not be that good yet: here (my notes below) and here, which argues one can achieve similar performance by using spline interpolation to approximate the CDF (basically using the B-tree's intermediate nodes as spline knots and interpolating the leaves).
    • I really like & am very excited about the "ML for systems" idea, where we use learned methods to improve the system stack, and I think this is the right trend. For me the value of this paper is more about presenting the interesting problem to solve, rather than trying to beat traditional indices.
    • Open questions:
      • the authors had to do heavy C++ codegen & optimisation to make the learned structure as performant as tuned indices, which makes the idea hard to generalise & popularise.
      • the current method still has no error bound at inference time, while B-trees have nice semantic guarantees.
      • how about write-heavy workloads?
      • multi-dimensional index keys
      • better performance for string keys
      • can we use other techniques, eg. more complicated networks, or RL? (RL has been applied to optimise joins in big-data queries.)
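
To make the CDF idea concrete, here is a minimal Python sketch of a two-stage recursive model index, assuming plain least-squares linear models as the CDF approximators; the class names and the `num_leaves` parameter are my own, not from the paper's code.

```python
import bisect

class LinearModel:
    """Least-squares fit of position ~ key, i.e. a linear CDF approximation."""
    def __init__(self, keys, positions):
        n = len(keys)
        mean_k = sum(keys) / n
        mean_p = sum(positions) / n
        var_k = sum((k - mean_k) ** 2 for k in keys) or 1.0
        self.slope = sum((k - mean_k) * (p - mean_p)
                         for k, p in zip(keys, positions)) / var_k
        self.intercept = mean_p - self.slope * mean_k

    def predict(self, key):
        return self.slope * key + self.intercept

class RMI:
    def __init__(self, sorted_keys, num_leaves=10):
        self.keys = sorted_keys
        positions = list(range(len(sorted_keys)))
        # Stage 1: a single root model over all keys.
        self.root = LinearModel(sorted_keys, positions)
        # Stage 2: the root's prediction routes each key to one leaf model,
        # so every leaf only has to approximate a small slice of the CDF.
        buckets = [([], []) for _ in range(num_leaves)]
        for k, p in zip(sorted_keys, positions):
            ks, ps = buckets[self._route(k, num_leaves)]
            ks.append(k)
            ps.append(p)
        self.leaves = [LinearModel(ks, ps) if ks else None for ks, ps in buckets]
        # Record each leaf's worst-case error; this bounds the final search.
        self.errors = [
            max((abs(leaf.predict(k) - p) for k, p in zip(ks, ps)), default=0.0)
            if leaf else 0.0
            for leaf, (ks, ps) in zip(self.leaves, buckets)
        ]

    def _route(self, key, num_leaves):
        i = int(self.root.predict(key) * num_leaves / len(self.keys))
        return min(max(i, 0), num_leaves - 1)

    def lookup(self, key):
        i = self._route(key, len(self.leaves))
        leaf = self.leaves[i]
        if leaf is None:  # empty leaf: fall back to a plain binary search
            lo, hi = 0, len(self.keys)
        else:
            guess = int(leaf.predict(key))
            err = int(self.errors[i]) + 1
            lo = max(guess - err, 0)
            hi = min(guess + err + 1, len(self.keys))
        # "Last mile": binary search only within the model's error window.
        j = bisect.bisect_left(self.keys, key, lo, hi)
        return j if j < len(self.keys) and self.keys[j] == key else None
```

e.g. `RMI(sorted(keys)).lookup(k)` returns the position of `k` or `None`; recording each leaf's worst-case training error at build time is what gives `lookup` its bounded "last mile" search window.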
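And a minimal sketch of the learned Bloom filter idea, assuming a pre-trained classifier with a hypothetical `predict_score(key) -> [0, 1]` interface; the backup filter built over the model's false negatives is what preserves the "no false negatives" guarantee.

```python
import hashlib

class BloomFilter:
    """Textbook Bloom filter (one byte per bit, for simplicity)."""
    def __init__(self, num_bits, num_hashes):
        self.bits = bytearray(num_bits)
        self.num_hashes = num_hashes

    def _indexes(self, key):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % len(self.bits)

    def add(self, key):
        for i in self._indexes(key):
            self.bits[i] = 1

    def __contains__(self, key):
        return all(self.bits[i] for i in self._indexes(key))

class LearnedBloomFilter:
    def __init__(self, model, keys, threshold, backup_bits, backup_hashes):
        self.model = model
        self.threshold = threshold
        # Backup filter over the model's false negatives: every real key the
        # model would reject is guaranteed to be caught here instead.
        self.backup = BloomFilter(backup_bits, backup_hashes)
        for key in keys:
            if model.predict_score(key) < threshold:
                self.backup.add(key)

    def __contains__(self, key):
        if self.model.predict_score(key) >= self.threshold:
            return True  # may be a false positive, as with any Bloom filter
        return key in self.backup  # no false negatives by construction
```

The memory saving comes from the backup filter only needing to cover the (hopefully small) set of keys the model misclassifies.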
  • Don't throw out your algorithm book just yet
    • A direct response to the learned index paper, showing that bucketised cuckoo hashing can achieve better or on-par performance with less memory as well.
    • Agrees that learning the data distribution is a good idea: it allows self-tuning databases, and model training makes sense when compute is (going to be) cheaper than memory access.
    • cuckoo hashing
      • Vanilla version: 2 hash functions + a map with load factor < 50%
      • During insert:
        • Compute 2 locations from the hash functions
        • If one is empty, put the key there; if both are full, evict the key occupying one of the slots, and attempt to find a new place for the evicted key following the same logic
        • If the eviction chain goes on too long, rebuild the table with new hash functions
      • In practice insertions almost always succeed without a rebuild, thanks to the "power of two choices" effect
      • Variation: "bucketised" means each slot can hold a bucket of n keys instead of only one. Their implementation uses a bucket size of 8, which achieves a load factor of 99%, hence saving more memory. (See the sketch below.)
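
A minimal Python sketch of bucketised cuckoo hashing as described above; `BUCKET_SIZE = 8` matches the post, but `MAX_EVICTIONS`, the random victim choice, and doubling the table on rebuild are my own illustrative assumptions, not their implementation.

```python
import random

class BucketizedCuckooTable:
    BUCKET_SIZE = 8      # larger buckets are what push the load factor toward 99%
    MAX_EVICTIONS = 500  # cap the eviction chain; give up and rebuild after this

    def __init__(self, num_buckets=16):
        self.num_buckets = num_buckets
        self.buckets = [[] for _ in range(num_buckets)]
        # Two "hash functions", realised as two random salts.
        self.seeds = (random.getrandbits(64), random.getrandbits(64))

    def _locations(self, key):
        return [hash((seed, key)) % self.num_buckets for seed in self.seeds]

    def insert(self, key):
        for _ in range(self.MAX_EVICTIONS):
            locs = self._locations(key)
            # Put the key in the first of its two buckets with free space.
            for loc in locs:
                if len(self.buckets[loc]) < self.BUCKET_SIZE:
                    self.buckets[loc].append(key)
                    return
            # Both buckets full: evict a random resident and retry with it.
            loc = random.choice(locs)
            j = random.randrange(self.BUCKET_SIZE)
            key, self.buckets[loc][j] = self.buckets[loc][j], key
        # Eviction chain too long: rebuild with fresh hash functions.
        self._rebuild(key)

    def _rebuild(self, pending_key):
        old_keys = [k for bucket in self.buckets for k in bucket]
        self.__init__(self.num_buckets * 2)  # doubling is an arbitrary choice
        for k in old_keys + [pending_key]:
            self.insert(k)

    def __contains__(self, key):
        return any(key in self.buckets[loc] for loc in self._locations(key))
```

A lookup probes at most 2 buckets (16 slots here), so lookups stay constant-time regardless of load factor.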
    • Thoughts / potential experiments:
      • It would be easy to combine cuckoo hashing with the learned model approach, and interesting to see if that gives even better performance.
      • We'll need a higher-level, easier-to-compute benchmark that is almost language/hardware independent. What I can think of: load factor; number of lookup attempts per successful lookup
      • Also interesting to see how their hyperparameters (bucket size etc.) affect performance.
      • This reminds me of how deep learning started slow but surpassed expert systems by a large margin eventually. =]