paper_readings

Keep track of the papers I have read and to be read

MIT License

Three-pass reading : How to Read a Paper

More detailed methods : skills about reading papers

kv-storage

  1. PebblesDB: Building Key-Value Stores using Fragmented Log-Structured Merge Trees

    Stage : 1/3

    Description : Introduces a guard structure to reduce write amplification.

  2. SILK: Preventing Latency Spikes in Log-Structured Merge Key-Value Stores.

    Stage : 1/3

    Description : Introduces a new I/O scheduler to reduce write tail latency.

    • Preempts compactions.
    • Manages bandwidth:
      • Low load --> give compaction more bandwidth.
      • High load --> give compaction less bandwidth.
    • Gives flushes and low-level compactions (level 0 --> level 1) higher priority.
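
The three scheduling ideas above can be sketched as follows. This is a minimal illustration, not SILK's actual implementation: the names (`Job`, `compaction_bandwidth`, `pick_next`, `should_preempt`), the priority values, and the bandwidth figures are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical priorities for this sketch: a lower number is more urgent.
PRIORITY = {"flush": 0, "l0_l1_compaction": 1, "deep_compaction": 2}


@dataclass
class Job:
    kind: str   # one of the PRIORITY keys
    name: str


def compaction_bandwidth(total_mbps: float, client_mbps: float,
                         floor_mbps: float = 10.0) -> float:
    """Give compaction the bandwidth that client traffic is not using.

    Low client load leaves more for compaction, high load leaves less.
    floor_mbps is an assumed minimum so compaction never stops entirely.
    """
    return max(floor_mbps, total_mbps - client_mbps)


def pick_next(pending: List[Job]) -> Job:
    """Pick the most urgent job: flush, then L0 -> L1, then deeper levels."""
    return min(pending, key=lambda job: PRIORITY[job.kind])


def should_preempt(running: Job, arriving: Job) -> bool:
    """A running job yields when a strictly more urgent job arrives."""
    return PRIORITY[arriving.kind] < PRIORITY[running.kind]
```

With a 200 MB/s device, `compaction_bandwidth(200.0, 50.0)` leaves 150 MB/s for compaction under light client load, while `compaction_bandwidth(200.0, 195.0)` falls back to the assumed 10 MB/s floor.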
  3. WiscKey: Separating Keys from Values in SSD-Conscious Storage

    Stage : 1/3

    Description : Separates keys from values in the LSM-tree to reduce I/O amplification.

    else : An article from cxs introduces how this technique is used in real products.
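
A toy sketch of key-value separation, assuming a plain dict in place of the LSM-tree and an in-memory `bytearray` in place of the on-disk value log. The class and method names are hypothetical, and garbage collection of stale values is left out.

```python
class KVSeparatedStore:
    """Values go to an append-only log; the index only holds small pointers."""

    def __init__(self):
        self.value_log = bytearray()   # stands in for the on-disk value log
        self.index = {}                # stands in for the LSM-tree: key -> (offset, length)

    def put(self, key, value):
        offset = len(self.value_log)
        self.value_log += value                  # the value is written once, to the log
        self.index[key] = (offset, len(value))   # only the pointer enters the index

    def get(self, key):
        location = self.index.get(key)
        if location is None:
            return None
        offset, length = location
        return bytes(self.value_log[offset:offset + length])


store = KVSeparatedStore()
store.put("user:1", b"a large value that never moves during compaction")
store.put("user:1", b"new value")   # the old value stays in the log as garbage
```

Because compaction in such a design only rewrites the small `(offset, length)` pointers, large values are not copied again and again, which is where the reduction in write amplification comes from.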

  4. Leaper: A Learned Prefetcher for Cache Invalidation in LSM-tree based Storage Engines

    Stage : 1/3

    Description : Introduces an offline-trained model and online inference to predict the hot/cold key ranges, then fetches the related blocks.

  5. Revisiting Data Prefetching for Database Systems with Machine Learning Techniques

    Stage : 1/3

    Description : Devises a multi-model framework based on neural networks to optimize the random accesses made by transactions.

  6. MyRocks: LSM-Tree Database Storage Engine Serving Facebook's Social Graph

    Stage : 0/3

    Description : TODO.

    else : A good article that explains this.

  7. LSM-based Storage Techniques: A Survey

    Stage : 1/3

    Description : A survey of LSM-trees.

    • History of the LSM-tree and its basic implementation.
    • Proposes a taxonomy of LSM-tree improvements.
    • Some representative LSM-based NoSQL systems.
    • Reflects on existing work and identifies several outages and opportunities.

HTAP DB

  1. HTAP Databases: What is New and What is Next

    Stage : 1/3

    Description : Introduces current HTAP databases and their techniques. Also introduces a general benchmark for measuring performance. Finally, discusses the open problems and opportunities.

    else : A good talk by the author. An article that summarizes this

Cloud-Native DB

General:

A good course

Papers:

  1. CloudJump: Optimizing Cloud Databases for Cloud Storages

    Stage : 0/3

    Description : Introduces the challenges that arise when traditional DB storage switches to cloud storage. Then proposes a framework named CloudJump and applies it to PolarDB (B+ tree) and RocksDB (LSM-tree) to show the performance improvement.

    else : A good article by the author. An article about cloud storage

File System

  1. DESIGN AND IMPLEMENTATION OF THE SUN NETWORK FILESYSTEM

    Stage : 1/3

    Description : v2/v3 are stateless, which means it is up to the client to keep the state of each operation (the file handle). The drawback is that when one client removes a file, the server has no idea whether other clients are still using it, so a client that holds the fh of the removed file gets an error when it tries to operate on the file. The solution is to add a variable called generation that records the version: operating on an out-of-date file handle produces a warning instead of causing a security problem. v4 adopts a stateful design, which is not covered in this paper.

    • client : The client lives in the OS kernel and is responsible for the RPCs; in v2/v3 it obtains a file handle (fh) to record the state.

    • server : Adds the VFS and vnode interfaces, which makes the design more general.

    • else :

      • The server needs to write data to disk (write-through), whereas in a local environment only the cache needs to be written.
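
The generation check described above can be sketched like this. It is a toy model under stated assumptions, not the NFS wire protocol: `ToyServer`, `FileHandle`, and `StaleFileHandle` are hypothetical names, and inode numbers are chosen by the caller to keep the example short. Note that the server keeps no per-client state; each handle carries everything needed to validate it.

```python
from dataclasses import dataclass


class StaleFileHandle(Exception):
    """Raised when a handle refers to a file that no longer exists."""


@dataclass(frozen=True)
class FileHandle:
    inode: int
    generation: int


class ToyServer:
    def __init__(self):
        self.generation = {}   # inode number -> current generation
        self.data = {}         # inode number -> file contents

    def create(self, inode, contents):
        # Reusing an inode number bumps its generation,
        # so handles issued for the old file stop matching.
        gen = self.generation.get(inode, 0) + 1
        self.generation[inode] = gen
        self.data[inode] = contents
        return FileHandle(inode, gen)

    def _check(self, fh):
        exists = fh.inode in self.data
        if not exists or self.generation[fh.inode] != fh.generation:
            raise StaleFileHandle(f"inode {fh.inode}, generation {fh.generation}")

    def read(self, fh):
        self._check(fh)
        return self.data[fh.inode]

    def remove(self, fh):
        self._check(fh)
        del self.data[fh.inode]
```

If client A removes a file and the inode number is later reused for a new file, client B's old handle fails the generation check and raises `StaleFileHandle` instead of silently reading the new file's contents.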
  2. GFS

    Stage : 1/3

    Description : Designed for large files with many reads, few writes, few in-place modifications, and many appends. Generally, a file has three replicas distributed across different chunkservers (hotter files get more replicas).

    • master : Stores the metadata, such as the chunk handles that a file contains.
    • chunkserver : Contains many chunks; each chunk has a chunk handle (uint64_t).
    • read : The client first asks the master for the metadata, then goes to the closest chunkserver to get the data. If that one is not working, it goes to the other two.
    • write : Relaxed consistency model. Only one of the replicas (the primary) has the right to write (through a lease from the master). A write has two phases; data flow and control flow are separated.
      • first : data flow, no order between chunkservers.
      • second : control flow, ordered, from the primary to the other chunkservers.
    Data flow :
    client ---> chunkserver1 ---> chunkserver2 ---> chunkserver3 ....

    Control flow :
    client ---> primary ---> other chunkservers ---> when all have finished, the primary reports completion to the client

    else : HDFS is an open-source system modeled on GFS.
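
The two write phases can be sketched as below. This is a simplified model with hypothetical names (`Chunkserver`, `Primary`, `client_write`); leases, failures, and the network are left out, and a plain loop stands in for the chunkserver-to-chunkserver forwarding of the data flow.

```python
class Chunkserver:
    def __init__(self, name):
        self.name = name
        self.buffer = {}   # data_id -> bytes received in the data flow, not yet applied
        self.chunk = []    # mutations applied to the chunk, in serial order

    def push_data(self, data_id, data):
        # Phase 1: just hold the data; nothing is applied yet.
        self.buffer[data_id] = data

    def apply(self, serial, data_id):
        # Phase 2: apply buffered data in the order chosen by the primary.
        self.chunk.append((serial, self.buffer.pop(data_id)))


class Primary(Chunkserver):
    def __init__(self, name, secondaries):
        super().__init__(name)
        self.secondaries = secondaries
        self.next_serial = 0

    def write(self, data_id):
        # The primary picks one serial number so every replica applies
        # mutations in the same order.
        serial = self.next_serial
        self.next_serial += 1
        self.apply(serial, data_id)
        for secondary in self.secondaries:
            secondary.apply(serial, data_id)
        return "ok"   # reported to the client once all replicas have finished


def client_write(data_id, data, chain, primary):
    # Data flow: the order of chunkservers in the chain does not matter.
    for server in chain:
        server.push_data(data_id, data)
    # Control flow: client -> primary -> other chunkservers.
    return primary.write(data_id)
```

Separating the phases lets the data travel along whatever chain of chunkservers is convenient, while the ordering decision stays with the primary alone.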

Useful talks

  1. Cloud Data Warehousing: Snowflake and Beyond

    Description : Covers the details of Snowflake and introduces some exciting ideas for cloud DB research.