MaxEmbed

MaxEmbed is an implementation of the paper "MaxEmbed: Maximizing SSD bandwidth utilization for huge embedding models serving". This documentation explains how to build, configure and use the system.

For AE reviewers

You can use the run_all.sh script located in the ae_scripts folder to execute the evaluation procedure. This script will perform the online procedure of MaxEmbed and generate the figures presented in the paper.

The run_all.sh script accepts two positional parameters: the first specifies the log folder where the log files are saved, and the second (optional) specifies which experiment to run.

For the first parameter:

  • If you specify an existing directory, the script will use the log files in that directory to draw figures.
  • If you specify a non-existing directory, the script will run the experiments, save the log files in the specified directory, and then use those logs to draw figures.

Figures will be saved in the <log_dir>/figures folder.

The second parameter can be one of the following:

  • 1 - Run exp1.sh and draw Figures 8, 10, 11, and 12
  • 2 - Run exp2.sh and draw Figure 13
  • 3 - Run exp3.sh and draw Figure 14
  • 4 - Run exp4.sh and draw Figure 9

If you do not specify the second parameter, the script runs all the experiments.

For example:

```bash
cd ae_scripts
bash ./run_all.sh log 1
# This runs experiment 1 and draws Figures 8, 10, 11, and 12.
# The figures are saved in the log/figures folder.
```

We have provided a log in the ae_scripts/log folder, which you can use to draw the figures.

How to Build

Prerequisites:

Run the following commands:

```bash
cmake --preset=default
cd build
make -j
```

Note: Update the SPDK library path in CMakeLists.txt before building.

How to Use

Step 1: Partition Procedure

The partition program is located at build/partition/partition_step.

```bash
./build/partition/partition_step <cnt_per_part> <input_file> <output_file> <binary_graph>
```
  • cnt_per_part: Number of embeddings per page
  • input_file: Hypergraph file containing embedding query logs
  • output_file: Output partition file
  • binary_graph: Input format flag indicating whether input_file is binary or plain text (binary/text)

Input File Format

Plain text format:

The first line contains 4 integers, in the order shown below: an unused integer (U), the number of vertices (C), the number of queries (N), and the total length of all queries (Q). Each following line lists the vertices of one query.

```
<U> <C> <N> <Q>
<vertex> <vertex> <vertex> ... <vertex>
<vertex> <vertex> <vertex> ... <vertex>
...
```

Binary input file format:

The binary input file contains 4 parts:

  • Part 1: The first 8 bytes are the number of queries (N).
  • Part 2: The next 8 bytes are the number of vertices (C).
  • Part 3: The next 8 * (N + 1) bytes index the vertices of each query. The data of query i lies in the range [index[i], index[i+1]), where index[i] is the i-th 8-byte entry of this part.
  • Part 4: The next 4 * Q bytes are the vertices of all queries stored contiguously (Q is the total length of all queries), indexed by Part 3.

```
<8 bytes: N (num of queries)><8 bytes: C (num of vertices)><8 bytes * (N + 1): index><4 bytes * Q: vertices>
```

An example is shown in ./readme_fig/example1.png.

Output File Format

The output of this step is a placement file that records which partition each vertex lies in. Because a vertex may lie in multiple partitions after the subsequent replication procedure, the file uses a format that allows several partitions per vertex, so that both steps share a single format.

The output file contains 5 parts:

  • Part 1: The first 8 bytes are the number of vertices (C).
  • Part 2: The next 8 bytes are the total number of partition entries (Q), i.e., the sum over all embeddings of the number of partitions each one lies in (an embedding may lie in multiple partitions).
  • Part 3: The next 4 * C bytes give, for each vertex, the index of its first entry in Part 5.
  • Part 4: The next 4 * C bytes give, for each vertex, the number of partitions it lies in.
  • Part 5: The partition IDs (4 * Q bytes), indexed by Parts 3 and 4.

```
<8 bytes C> <8 bytes Q> <4 * C bytes index> <4 * C bytes count> <4 * Q bytes partition IDs>
```

If the partition step is slow

```bash
swapoff -a  # disable swap
echo always > /sys/kernel/mm/transparent_hugepage/enabled  # enable transparent huge pages (THP)
# Note: THP must be disabled again when using SPDK.
```

Step 2: Replication Procedure

The replication program is located at build/partition/replication_step.

```bash
./build/partition/replication_step <mapping_file> <input_file> <rep_ratio> <cnt_per_part> <output>
```
  • mapping_file: The partition file generated by the partition procedure
  • input_file: The same input file used in the partition procedure
  • rep_ratio: The replication ratio
  • cnt_per_part: The number of vertices in each partition
  • output: The output replicated partition file

Output File Format

The output file format is the same as the partition file format.

Format Conversion Tool

We provide a tool to convert a plain-text input file to the binary format.

```bash
./build/partition/trans_query <input_file> <output_file>
```

This tool will convert the input_file to binary format and save it to the output_file.

Replication Shrinking Tool

We provide a tool to generate an embedding placement with a low replication ratio from a placement with a high replication ratio. For example, you can generate a placement with a replication ratio of 0.2 from a placement with a replication ratio of 0.8.

```bash
./build/partition/shrink_replication <input_file> <output_file> <rep_ratio>
```

Online Procedure

The online phase program is located in build/client/client.

```bash
./build/client/client <query_file> <mapping_file> -d <embed_dim> -n <thread_number> -b <batch_size> --delay <delay> -t <time> -s <ssd_num> -c <ratio>
```
  • query_file: The input query file (binary format)
  • mapping_file: The input mapping file (binary format)
  • embed_dim: The dimension of the embedding
  • thread_number: The number of threads
  • batch_size: The batch size
  • delay: The simulated inference delay, used to emulate end-to-end serving
  • time: The duration of the online procedure; if 0, the program processes the whole query file
  • ssd_num: The number of SSDs
  • ratio: The cache ratio, i.e., the fraction of the whole embedding table that is cached