microsoft/ALEX

Cannot replicate experiments in Table 2

kaiwang19 opened this issue · 4 comments

Dear Jialing,

I want to replicate the experiments of Table 2 in the paper.
[screenshot of Table 2 from the paper]
In section 6.1.2, the paper stated that:

For a given dataset, we initialize an index with 100 million keys.

I need help with the following three questions:

  • Q1: Does Avg depth denote the average depth of [all nodes (model nodes and data nodes)] or simply [data nodes]?
  • Q2: I assume Table 2 means bulk loading only 100 million keys for each of the 4 datasets, right?
  • Q3: I tried to replicate the experiments in Table 2 but got different statistics. Could you help me check if there is something I missed? (Or maybe later commits have slightly changed some features?)

For Q3, I used the benchmark to test them and got different statistics. I set both init_num_keys and total_num_keys to 100M so that the benchmark will only bulk-load 100M keys. For example:

./build/benchmark \
--keys_file=[path to location of a given dataset] \
--keys_file_type=binary \
--init_num_keys=100000000 \
--total_num_keys=100000000 \
--batch_size=100000 \
--insert_frac=0.5 \
--lookup_distribution=zipf \
--print_batch_stats

For longitudes, I bulk-loaded 100M keys. The results are as follows:
[screenshot of benchmark statistics for longitudes]

For longlat, I bulk-loaded 100M keys. The results are as follows:
[screenshot of benchmark statistics for longlat]

For lognormal, I bulk-loaded 100M keys. The results are as follows:
[screenshot of benchmark statistics for lognormal]

For YCSB, I bulk-loaded 100M keys. The results are as follows:
[screenshot of benchmark statistics for YCSB]

Thank you so much for your time.

  • Q1: Does Avg depth denote the average depth of [all nodes (model nodes and data nodes)] or simply [data nodes]?

It denotes the average depth over all keys (equivalently, the average depth of all data nodes, weighted by the number of keys that fall into each data node).
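
In code, the metric is just a weighted average. Here is a minimal sketch using a hypothetical per-node summary struct (not the actual ALEX API):

#include <cstdint>
#include <vector>

// Hypothetical per-data-node summary; ALEX's real stats structures differ.
struct DataNodeStats {
  int depth;           // depth of the data node in the tree
  uint64_t num_keys;   // number of keys currently stored in it
};

// "Avg depth" as in Table 2: each data node's depth, weighted by the
// number of keys it holds (i.e., the average depth over all keys).
double AvgDepthOverKeys(const std::vector<DataNodeStats>& data_nodes) {
  double weighted_depth_sum = 0;
  uint64_t total_keys = 0;
  for (const auto& node : data_nodes) {
    weighted_depth_sum += static_cast<double>(node.depth) * node.num_keys;
    total_keys += node.num_keys;
  }
  return total_keys == 0 ? 0.0 : weighted_depth_sum / total_keys;
}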

  • Q2: I assume Table 2 means bulk loading only 100 million keys for each of the 4 datasets, right?

Yes.

  • Q3: I tried to replicate the experiments in Table 2 but got different statistics. Could you help me check if there is something I missed? (Or maybe later commits have slightly changed some features?)

There are three reasons you're seeing different numbers. First, the cost model weights we used in the paper (see the last paragraph of page 18 in our arxiv report) are actually slightly different from the default weights in this open-source implementation. Second, the expected_insert_frac was likely set to 0 to produce these numbers. Third, we've made quite a few changes since we initially submitted our paper, so the bulk loading behavior is different. It is probably not possible to exactly reproduce the paper's results by using this open-source implementation.
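
For reference, here is a minimal bulk-loading sketch in the spirit of the README example. The set_expected_insert_frac call is an assumption about how this knob is exposed in the current sources; adjust it (and the include path) to your version:

#include <algorithm>
#include <utility>

#include "alex.h"  // adjust the include path to your checkout

int main() {
  // Toy data; the paper's experiments bulk-load 100M keys from each dataset.
  const int num_keys = 100;
  std::pair<int, int> values[num_keys];
  for (int i = 0; i < num_keys; i++) {
    values[i] = {i, i};
  }
  std::sort(values, values + num_keys);  // bulk_load expects keys in sorted order

  alex::Alex<int, int> index;
  // Assumption: expected_insert_frac can be set before bulk loading; 0 asks
  // the bulk load to optimize the layout for a lookup-only workload, which is
  // what produced the numbers discussed above.
  index.set_expected_insert_frac(0);
  index.bulk_load(values, num_keys);
  return 0;
}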

Thank you so much for your detailed reply.
I have tried changing expected_insert_frac; it helps when I change it from 1 to 0.
As for the cost model weights, I changed them as follows:

// Intra-node cost weights
double kExpSearchIterationsWeight = 10; // 20->10
double kShiftsWeight = 1; // 0.5->1

// TraverseToLeaf cost weights
double kNodeLookupsWeight = 10; // 20->10
double kModelSizeWeight = 1e-6; // 5e-7->1e-6

The results are as follows:
[screenshots of benchmark statistics for longitudes, longlat, lognormal, and YCSB with the modified weights]
The statistics for longitudes, longlat, and lognormal are quite close to those in the paper; YCSB shows the largest difference. So, as with the third reason you mentioned, it is probably not possible to exactly reproduce the paper's results with this open-source implementation.

Could I ask another small question about the retraining strategy?
I found that there is a hyper-parameter (a threshold) in the resize method of data nodes. This threshold is set to 50 and triggers retraining of the model. Do I need to change this threshold for different datasets, or is it the best choice based on empirical experiments?
[screenshot of the threshold check in the data node resize method]
Thank you so much for your patience. I am quite interested in ALEX~

Yes, that's a threshold value that we found works well on all datasets in our empirical experiments, and changing it shouldn't have a big impact on performance. But of course you're free to try modifying it if you want to improve performance a bit further on a particular dataset.
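
For anyone reading along, the pattern under discussion is roughly the following. This is an illustrative sketch, not ALEX's actual resize code, and it assumes the check retrains the linear model when the node's key count is below the threshold (or retraining is forced) and otherwise just rescales the existing model:

// Illustrative names only; not the actual ALEX implementation.
struct LinearModelSketch {
  double slope = 0, intercept = 0;
  // Rescale predictions when the underlying key array grows by `factor`.
  void Expand(double factor) {
    slope *= factor;
    intercept *= factor;
  }
};

constexpr int kRetrainThreshold = 50;  // the empirically chosen value

// Sketch of the retrain-vs-rescale decision made during a data node resize.
void ResizeModelSketch(int num_keys, double capacity_growth,
                       bool force_retrain, LinearModelSketch* model) {
  if (num_keys < kRetrainThreshold || force_retrain) {
    // Refit slope/intercept over the node's current keys (omitted here);
    // this is cheap for small nodes and keeps the model accurate.
  } else {
    // Otherwise keep the existing model and simply rescale it to the new
    // capacity, avoiding a full refit.
    model->Expand(capacity_growth);
  }
}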

Thanks so much. Your reply helps a lot.