duplicated entries

Question

Closed this issue 5 years ago · 6 comments

Hello, I am analyzing this dataset and found some duplicated entries, such as the following:

r4.large_i-059cc399590672c2b_terasort_hadoop_small_1
r4.large_i-0e6cc86a677b8793f_terasort_hadoop_small_1

What do the tokens i-059cc399590672c2b and i-0e6cc86a677b8793f mean? Thanks. @oxhead

Answer 1 · 2019-10-25T15:26:01.000Z

It's the instance name. The duplicate entries mean two separate experiments with the same configuration.
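
For anyone who wants to find these repeated experiments programmatically, here is a minimal sketch. It assumes entry names are underscore-separated, with the instance type first and the instance name second; that layout is inferred from the two examples above and is not documented. The helpers `split_entry` and `group_duplicates` are hypothetical names, not part of the dataset's tooling.

```python
from collections import defaultdict


def split_entry(name):
    """Split an entry name into (instance_type, instance_name, rest).

    Assumes the layout <instance_type>_<instance_name>_<rest>, inferred
    from the example entries in this thread.
    """
    instance_type, instance_name, rest = name.split("_", 2)
    return instance_type, instance_name, rest


def group_duplicates(names):
    """Return configurations that appear under more than one instance name."""
    groups = defaultdict(list)
    for name in names:
        instance_type, instance_name, rest = split_entry(name)
        # Two entries are the same configuration if everything except
        # the instance name matches.
        groups[(instance_type, rest)].append(instance_name)
    return {key: found for key, found in groups.items() if len(found) > 1}


entries = [
    "r4.large_i-059cc399590672c2b_terasort_hadoop_small_1",
    "r4.large_i-0e6cc86a677b8793f_terasort_hadoop_small_1",
]
duplicates = group_duplicates(entries)
```

With the two entries from the question, `duplicates` has a single key, `("r4.large", "terasort_hadoop_small_1")`, mapped to both instance names.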

Answer 2 · 2019-10-25T16:14:02.000Z

Got it, thanks.
But I noticed that not all configurations were run twice; some were run only once. Why?

Answer 3 · 2019-10-28T03:00:56.000Z

Those duplicate entries come from experiment scripts that were run again after the experiments were interrupted.

Answer 4 · 2019-11-12T07:40:24.000Z

Thanks! But some duplicated runs still confuse me. The following two entries are both reports of a Spark aggregation program running on the same instance type (c3.2xlarge) with almost the same input size, yet their elapsed times differ by almost 5x. So I wonder: apart from the timestamp, were there other conditions that differed between these two runs?

Answer 5 · 2019-11-12T18:14:33.000Z

I believe their settings are the same. The difference can be attributed to performance variance in the cloud or to insufficient resources on c3.2xlarge (probably memory).

This example demonstrates that performance can vary a lot in the cloud. In our paper, we point out that this kind of variance poses a challenge to the Bayesian Optimization process.
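
To put a number on this kind of run-to-run variance, a rough sketch like the one below may help. The function name `run_spread` and the elapsed times are hypothetical; the values are placeholders chosen only to mirror the roughly 5x gap mentioned above and are not taken from the dataset.

```python
from statistics import mean, stdev


def run_spread(elapsed_times):
    """Return (max/min ratio, coefficient of variation) for repeated runs."""
    if len(elapsed_times) < 2:
        raise ValueError("need at least two runs to measure spread")
    ratio = max(elapsed_times) / min(elapsed_times)
    # Coefficient of variation: sample standard deviation over the mean.
    cv = stdev(elapsed_times) / mean(elapsed_times)
    return ratio, cv


# Hypothetical elapsed times in seconds for two runs of one configuration.
ratio, cv = run_spread([100.0, 500.0])
```

A ratio close to 1 and a small coefficient of variation would suggest the repeated runs agree; large values flag configurations whose single measurement should be treated with caution.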

Answer 6 · 2019-11-13T02:23:33.000Z

Got it. Thanks a lot. 👍