splitrb/split

Performance issue when using combined experiments

gnanou opened this issue · 1 comments

In our application, when we started a test with combined experiments, the response time for the feature under test was increased. Split version 3.3.1 was used.
To be more specific:

  • At a run with 3 combined experiments a 100% increase of the total response time was observed.
  • At a run with 6 combined experiments a 900% increase of the total response time was observed.

The investigation showed that some Redis commands that are executed for each combined experiment seem to be redundant. If these could be reduced or avoided, Split performance would improve.

These are some recommendations for improvement:

  • Calling the experiment_configuration_has_changed? method for a second time here could be avoided.
    For both single and combined experiments, this would save 5 calls to Redis for every experiment being started.
  • Cache the has_winner state for the lifetime of an Experiment instance.
    For both single and combined experiments, this would result in only 1 call to Redis to check if a winner exists, no matter how many times has_winner? is called.
  • Clean up user old experiments only once during the lifetime of a User instance.
    When a combined test starts, a new trial is created for each combined experiment and all trials share the same Split user. Then for each trial User#cleanup_old_experiments! is called here. Since user is the same for all trials/combined experiments, it seems that cleanup could be performed just once, in the beginning.
    Currently, for the k-th combined experiment being started User#cleanup_old_experiments! performs 8*(k-1) calls to Redis. This happens because each time a combined experiment begins, its key is added for the user. So, when cleanup is called for the next combined experiment, the user has one extra unfinished key to check if it needs cleanup even though this was added just before. This means that if a combined test has N experiments, the total Redis calls executed due to user cleanup will be 8*(N - 1) for the N-th experiment + 8 *(N - 2) already executed for the (N - 1)-th experiment and so on. These calls would be reduced if cleanup was performed just once.

If you believe the above points are valid and safe to change, I could submit a PR.

That does make sense @gnanou! Making redundant calls on the same request should be avoided as much as possible.

I'm open for PRs optimizing them.

Let me know if you need any help ✨