Shopify/verdict

Verdict is not Thread-Safe ๐Ÿ™…โ€โ™€๏ธ

Closed this issue ยท 4 comments

This issue proposes to make Verdict thread-safe.

Error in Shopify/help repo

๐Ÿ™…โ€NoMethodError in contact#en
undefined method `switch' for nil:NilClass

๐Ÿ›View on Bugsnag

The issue ๐Ÿค•

For multi-threaded applications, it's possible to receive several concurrent requests to Verdict as the app boots for the first time.

The Shopify Help Center receives hundreds of requests per minute, especially to our "Contact Support" page: https://help.shopify.com/questions

Each time the app is deployed, there is a small window of time during the deploy when Verdict has not set up its experiments. The first request to Verdict may come at the same time as several others, since the Contact Support page is always receiving requests. These requests will be processed concurrently in multiple threads.

On one thread ๐Ÿงต

  • Verdict['experiment-handle'] is called
  • this calls the method #repository
  • this checks if the instance of @repository exists, which it does not
  • so Verdict calls the #discovery method and sets the @repository instance variable to an empty hash

Verdict then goes through the process of adding all the experiments to the @repository hash.

On a second thread ๐Ÿงต

  • Verdict[experiment-handle] is called
  • this calls the method #repository
  • this checks if the instance of @repository exists which it does now exist
  • so its skips the #discovery method and returns the @repository which is an empty hash currently

Which it then does not find the existing handle so returns nil

Temporary solution ๐Ÿ’Š

We added a safe navigation operator to #switch that allows the code to continue if the call to Verdict['experiment-handle'] returns nil.

Proposed solutions ๐Ÿ’ƒ

We have three possible solutions:

1) Eager load experiments

Eagerly call the discovery method as soon as Verdict is initialized that way the repository is ready to go.

2) Use Mutexes

Protect the state of the object with Mutexes so the object's state is shared between all threads but the access is limited to a single thread at once

3) Compartmentalize the State of the Instance

Use a thread-safe structure for local state that compartmentalizes the state per thread.

Would appreciate your input and insights @wvanbergen and @eljojo ๐Ÿ˜Š

  1. Eager load experiments

That is the best solution. Lazy loading the experiments in production can be really expensive for some requests. Recently we did this in shopify core to avoid a few seconds on checkout.

https://github.com/Shopify/shopify/pull/192561

Are you willing to work on that solution? If you are, feel free to ping me in the review.

Hi! great write up!
I think 1) eager loading experiments is the fastest way of going about it.

@ptolts recently shipped a change in core so we do that, which turned in 20% of checkouts being 200 ms faster IIRC. https://github.com/Shopify/shopify/pull/192561
https://github.com/Shopify/shopify/blob/46d110a85a1543014a83f881c07912ad60f1dae1/config/application.rb#L338-L339

Ideally we'd do 3), at some point I wrote a document proposing how we could do this, but I think it would be a tremendous amount of work.

There's some good bits of context on https://github.com/Shopify/shopify/pull/110923 about why verdict's loading behaviour is so painful.

I've seen the issue before and also ended up doing a nil check ๐Ÿ˜ญ.

Core isn't threaded and had the same issue -- there must be something else besides threading that triggers this. I haven't seen the issue in a while (Bugsnag search) โ€“ maybe eager loading from Shopify/shopify#192561 did indeed solve the problem on core.

resolved on #44