fluidex/dingir-exchange

corrupted DB when migrations fail (starting persistor before matchengine)

Closed this issue · 3 comments

The error occurs in an orchestrated environment, I guess when the orchestrator decides to restart pods in a certain order. I think that if the persistor is started before the matchengine, the error occurs and the migrations fail. The database also seems to be corrupt from that point on, since the only solution is to reset the DB.

fluidex_common::non_blocking_tracing: thread 'main' panicked at 'Init state error: migration 20200123090258 was previously applied but has been modified', src/bin/matchengine.rs:30:38
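For context, the wording of the panic looks like the checksum check that migration runners perform at startup: the runner records a checksum for every applied migration and refuses to continue if a local file with the same version now hashes differently. The sketch below only illustrates that idea and is not the project's or any library's code. `checksum`, `MigrateError` and `run_migrations` are hypothetical names, and `DefaultHasher` stands in for whatever hash a real runner uses. If two services shipped different contents for the same migration version, whichever started first would record its checksum and the other would fail like this. That is an assumption, not something confirmed in this thread.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// Stand-in checksum (hypothetical); a real runner would use a cryptographic hash.
fn checksum(sql: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    sql.hash(&mut hasher);
    hasher.finish()
}

#[derive(Debug, PartialEq)]
pub enum MigrateError {
    /// The migration was applied before, but its contents differ now.
    Modified(i64),
}

/// `applied` models the bookkeeping table: version -> checksum recorded when applied.
/// `local` is the list of (version, SQL) pairs embedded in the starting service.
pub fn run_migrations(
    applied: &mut BTreeMap<i64, u64>,
    local: &[(i64, &str)],
) -> Result<(), MigrateError> {
    for &(version, sql) in local {
        let sum = checksum(sql);
        match applied.get(&version).copied() {
            // Same version, different contents: refuse to continue.
            Some(recorded) if recorded != sum => {
                return Err(MigrateError::Modified(version));
            }
            // Already applied and unchanged: nothing to do.
            Some(_) => {}
            // Not applied yet: "apply" it and record its checksum.
            None => {
                applied.insert(version, sum);
            }
        }
    }
    Ok(())
}
```

In this model the database contents are not damaged by the failure; the recorded checksum simply no longer matches, which would explain why only a reset (or aligning the migration files) makes the error go away.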

I think it would be better to split this issue into separate sub-problems:

  1. fix the migration problem (#357).
  2. fix the "needing to reset the db" problem.
  3. it would also be better if restapi retried until it can connect to matchengine (#358)
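The retry idea in point 3 could look roughly like the sketch below. It is a minimal, std-only illustration and not the actual change in #358: `retry_with_backoff` and its parameters are hypothetical names, and the `connect` closure stands in for whatever restapi does to reach matchengine.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Call `connect` until it succeeds or `max_attempts` is reached,
/// doubling the wait between attempts up to `max_delay`.
/// The closure receives the 1-based attempt number.
pub fn retry_with_backoff<T, E, F>(
    mut connect: F,
    max_attempts: u32,
    initial_delay: Duration,
    max_delay: Duration,
) -> Result<T, E>
where
    F: FnMut(u32) -> Result<T, E>,
{
    assert!(max_attempts > 0, "max_attempts must be at least 1");
    let mut delay = initial_delay;
    let mut attempt = 1;
    loop {
        match connect(attempt) {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the last error to the caller.
            Err(err) if attempt >= max_attempts => return Err(err),
            Err(_) => {
                sleep(delay);
                delay = (delay * 2).min(max_delay);
                attempt += 1;
            }
        }
    }
}
```

With something like this, the start order chosen by the orchestrator matters less: a service that comes up before its dependency waits and retries instead of failing immediately.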

context

#244 (comment)

Hi @lackrobin @gcomte

So can we close this issue now?

We tested it in a cluster and it seems to be fixed 👍