ngauthier/hydra

Multiple workers clobbering each other over MySQL

jasonroelofs opened this issue · 25 comments

Trying to set up Hydra to run multiple workers on a quite slow RSpec suite. Overall it seems to be doing the job, but I'm running into a problem I'm not sure how best to fix:

Mysql::Error: SAVEPOINT active_record_1 does not exist: ROLLBACK TO SAVEPOINT active_record_1

This error shows up every so often (definitely non-deterministic as can be expected). We're using factory_girl for test data and 1.8.7 (both MRI and REE).

This is MySQL telling you that one of the other processes rolled back a different process's transaction.

Try adding this line to your test_helper file:

ActiveRecord::TestCase.use_concurrent_connections

So I gave http://github.com/grosser/parallel_tests a quick run. Its documentation covers making sure each runner has its own database, and the system itself sets an environment variable with the runner's number, which lets each worker select the right database.

Does Hydra have this? Or is this not supposed to be required with Hydra?

I've never needed a separate database per runner, because factory_girl usually gets around the need for it.

Hydra does not currently support separate databases.

Hey James,

I actually added the "use_concurrent_connections" call to Hydra and pushed version 0.16.3. Give that version of Hydra a shot.

Doesn't seem to have changed anything. I'm using rspec for tests, so I'm not using AR::TestCase and can't seem to find a similar place to put such a statement, if it's still required.

Hey James,

My comment about AR testcase is no longer applicable.

I changed it so that Hydra reconnects to your DB with concurrency enabled, which should affect RSpec, Test::Unit, and Cucumber equally. But it sounds like it didn't help you much. (It helped me with some deadlocking I had on another project.)

Fwiw, I now also get the SAVEPOINT errors. Problem is that they only occur sometimes :-(

/Jørgen

Can you post a trace of the first couple of lines after the exception?

Here's the full stack trace http://gist.github.com/374018
And what's interesting in my test.log I see stuff like:
Mysql::Error: Deadlock found when trying to get lock; try restarting transaction: INSERT INTO

Fwiw, the specs run without problems when using parallel_spec.

/Jørgen

Seems that DeepTest experienced the same problems.
From http://www.somethingnimble.com/bliki/deep-test-1_2_0
"no more tripping over your own shoelaces
It used to be that DeepTest was young and clumsy when dealing with your database. Using a single database caused occasional deadlocks when multiple tests collided on the same table. DeepTest has grown up and learned some database diplomacy."

It seems to be the same route as parallel_spec has taken with a test database for each worker.

/Jørgen

Yeah, DeepTest is pretty cool. One of my primary goals with Hydra is minimal configuration, so a multi-db setup is something I don't want to add to Hydra.

Try putting this override in your test_helper AFTER your environment is booted:
http://gist.github.com/374074

I added the gist to the end of my spec/spec_helper.rb file and now it fails with errors like Mysql::Error: SAVEPOINT active_record_9575_1 does not exist: ROLLBACK TO SAVEPOINT active_record_9575_1
So it seems to have been loaded correctly but unfortunately it didn't solve my problem.
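
The gist itself is not reproduced in this thread. Judging only by the savepoint name in the error above, it appears to add the process id to the name. A minimal sketch of that idea, assuming the adapter builds the name from its transaction nesting depth; `FakeConnection` and `PerProcessSavepointName` are hypothetical stand-ins, not ActiveRecord or Hydra code:

```ruby
# Stand-in for a connection adapter (hypothetical); it only tracks
# how deeply transactions are nested.
class FakeConnection
  attr_accessor :open_transactions

  def initialize
    @open_transactions = 0
  end

  # Depth-only name: every process at the same depth gets the same name.
  def current_savepoint_name
    "active_record_#{open_transactions}"
  end
end

# Hypothetical override: include the process id so that two worker
# processes never generate the same savepoint name.
module PerProcessSavepointName
  def current_savepoint_name
    "active_record_#{Process.pid}_#{open_transactions}"
  end
end

conn = FakeConnection.new
conn.open_transactions = 1
puts conn.current_savepoint_name   # depth-only name

conn.extend(PerProcessSavepointName)
puts conn.current_savepoint_name   # name now includes this process's pid
```

Unique names rule out a name collision between workers. They do not help if the savepoint has disappeared for some other reason.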

Oh well :-)

/Jørgen

wow that's odd. Can you look in log/test.log and find a snippet of the log that has a savepoint creation and then a failure on the rollback?

MySQL doesn't support transactional DDL (table changes), but that should also break when you run the suite normally.

Hi,
Here's the log from the spec that fails: https://gist.github.com/c39890c776e7e4ced34f
I have removed some values and replaced them by "stuff_removed" to protect the innocent :-)

/Jørgen

Btw, DeepTest has the concept of listeners, and one of their uses is to set up a database for each worker. If Hydra did this automatically, wouldn't you still achieve the minimal-setup goal you have for Hydra?

Thoughts?

http://github.com/qxjit/deep-test/blob/master/lib/deep_test/database/mysql_setup_listener.rb

/Jørgen

Hey Jørgen,

Looks like the Deadlock is killing it. It's not really the savepoint.

What if you ran two Hydra tasks in a row? The first would run the tests that manipulate the user table heavily (the ones involving user administration), and the second would run the tests for the rest of the site. You may run into fewer deadlocks.

You can also try the deadlock-retry plugin:
http://github.com/rails/deadlock_retry

Also, Hydra has listeners:
http://wiki.github.com/ngauthier/hydra/custom-listeners

Right now there is no "Runner Boot" event, which is what you would use to connect to a different DB, but it would be easy to add.

Hey Jørgen,

I started hitting postgres deadlocks in my project, and so I implemented a simple deadlock-retry in hydra. Please pull version 0.16.6 and let me know if it helps you out.

I think you may still get the error because your tests are reporting a rollback error, and not an "ActiveRecord::StatementInvalid : ..... deadlock" type error, which is what Hydra checks for. But let me know.
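
For readers following along, here is a rough sketch of what a deadlock-retry wrapper of this kind can look like. It is not Hydra's actual code: `StatementInvalid` is a local stand-in for ActiveRecord::StatementInvalid so the example runs without Rails, and `with_deadlock_retry` and `MAX_DEADLOCK_RETRIES` are hypothetical names.

```ruby
# Local stand-in for ActiveRecord::StatementInvalid (assumption, so the
# sketch is runnable without Rails).
class StatementInvalid < StandardError; end

MAX_DEADLOCK_RETRIES = 3 # hypothetical limit

# Runs the block, retrying only when the error message mentions a deadlock.
# Any other error, including a rollback/savepoint error, is re-raised as-is.
def with_deadlock_retry(max_retries = MAX_DEADLOCK_RETRIES)
  attempts = 0
  begin
    yield
  rescue StatementInvalid => e
    raise unless e.message =~ /deadlock/i
    attempts += 1
    raise if attempts > max_retries
    retry
  end
end

# Usage: the block fails twice with a deadlock message, then succeeds.
calls = 0
with_deadlock_retry do
  calls += 1
  raise StatementInvalid, "Deadlock found when trying to get lock" if calls < 3
end
puts "succeeded after #{calls} attempts"
```

This also shows why a "SAVEPOINT ... does not exist" failure slips through: its message does not match the deadlock check, so it is never retried.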

Thx for releasing a new version. Just pulled it and tried it out - but still no dice :-|

I guess I will just continue to use parallel_tests for now, until I get some time to hack on the automated creation of a test db per worker.

My spec suite runs in 7m30 so I could use the speedup from distributing things :-)

/Jørgen

Sounds good. Thanks for helping me out with the issue.

If you are able to get your code to raise an ActiveRecord::StatementInvalid for the deadlock, hydra will catch it.

-Nick

Hey Jørgen,

Please try version 0.16.7

-Nick

Hi Nick,

Sounds great. Just tried it out and it seems to run somewhat better :-)

Good stuff: It does not stop with deadlock errors any more.

Could be better: I now get some errors which indicate that the db is not correctly cleared. My guess is that the data inserted in the db with object daddy, which should be removed by rolling back the transaction, is still there in some cases?

/Jørgen

That is probably the case. In PostgreSQL, a deadlock rolls back the transaction, but it seems that in MySQL it's not doing that.

-Nick

Hi,

Just hacked together a version that supports multiple test databases the parallel-test way (via a TEST_ENV_NUMBER environment variable in database.yml). Check it out here http://gist.github.com/385761
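
To illustrate the TEST_ENV_NUMBER approach (this is not the content of the gist), the sketch below renders a database.yml fragment through ERB the way Rails does, once per worker. The `myapp_test` database name and the `test_database_for` helper are hypothetical, and the worker values `''` and `'2'` assume the parallel_tests convention of leaving the variable blank for the first process.

```ruby
require 'erb'
require 'yaml'

# Hypothetical database.yml fragment; "myapp_test" is a placeholder name.
DATABASE_YML = <<~'YAML'
  test:
    adapter: mysql
    database: myapp_test<%= ENV['TEST_ENV_NUMBER'] %>
YAML

# Returns the database name a worker would connect to, given the value
# its TEST_ENV_NUMBER environment variable is set to.
def test_database_for(worker_number)
  previous = ENV['TEST_ENV_NUMBER']
  ENV['TEST_ENV_NUMBER'] = worker_number.to_s
  config = YAML.safe_load(ERB.new(DATABASE_YML).result)
  config['test']['database']
ensure
  # Restore the caller's environment so the helper has no side effects.
  ENV['TEST_ENV_NUMBER'] = previous
end

puts test_database_for('')   # first worker
puts test_database_for('2')  # second worker
```

Each worker then runs its transactions against its own database, so one worker's rollbacks and locks cannot affect another's.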

Seems to work perfectly for the project I'm working on :-)

/Jørgen

Cool. I can't merge it back in because it forces a db:reset (different people have different setups).

We may want to provide a separate task, like hydra:multidb:prepare that a user can hook into like this:

task 'hydra:multidb:prepare' => 'db:reset'

Feel free to add a note about your fork in the wiki so other people can find it.

Good idea with the multidb:prepare stuff.

For now I have added a note in the wiki http://wiki.github.com/ngauthier/hydra/multiple-test-databases