open-cogsci/omm-server

OMM cannot be accessed anymore


Hi,
A few days ago I tried downloading the data from an experiment, and since then it has been impossible to start OMM again. Looking around, I found that the MySQL log keeps adding these lines:

2023-06-07 07:33:54 mysql | 2023-06-07T05:33:54.604105Z 0 [Warning] [MY-012638] [InnoDB] Retry attempts for writing partial data failed.
2023-06-07 07:33:54 mysql | 2023-06-07T05:33:54.604203Z 0 [ERROR] [MY-012888] [InnoDB] Cannot resize redo log file ./#innodb_redo/#ib_redo26854_tmp to 3 MB (Failed to set size)
2023-06-07 07:33:54 mysql | 2023-06-07T05:33:54.614657Z 0 [Warning] [MY-012638] [InnoDB] Retry attempts for writing partial data failed.
2023-06-07 07:33:54 mysql | 2023-06-07T05:33:54.614684Z 0 [ERROR] [MY-012888] [InnoDB] Cannot resize redo log file ./#innodb_redo/#ib_redo26854_tmp to 3 MB (Failed to set size)

This seems to be just a surface problem: when I tried to download the data I was also watching Vmmem, and it went up to something very large, so I think we're not out of the woods with the memory problem...

Hi Daniel, I explored the issue with my brother. It turns out that roughly every hour a new redo file is created in /var/lib/mysql/ with a size of 1 GB, so the MySQL data directory ended up being 250 GB.

[screenshot]

Here is the beginning of the files: [screenshot]

As you can see, the files started being created on 6/5/2023, when we last 'solved' the previous problem. I removed all the redo files, Vmmem went from 16 GB to 8 GB, and the OMM server started again...

What do you think? Are 1 GB redo files every hour normal? Can we limit their number/size? This doesn't seem to be linked to data downloading... but it has been happening since we last updated the OMM server?

Thanks,
nico

Now the DB is back to its normal 1 GB size.

Hi Daniel, enjoy your holidays! Just to reiterate, I think this is different from #150 (unable to retrieve data from experiment): the redo log files seem to appear only when the experiment is running.

My understanding is that these redo files are a safety mechanism to prevent data corruption when the system crashes. Based on the error message, it looks as if resizing these files goes wrong, which might be because the system is running out of disk space. Could that be it?
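For reference (assuming MySQL 8.0.30 or later, which the `#innodb_redo` directory in the log above suggests), the redo log configuration and files can be inspected from any MySQL client:

    -- Configured redo log capacity (MySQL 8.0.30+)
    SHOW GLOBAL VARIABLES LIKE 'innodb_redo_log_capacity';

    -- Active redo log files and how much of the capacity they cover
    SELECT * FROM performance_schema.innodb_redo_log_files;

    -- Redo-related status counters
    SHOW GLOBAL STATUS LIKE 'Innodb_redo_log%';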

Yes, but the other way around: the system is running out of disk space because it's creating large recovery files while an experiment is running. So the question is why it's creating large recovery files when all goes well. Or is that just normal?

From what I understand, these redo files are also created during normal operation as a precautionary measure, so that the system can recover from a crash. What you're describing seems excessive, though. Are the redo files only created when an experiment is running, and then cleared up afterwards?

They are only created when an experiment is running but NOT cleared afterwards. It does seem excessive: it's 1 GB (the size of the DB) every 90 minutes.

And if you look at the MySQL database, is this accompanied by any crashes + restarts? And is there noticeable data loss?

There is no evidence of data loss and there are no crashes/restarts (otherwise we would see the experiment going sideways). Just to emphasize, this happens even when we do not try to download the data.

My best guess is that this behavior is due to incorrect configuration of MySQL. One likely contender is this line in docker-compose.yml, which was indeed added as part of the last fixes.

      - --innodb_buffer_pool_size=1G

However, I don't know enough about this to say much more. @dschreij can you look into this?
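For reference, you can check which values the running MySQL container actually picked up (standard MySQL statements, run from any client connected to the container):

    -- Values currently in effect in the running server
    SHOW GLOBAL VARIABLES LIKE 'innodb_buffer_pool_size';
    SHOW GLOBAL VARIABLES LIKE 'sort_buffer_size';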

This continues to be a major problem. I thought it might be resolved by changing the server, but we now have a very powerful server and it still happens. I think this is associated with this message in the MySQL log file:
2024-02-05 09:41:11 2024-02-05T08:41:11.016007Z 0 [Warning] [MY-013865] [InnoDB] Redo log writer is waiting for a new redo log file. Consider increasing innodb_redo_log_capacity.
2024-02-05 09:41:59 2024-02-05T08:41:59.663143Z 0 [Warning] [MY-013865] [InnoDB] Redo log writer is waiting for a new redo log file. Consider increasing innodb_redo_log_capacity.
2024-02-05 09:42:24 2024-02-05T08:42:24.799772Z 0 [Warning] [MY-013865] [InnoDB] Redo log writer is waiting for a new redo log file. Consider increasing innodb_redo_log_capacity.
2024-02-05 09:43:01 2024-02-05T08:43:01.643412Z 0 [Warning] [MY-013865] [InnoDB] Redo log writer is waiting for a new redo log file. Consider increasing innodb_redo_log_capacity.

We get error messages on the client saying that Python is not able to connect; they appear and then disappear, suggesting the server is saturated.

I'm not sure why this issue occurs, because I don't think that the server is dealing with a volume of data that is unusual for MySQL. But let's just do what the log file suggests, namely increasing the innodb_redo_log_capacity. See also:

To do so, you can edit docker-compose.yml and add this command line argument:

    ...
    command:
      - --sort_buffer_size=5M
      - --innodb_buffer_pool_size=1G
      - --innodb_redo_log_capacity=1G
    ...

Then restart the container and see if the issue goes away.
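As a side note (assuming MySQL 8.0.30 or later), innodb_redo_log_capacity is a dynamic variable, so the new value can also be applied to the running server without a restart; the docker-compose argument is still needed so that the setting survives container restarts:

    -- Apply a 1 GB redo log capacity on the running server (MySQL 8.0.30+)
    SET GLOBAL innodb_redo_log_capacity = 1073741824;

    -- Verify the value in effect
    SHOW GLOBAL VARIABLES LIKE 'innodb_redo_log_capacity';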

Thanks, that's what I ended up doing yesterday. I used the following in the docker-compose to increase the size from 100 MB to 1 GB:
--innodb_redo_log_capacity=1073741824

This provided useful information:
https://www.percona.com/blog/how-to-calculate-a-good-mysql-redo-log-size-in-mysql-8/
https://dev.mysql.com/doc/refman/8.0/en/innodb-redo-log.html

The change was effective. You can check by going to /var/lib/mysql/#innodb_redo and running `ls -l --block-size=M`. This lists the redo files with their sizes, with the ones ending in `_tmp` being on the waiting list. From what I understand, the files are purged progressively, and the 'waiting for new redo log file' warning means that all the files are gone and are being regenerated.
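Following the Percona article linked above, a rough way to size the capacity on MySQL 8.0.30+ is to sample the current redo LSN twice (say, one minute apart); the difference is the number of redo bytes written in that interval, and the capacity should comfortably cover about an hour of such writes:

    -- Sample once, wait ~60 seconds, then sample again;
    -- the difference between the two values is the redo bytes written.
    SELECT VARIABLE_VALUE AS redo_lsn
    FROM performance_schema.global_status
    WHERE VARIABLE_NAME = 'Innodb_redo_log_current_lsn';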

Increasing the log capacity will certainly help, but my feeling is that these problems appear when an experiment lasts for some time, suggesting that this might be linked to the amount of data in the experiment and maybe the number of times each job is done.

Now that I 'understand' what's going on I will try to test it.

That's good to hear.

> Increasing the log capacity will certainly help, but my feeling is that these problems appear when an experiment lasts for some time, suggesting that this might be linked to the amount of data in the experiment and maybe the number of times each job is done.

Yes I agree. There's likely something about the way that OMM writes data to the database that triggers this issue. Not simply the overall volume, probably, because MySQL is designed to work with way larger volumes of data. But perhaps something about the specific queries that are used. It may be tricky to find out what exactly, but in a sense as long as the symptoms can be treated by increasing the redo-log capacity, then that's also ok.

Ok, so I did some digging and got confirmation of what I thought. For a stupid reason, the experiment we were running used the same job_id all the time. All the problems appeared once we had used that job about 5000 times, with the server getting completely saturated and creating 1 GB binlog files every 30 seconds or so.

So I agree, I think this makes it clear that a lot of the problems with the server come from the way the data is written and accessed.

Here is my best guess... the data is stored in job_states as a single JSON object, and MySQL doesn't deal with JSON objects well, so I think every time you want to write something you need to unpack, add, and repack it, using GROUP BY / ORDER BY operations that saturate the buffer.

So I think packing the data of all instances of a single job into one JSON object is a mistake; each instance should have its own id (every experimental trial should have a retrievable id).

I added the following line to the docker-compose to automatically remove the binlog files after 24 hours:
--binlog_expire_logs_seconds=86400

This works.
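For reference, the binlog situation can also be inspected and cleaned up from a MySQL client with standard statements:

    -- List the current binary log files and their sizes
    SHOW BINARY LOGS;

    -- Check the configured expiration (in seconds)
    SHOW GLOBAL VARIABLES LIKE 'binlog_expire_logs_seconds';

    -- One-off cleanup of binlogs older than one day
    PURGE BINARY LOGS BEFORE DATE_SUB(NOW(), INTERVAL 1 DAY);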

> Here is my best guess... the data is stored in job_states as a single JSON object, and MySQL doesn't deal with JSON objects well, so I think every time you want to write something you need to unpack, add, and repack it, using GROUP BY / ORDER BY operations that saturate the buffer.

I think you are right here. Doing it this way is suboptimal and should be done better. I never imagined, back when this was designed, that 5000 trials would be stored in one cell. I will see if I can improve this somewhere in the near future. It doesn't appear too difficult. The most difficult part is how to convert the data already available into the new format.

Hi Daniel,
Thanks, that would be great; we're also trying to think of a solution on our end, so it would be good if we could coordinate.
Regarding the conversion of data already available into the new format, this doesn't worry me because we don't need to do it: I can export and save the data and we can start anew with a fresh and clean DB.

Now that we know the culprit (which is great): we have already encountered this issue before and decided not to fix it because it will be very difficult to fix and very easy to work around by making sure that experiments do not repeat jobs too often. See also #92

I don't think there is a solution apart from really dealing with the way the data is stored, because it is causing many problems on the server: the sort memory, the creation of redo and binlog files, and the problematic retrieval of data are just the most obvious. There are also more practical issues: we can't track how many trials have been done in an experiment, and we can't retrieve data based on simple parameters like date and time, participant, etc. And I don't see how we can do long-term experiments (multi-year ones) with this.
So I do think we need to rethink the data storage. I don't know how to do this, but what I would like is to store the data that is retrieved from the client separately from the jobs. Every time the client sends the result of a trial, it is added to the data table (?) and cannot be removed (e.g. when you remove a participant or an experiment). Data on an experiment can then be retrieved by querying directly for a specific time, experiment, individual, etc. (just a few fields, the rest in JSON).
I don't know what it would take to do this, but I think it's necessary.
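As a purely hypothetical illustration of that idea (the table and column names below are invented for this sketch and are not part of the current OMM schema): one row per completed trial, with a few indexed columns for querying and the rest of the trial data kept as JSON.

    -- Hypothetical per-trial results table (names invented for illustration)
    CREATE TABLE trial_results (
      id               BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
      experiment_id    INT UNSIGNED NOT NULL,
      participant_name VARCHAR(64)  NOT NULL,
      job_id           INT UNSIGNED NOT NULL,
      recorded_at      DATETIME     NOT NULL,
      data             JSON         NOT NULL,
      INDEX idx_exp_part_time (experiment_id, participant_name, recorded_at)
    );

    -- Retrieval by participant and period then becomes a plain indexed query
    SELECT recorded_at, data
    FROM trial_results
    WHERE experiment_id = 3
      AND participant_name = 'participant_01'
      AND recorded_at BETWEEN '2024-01-01' AND '2024-02-01';

With a layout like this, each trial is appended as a small fixed-cost insert, independent of how often the job has already been run.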

Hi @nclaidiere, sorry but I will be firm on this: I'm not going to ask @dschreij to change this at this point, even though I do agree it's not an ideal implementation (but also not a bug as such), because it would require a fundamental rewrite of the code, while simple workarounds are very achievable and, importantly, they will absolutely solve the issues that you're facing now!

So let's shift the discussion towards how you can implement your experiments so that they won't trigger this issue. Why don't you post one experiment that causes the issue on the Freshdesk issue tracker, and then I'll take a look at it and offer some suggestions?

Hi @smathot, I am not asking @dschreij to rewrite the code but I would value his opinion on how to proceed. I don't think we will get something nice and shareable unless we fix this.

Ok, @dschreij what are your thoughts on this? Specifically, how much work do you estimate it will be to refactor the database such that there won't be a single enormous JSON blob anymore?

> I don't think we will get something nice and shareable unless we fix this.

That will be ok, really. The issue feels really pressing now because it causes the server to crash. But with clear and easy-to-follow guidelines on how to avoid jobs being repeated too often, this won't happen anymore, and it will just be a limitation that does not have many practical disadvantages (that I can see now).

I need to do some investigation first, but I think it's not that much work. The way I remember having built the relation between a job run and a job variable should allow me to attach multiple values to this relation quite easily (instead of only one value per job run and variable, as is the case now). I will need to dive into my code again to be certain, though. I don't know yet when I will be able to do this, because it's been crazily busy lately, but I will keep this on my radar and let you know once I've had a look (and, if it's not much work, potentially already fixed it).

Thanks Daniel, that's very kind of you. I would like to try to detach the data from the job completely: it would be great if, every time a job is done, the JSON data were stored in a separate place. At the beginning I was thinking of a different database entirely, but I am not sure this is necessary. The reason is that we never use the job data during an experiment; we use participant_data, etc. to save and retrieve what we need. We just need a few fields to interrogate the stored data (participant_name, datetime, experiment name or number...) and that's it. Thanks for your help.

To make sure that we're all on the same page here:

@dschreij Nicolas is running OMM in production and needs this issue fixed (or worked around), which means two things:

  1. @nclaidiere may expect to be able to upgrade without losing the data currently in the database. If your fix requires the database to be cleared because of structural changes, communicate this clearly and in advance.
  2. If you intend to fix/improve this (which would be awesome!), we need a clear commitment that you will actually do this and within a reasonable time frame. If you feel that you cannot commit to that, please let us know.

@nclaidiere The participant and job data are largely the same thing, and both should remain in the database. The issue is the following (@dschreij can you confirm that what I'm saying is correct?):

Right now, job results (responses, experimental variables, etc.) are saved as a JSON string in a record that corresponds to a single job for a single participant. When the participant repeats a job, the new job results are added to this same JSON string, which therefore becomes a little bit larger each time that a job is repeated. In itself, this is reasonable, and a job can be repeated very often before the JSON string becomes prohibitively large. This means that scenarios like repeated practice trials (which we had in mind during the design) etc. are no issue.

However, for 'infinite experiments' that are programmed such that a limited number of jobs are repeated very, very often, at some point the JSON string becomes so large that we're starting to see issues such as this one and the out-of-sort memory (#92) that we saw before.
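To illustrate the pattern (a hypothetical sketch of append-to-one-JSON-cell storage in MySQL, not the actual OMM queries; the column names are invented): every repetition rewrites the entire, ever-growing JSON value, so the work and redo volume per trial grow with the number of repetitions already stored.

    -- Hypothetical sketch: appending each new trial result to a single JSON cell.
    -- The whole (growing) JSON document is rewritten on every repetition.
    UPDATE job_states
    SET data = JSON_ARRAY_APPEND(data, '$',
                                 JSON_OBJECT('response', 'left', 'rt', 532))
    WHERE job_id = 42 AND participant_id = 7;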

Remember that we diagnosed this issue before (as #92) and then decided to work around it by implementing infinite experiments differently (i.e. by using large job tables so that repetitions are spread out over jobs)! I am 99% sure that this is the same issue, which has now resurfaced because over time the workaround was forgotten and then a new infinite experiment was implemented, again with a limited number of jobs that are repeated very often. @nclaidiere Can you confirm this?

Because of this consideration, I still feel that working around this issue is also a pragmatic and satisfactory solution!

To answer your message above, @smathot:

> To make sure that we're all on the same page here:
>
> @dschreij Nicolas is running OMM in production and needs this issue fixed (or worked around), which means two things:
>
> 1. @nclaidiere may expect to be able to upgrade without losing the data currently in the database. If your fix requires the database to be cleared because of structural changes, communicate this clearly and in advance.

This is not a problem; I am happy to restart from scratch and 'lose all data' if the issue is fixed.

> 2. If you intend to fix/improve this (which would be awesome!), we need a clear commitment that you will actually do this and within a reasonable time frame. If you feel that you cannot commit to that, please let us know.

> @nclaidiere The participant and job data are largely the same thing, and both should remain in the database. The issue is the following (@dschreij can you confirm that what I'm saying is correct?):
>
> Right now, job results (responses, experimental variables, etc.) are saved as a JSON string in a record that corresponds to a single job for a single participant. When the participant repeats a job, the new job results are added to this same JSON string, which therefore becomes a little bit larger each time that a job is repeated. In itself, this is reasonable, and a job can be repeated very often before the JSON string becomes prohibitively large. This means that scenarios like repeated practice trials (which we had in mind during the design) etc. are no issue.

I disagree, "a job can be repeated very often before the JSON string becomes prohibitively large", is not right, just a 1000 times before we start having BIG problems, this occurs if we do experiments for a month or so, it's a regular problem that I am just not mentionning every time it happens because it is documented and I am working around it.

> However, for 'infinite experiments' that are programmed such that a limited number of jobs are repeated very, very often, at some point the JSON string becomes so large that we're starting to see issues such as this one and the out-of-sort memory (#92) that we saw before.

I agree that, based on what Daniel said, the 'out of sort memory' error is a symptom of the same underlying problem.

> Remember that we diagnosed this issue before (as #92) and then decided to work around it by implementing infinite experiments differently (i.e. by using large job tables so that repetitions are spread out over jobs)! I am 99% sure that this is the same issue, which has now resurfaced because over time the workaround was forgotten and then a new infinite experiment was implemented, again with a limited number of jobs that are repeated very often. @nclaidiere Can you confirm this?

I don't think we can really use large job tables: it already takes a long time to upload a job table of a few hundred lines (#115), so what about 100,000 lines?

> Because of this consideration, I still feel that working around this issue is also a pragmatic and satisfactory solution!

I disagree: it's not a satisfactory solution, it is a temporary workaround while we wait for the issue to be fixed.

Other reasons why this is important:
1 - the statistics do not currently represent anything, because we cannot know the number of trials done.
2 - we cannot retrieve data in a convenient way, e.g. by participant, over a certain period, etc.
3 - the data disappears when we do anything with the jobs (remove a participant, change the job list, etc.), whereas ideally the data would be completely unaffected by this, unless maybe the experiment itself is removed.

> I don't think we can really use large job tables: it already takes a long time to upload a job table of a few hundred lines (#115), so what about 100,000 lines?

Right, I forgot about this. I agree that uploading large job tables is only an option when you can actually do this. If we tackle this performance issue instead, would you then find this a suitable workaround?

I don't think we need another workaround; I would rather solve the issue, for all the reasons mentioned above.

> If you intend to fix/improve this (which would be awesome!), we need a clear commitment that you will actually do this and within a reasonable time frame. If you feel that you cannot commit to that, please let us know.

The question here is what is reasonable :) I do not have as much spare time anymore as I used to have (or at least that I also can or want to spend behind a computer), but I can do my best to find a moment in the next month or two.

> I don't think we can really use large job tables: it already takes a long time to upload a job table of a few hundred lines (#115), so what about 100,000 lines?

Right. This one slipped my mind. We faced the same issue a while ago, and there is something about running MySQL in Docker that makes it slow. Can you try the solution that is proposed in this thread: https://forums.docker.com/t/mysql-slow-performance-in-docker/37179/26, i.e. add --skip-name-resolve to the mysql options in docker-compose.yml, and see if it goes faster after restarting the container?
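To confirm that the option was picked up after restarting the container (skip_name_resolve is exposed as a read-only system variable):

    -- Should report ON once --skip-name-resolve is active
    SHOW GLOBAL VARIABLES LIKE 'skip_name_resolve';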

Another option might be to add `:delegated` to the volume mappings of mysql; see https://stackoverflow.com/a/57767710

The final solution, which involves the most work, would be to run MySQL natively on Windows on the server machine on which performance is critical. This is of course the least portable solution, and you will lose a lot of Docker's features that make running services like MySQL easier, but you will gain the performance you need.

Closing this in favor of #155.