cvat-ai/cvat

Big chunks can lead to failure in job data access

Opened this issue · 8 comments

Actions before raising this issue

  • I searched the existing issues and did not find anything similar.
  • I read/searched the docs

Steps to Reproduce

I created a task with 500 frames per chunk. In total, there are 750 frames in the task.
When I open the job, after some time CVAT returns error 500.
As a result, users cannot use CVAT anymore.

I am attaching the video.
https://github.com/user-attachments/assets/338038d9-e2f3-4be5-a322-492368af2214

Expected Behavior

Previously, this scenario worked just fine.

Possible Solution

Maybe this is connected with the new way of storing chunks. #8272

Context

No response

Environment

app.cvat.ai

I created a task with 500 frames per chunk

Please reduce the chunk size.
CVAT no longer creates permanent chunks stored on the filesystem.

Now it saves them in the Redis cache. However, the maximum size of a cached item is 512 MB.
In your case, it seems CVAT is not able to prepare and write the chunk within the default request timeout (1 minute).
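
For a rough sense of the numbers involved, here is a back-of-the-envelope estimate in Python; the per-frame size is an assumption for illustration (high-quality JPEGs of HD frames often land around 1 MB or more), not a value measured from this task.

# Back-of-the-envelope check: does a chunk fit under the Redis cache item limit?
# The per-frame size is an assumption for illustration, not measured data.
CACHE_ITEM_LIMIT_MB = 512        # maximum size of a cached item mentioned above
FRAMES_PER_CHUNK = 500           # chunk size used in this task
ASSUMED_FRAME_SIZE_MB = 1.2      # rough size of one frame at image_quality=100

chunk_size_mb = FRAMES_PER_CHUNK * ASSUMED_FRAME_SIZE_MB
verdict = "exceeds" if chunk_size_mb > CACHE_ITEM_LIMIT_MB else "fits under"
print(f"Estimated chunk size: {chunk_size_mb:.0f} MB ({verdict} the {CACHE_ITEM_LIMIT_MB} MB limit)")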

@azhavoro

I would consider additionally restricting the chunk size option in your improvements. Very high values may create issues with workers.

@bsekachev, thanks for the response.
I have not looked into the source code of the latest releases (yet). Do you know if there is an option to enable the filesystem storage back? I have a feeling that storing everything in Redis will become a nightmare when you have a hundred users at the same time, with all these evictions and restorings...
In some cases, the FS (Redis) -> CVAT speed is not a problem compared to the CVAT -> browser speed, and enforcing Redis usage will indeed be overwhelming and will introduce maintenance complexity (the need to have a large in-memory db, etc.).

Also, when you use a larger chunk size, you gain a speed improvement (CVAT -> browser), and this factor is clearly visible if you compare the default chunk size (36 frames) with something larger than 36 frames.
So maybe it is better not to restrict the chunk size option 👀👀

Do you know if there is an option to enable the filesystem storage back?

On a self-hosted instance, yes:

MEDIA_CACHE_ALLOW_STATIC_CACHE = to_bool(os.getenv("CVAT_ALLOW_STATIC_CACHE", False))
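
For reference, a minimal sketch of what the flag above does; the to_bool helper here is a stand-in written for this example, not CVAT's actual implementation, so treat its exact behaviour as an assumption. Setting CVAT_ALLOW_STATIC_CACHE=true in the server environment would then switch chunk storage back to the filesystem cache.

# Minimal sketch of the setting above; to_bool is a stand-in helper written
# for this example, not CVAT's actual implementation.
import os

def to_bool(value) -> bool:
    # Accept booleans as-is and treat common truthy strings ("1", "true", "yes", "on") as True.
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() in ("1", "true", "yes", "y", "on")

# With CVAT_ALLOW_STATIC_CACHE=true exported for the server process,
# this evaluates to True and the static (filesystem) cache is allowed.
MEDIA_CACHE_ALLOW_STATIC_CACHE = to_bool(os.getenv("CVAT_ALLOW_STATIC_CACHE", False))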

need to have a large in-memory db

I am not sure what you mean; Redis is not a purely in-memory db. It uses the filesystem as well, and it works well with a huge number of users.

The problem here is that the chunk cannot be prepared in 60 seconds; however, the next improvement we are going to add is preparing chunks in a worker, outside of the main server processes.
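
For context, a minimal sketch of what "preparing chunks in a worker" could look like using rq (which CVAT already uses for background jobs); prepare_chunk and the "chunks" queue name are placeholders for this example, not the actual implementation.

# Minimal sketch of offloading chunk preparation to a background worker with rq.
# prepare_chunk and the "chunks" queue name are placeholders for illustration.
from redis import Redis
from rq import Queue

def prepare_chunk(task_id: int, chunk_number: int) -> None:
    # Placeholder: decode frames, assemble the chunk and store it in the cache.
    ...

chunk_queue = Queue("chunks", connection=Redis())

def request_chunk_preparation(task_id: int, chunk_number: int):
    # The HTTP request returns quickly; the heavy work runs in the worker process,
    # so it is no longer bound by the 60-second request timeout.
    return chunk_queue.enqueue(prepare_chunk, task_id, chunk_number)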

@PMazarovich the cause of the issue is that the chunk size (500 frames + image_quality=100) exceeds the cache limit (most likely 512 MB). So try reducing the chunk size or image quality, or try using video chunks instead of image chunks.

I don't think it's a bug: it's impossible to cache data of unlimited size, but we definitely need to at least return a response with a correct error message. In the future we'll need to handle this case differently, for example by not using the cache at all for large chunks.
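
As an illustration of the "correct error message" point, a hypothetical guard could look like the sketch below; CACHE_ITEM_LIMIT_BYTES and chunk_response are names made up for this example and are not part of CVAT's code.

# Hypothetical sketch: turn an oversized chunk into a clear client error
# instead of a generic 500. The names used here are illustrative, not CVAT's API.
from django.http import HttpResponse, JsonResponse

CACHE_ITEM_LIMIT_BYTES = 512 * 1024 * 1024  # 512 MB cache item limit

def chunk_response(chunk_data: bytes) -> HttpResponse:
    # Refuse chunks that cannot be cached, with a descriptive 413 Payload Too Large.
    if len(chunk_data) > CACHE_ITEM_LIMIT_BYTES:
        return JsonResponse(
            {"detail": "Chunk exceeds the 512 MB cache limit; "
                       "reduce the chunk size or image quality."},
            status=413,
        )
    return HttpResponse(chunk_data, content_type="application/octet-stream")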

Related: #7959

@bsekachev, @azhavoro, @zhiltsov-max, thanks for all the responses, much appreciated!