ajvb/kala

JobStats cleanup

QBui opened this issue · 5 comments

QBui commented

I have a job that is scheduled to run every 5 minutes. After a few weeks of executions, the job stats have accumulated quite a few entries, which causes the api/v1/job/{id} API to return a lot of data. I am looking for ways to clean up these stats, since I am only interested in the status of the most recent run.
I am thinking about changing the kala code in db.go to add a few more methods for cleaning up old stats entries (e.g. removing items more than 1 week old). I need some pointers on whether this is the right thing to do. I can contribute the code back once I am done.
Please let me know. Thanks,
-Quan

QBui commented

Approach #2: Remove stats items from the cache and rely on the JobCache.Persist() method to update the underlying database. This is probably a quicker implementation. I will try out this approach.

ajvb commented

Oh this is interesting. I think I would lean towards one of two solutions (or both of them):

  1. The ability to set a TTL for job stats, where after a certain amount of time specified by the user they are deleted. The default would be never for backwards compatibility.
  2. The ability to filter the GET request to return only N stats, or only stats within a datetime range.

@QBui What do you think about these?

QBui commented

Thanks for the reply. I will start with #1. It will address my issue, and I don't want to keep the old job stats around anyway. IMO, #2 is nice to have. It may require implementing an additional REST resource, e.g. /api/v1/job/{id}/stats?since=. I am not sure about this use case yet, so I will revisit it later.

ajvb commented

@QBui Sounds great.

gwoo commented

Fixed in #176