Install async update of cache
Closed this issue · 4 comments
This should be relatively easy to do. It would avoid inconsequential changes to data whose source files were not touched. We should also consider whether we should/can keep the timestamp async as well. What do you think @edelarua @shajoezhu ?
Hi @Melkiades , I just saw this issue. Can you provide a bit more context for this issue?
As it is now, every time we update the cache we rerun everything, regardless of whether the source files have been modified. I would add a check for a diff in the source files, then reload the cache selectively. I would use something like `git diff --name-only`.
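A minimal sketch of that check, written in Python only for illustration. The function names, the `R` source directory, and the one-script-per-dataset layout are assumptions and not taken from the repository; only `git diff --name-only` comes from the comment above.

```python
import subprocess
from pathlib import PurePosixPath


def changed_source_files(base_ref="HEAD", repo_dir="."):
    """Return the paths printed by `git diff --name-only <base_ref>`."""
    result = subprocess.run(
        ["git", "diff", "--name-only", base_ref],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return [line for line in result.stdout.splitlines() if line.strip()]


def scripts_to_rerun(changed_paths, source_dir="R"):
    """Keep only changed generator scripts that sit directly in source_dir.

    Assumption (hypothetical layout): one .R script per cached dataset,
    so the script stem identifies the cache entry to rebuild.
    """
    stems = set()
    for path in changed_paths:
        p = PurePosixPath(path)
        if p.parent.as_posix() == source_dir and p.suffix == ".R":
            stems.add(p.stem)
    return sorted(stems)
```

Calling `scripts_to_rerun(changed_source_files())` inside the repository would then give the scripts whose cache entries need rebuilding; everything else could be left untouched.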
@Melkiades one thing to note: some datasets affect others (e.g. a change in radsl.R will cause changes in most/all datasets), which will have to be taken into account when determining which datasets to update.
Exactly what Emily said. I think we need something to capture the dataset dependencies as well.
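One way to capture those dependencies is a map from each dataset to the datasets built from it, walked transitively from whatever changed. The sketch below is in Python for illustration only; the function name and the dependency map are hypothetical, and the real relationships would have to come from the generator scripts themselves.

```python
from collections import deque


def affected_datasets(changed, dependents):
    """Return every dataset that must be rebuilt when `changed` datasets change.

    `dependents` maps a dataset name to the datasets derived from it.
    A breadth-first walk collects direct and indirect dependents.
    """
    seen = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for child in dependents.get(current, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return sorted(seen)


# Hypothetical dependency map, for illustration only.
deps = {
    "adsl": ["adae", "adlb"],
    "adae": ["adaette"],
}

print(affected_datasets(["adsl"], deps))  # ['adae', 'adaette', 'adlb', 'adsl']
print(affected_datasets(["adlb"], deps))  # ['adlb']
```

With a map like this, a change to the root dataset triggers a rebuild of everything downstream, while a change to a leaf dataset rebuilds only that dataset.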