Using IOTracker to empirically test what real programs do.
Closed this issue · 0 comments
gatoWololo commented
The design of ProcessCache must be informed by what programs actually do. Currently there are many unknowns and speculations on our part about how real programs execute. Below I have started a non-exhaustive list of questions about how programs execute. Note that by "programs" I mean the workflows we hope to target with ProcessCache.
- Question: How many execs are done by different workloads?
Reasoning: This will determine the granularity of our caching and the number of cacheable units we can hope to have. This will also inform us of what are good domains for ProcessCache, i.e. I expect bash scripts to have a bunch of exec calls, while a single Python script not so much.
- Question: How many getdents are done by processes?
Reasoning: See #23; we don't have a clear way of handling directory reads. Knowing how many programs actually do this will tell us whether the solution proposed in #23 would even work.
- Question: How often are outputs of a sub-exec computation reused as inputs to the parent computation?
Reasoning: I have been thinking about which exec-units must be re-executed when an input has changed. As we discussed during our 1/28 meeting, should we re-execute the parent if a child computation needs re-executing? (I have a bunch of preliminary thoughts on this, but it is too early to write them down.) Knowing this will greatly inform the design of our dependency graph and whether parent processes should be re-executed...
- Question: How often are the "at" variants of system calls called (such as `openat` instead of `open`)? And when they are used, how often is the `dirfd` actually used, instead of just `AT_FDCWD`?
Reasoning: This will help us understand how common it is for programs to touch the file system outside the `cwd` they start in.
- Question: How often are different flags used in different syscalls?
Reasoning: Many syscalls, such as `openat` and `execveat`, have flags, and we want to know how often each is used so we know what common cases we are facing. Plus, there are so many flags that knowing which to prioritize first would be helpful.
Feel free to add more!
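
As a starting point, here is a rough sketch of how several of the counts above (execs, getdents, `dirfd` vs. `AT_FDCWD`, and per-syscall flag frequencies) could be tallied. IOTracker's output format is not described in this issue, so the sketch assumes strace-style text lines instead; `summarize`, `AT_CALLS`, and the other names are hypothetical and not part of IOTracker or ProcessCache.

```python
import re
from collections import Counter

# ASSUMPTION: input is strace-style text, e.g.
#   1234 openat(AT_FDCWD, "/etc/passwd", O_RDONLY|O_CLOEXEC) = 3
# The leading PID (or "[pid N]") is optional.
LINE_RE = re.compile(
    r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?(?P<name>\w+)\((?P<args>.*)\)\s+=\s+(?P<ret>.+)$"
)
STRING_RE = re.compile(r'"(?:[^"\\]|\\.)*"')       # quoted string arguments
FLAG_RE = re.compile(r"\b(?:O|AT)_[A-Z0-9_]+\b")   # symbolic flag names

EXEC_CALLS = {"execve", "execveat"}
GETDENTS_CALLS = {"getdents", "getdents64"}
# A partial, hand-picked set of "at" syscalls whose FIRST argument is the dirfd.
AT_CALLS = {
    "openat", "execveat", "mkdirat", "unlinkat", "readlinkat",
    "faccessat", "newfstatat", "fchmodat", "fchownat", "utimensat",
}


def summarize(lines):
    """Tally exec, getdents, dirfd usage, and flag frequencies from trace lines."""
    stats = {
        "execs": 0,
        "getdents": 0,
        "at_fdcwd": 0,       # "at" calls that passed AT_FDCWD
        "at_real_dirfd": 0,  # "at" calls that passed an actual directory fd
        "flags": Counter(),  # (syscall, flag) -> count
    }
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # signals, exit notices, unfinished calls, etc.
        name = match.group("name")
        # Blank out string arguments so paths cannot be mistaken for flags.
        args = STRING_RE.sub('""', match.group("args"))

        if name in EXEC_CALLS:
            stats["execs"] += 1
        if name in GETDENTS_CALLS:
            stats["getdents"] += 1
        if name in AT_CALLS:
            dirfd = args.split(",", 1)[0].strip()
            if dirfd.startswith("AT_FDCWD"):
                stats["at_fdcwd"] += 1
            else:
                stats["at_real_dirfd"] += 1
            for flag in FLAG_RE.findall(args):
                if flag != "AT_FDCWD":  # this is a dirfd value, not a flag
                    stats["flags"][(name, flag)] += 1
    return stats
```

Assuming a trace was collected with something like `strace -f -o trace.txt <command>`, calling `summarize(open("trace.txt"))` would give per-workload numbers to compare. It does not address the question about child outputs being reused as parent inputs, which would need per-process tracking of written and read paths.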