gvegayon/parallel

tmpdir

jibanes opened this issue

Preliminaries

Before submitting an issue, please check (with x in brackets) that you:

  • Are using the newest release (see here for latest release version number).
  • Have checked that the examples in the help work.
  • Have read the help (HTML version) and the gallery of examples.
  • Have checked that there is not already an existing issues for what you are reporting.

Expected behavior and actual behavior

parallel_run changes c(tmpdir) but only creates (mkdir) the directory on the host running the parallel Stata command (the host that calls parallel_run), not on the child hosts. As a result, a child may try to use tmpdir while the directory does not exist on its host.

Steps to reproduce the problem

Run the attached do file after changing the hostnames. It shows that c(tmpdir) exists only on the host running the do file, not on the children.

p.txt

System information

  • Stata version and flavor (e.g. v14 MP): 16.1 MP
  • OS type and version (e.g. Windows 10): linux x86_64
  • Parallel version: latest on GitHub

We assume here (and elsewhere in the code) that the hosts share a common filesystem, so creating the directory on the parent host should be sufficient. Is the problem that a child process running on another host might not have received the updated filesystem yet? If that's the case, the clients could wait/sleep, or perhaps there is a filesystem command to pull the latest change, or they could recreate the folder (I'm not sure how NFS handles multiple hosts creating the same folder).
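A minimal shell sketch of the wait-then-create idea, assuming each child can run a small shell step before its Stata process starts. The helper name `wait_for_dir` and its 10-second default are hypothetical and not part of parallel. It polls for the directory for a while, then falls back to `mkdir -p`, which treats an already-existing directory as success, so it does not fail if another host created the folder first.

```shell
#!/bin/sh
# Hypothetical helper (not part of parallel): make sure a directory exists on
# this child host before Stata uses it.
#   $1 = directory to wait for
#   $2 = seconds to wait before creating it ourselves (default 10)
wait_for_dir() {
    wfd_dir=$1
    wfd_timeout=${2:-10}
    wfd_waited=0
    # Give a shared filesystem time to show the directory the parent created.
    while [ ! -d "$wfd_dir" ] && [ "$wfd_waited" -lt "$wfd_timeout" ]; do
        sleep 1
        wfd_waited=$((wfd_waited + 1))
    done
    # Fall back to creating it; mkdir -p succeeds if it already exists.
    mkdir -p "$wfd_dir"
}
```

For example, `wait_for_dir "/path/to/tmpdir" 5` waits up to five seconds and then creates the directory if it still has not appeared.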

Perhaps parallel could take an argument such as tmpdir("/path/to/location")?
Otherwise, is it preferable to redefine c(tmpdir) before invoking parallel, or simply to create the directory from the children?

> Is the problem that a child process running on another host might not have received the updated file-system yet?

On my system /tmp isn't shared, but home directories, for instance, are. I could point c(tmpdir) there (at my home directory), or do you recommend another approach?

The reason I mentioned an optional tmpdir() argument earlier is that this setting (c(tmpdir)) can't be changed from within Stata itself. We could set it in our shell startup files, but then it would apply to all Stata sessions, whether or not they use parallel.

I think the easiest solution is to start the parent Stata with a temp dir that is shared across hosts.
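A minimal sketch of starting only the parent Stata with a shared temp dir, without touching shell startup files. It assumes Stata reads the `STATATMP` environment variable at startup to set c(tmpdir) (check the Stata documentation for your version). The function name `run_with_shared_tmp` is hypothetical. Because the variable is set as a prefix on a single command, it affects only that process and whatever it launches, not every Stata session started from your shell.

```shell
#!/bin/sh
# Hypothetical helper: run one command with STATATMP pointing at a directory
# on a filesystem shared by all hosts.
#   $1    = shared directory (created if missing)
#   $2... = command to run, e.g. the parent Stata batch job
# Assumption: Stata uses STATATMP to choose c(tmpdir).
run_with_shared_tmp() {
    rwst_dir=$1
    shift
    mkdir -p "$rwst_dir" || return 1
    # Prefix assignment: exported to this command only, not to the shell.
    STATATMP="$rwst_dir" "$@"
}
```

Usage, with a directory under a shared home directory and a hypothetical do file (the Stata executable name depends on your installation): `run_with_shared_tmp "$HOME/stata_tmp" stata-mp -b do myjob.do`.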

Thanks, Brian.