gvegayon/parallel

SJ Review Todo

gvegayon opened this issue · 1 comment

  1. About naming conventions and explaining the process
  2. About memory differences
  3. Child processes, details about memory usage.
  4. Stata/MP citation, and R and Matlab citations
  5. Removing paragraph.
  6. Diagnostics tools commands

We added more info on this.

  1. About renaming the command for starting parallel and setting clusters
  2. About setting a default number for number of processors
  3. About slowdowns.

Provide a description of slowdown.

  1. Why provide an option to set the Stata path directly?

The path to the Stata executable sometimes changes across versions of Stata. We have seen this problem before, so we allow the user to fix it.

  1. Why would end users use this subcommand versus : di c(processors)?

That value is not the number of processors available on the machine, but the number of processors that Stata is using, which is not the same.
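
A minimal sketch of the distinction, not taken from the paper. It assumes that `parallel numprocessors` is the subcommand the reviewer refers to, and that `c(processors_mach)` is only populated under Stata/MP:

```stata
* Processors that Stata itself is currently set to use
display c(processors)

* Processors on the machine as reported by Stata (assumed to require Stata/MP)
display c(processors_mach)

* Assumed subcommand under discussion: lets parallel detect the machine's processors
parallel numprocessors
```

On a multicore machine without Stata/MP, the first value would not be expected to reflect the number of cores available.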

  1. About column wise operations.

Sure, you can create your own implementation of parallel that does that. We will add it to the wish list.

  1. Add a short paragraph of the API.

Added.

  1. About how programs are exported.

This is for local programs that are available only in the current instance of Stata. Mata functions work, but Mata variables that are pointers won't.

  1. I do think that being able to use random.org is a pretty cool feature of the program.

Thanks!

  1. Example with expression expansion.

See the section "Subcommand examples"

  1. What is the default behavior when a user ...

  2. Description of an event.

  3. What is it that is saved in e(pll)?

We use this to check, for replay, whether the sim or bs was run in parallel.

  1. Excitement about the Stata program to run Windows on batch.

Thanks!

  1. How would this potentially affect estimation commands if matrices are not available once the child process finishes the execution of the commands?
  • Not able to use with regression. You can always store things explicitly.
  1. If you want to provide instructions for this you can use:
    \begin{lstlisting}
    . ado, find(parallel)
    . ado uninstall [#]
    \end{lstlisting}
    This will probably return more results than they would hope, but it at least could illustrate how they would uninstall things.

  2. Make sure to let the users know that if they are trying to follow the examples here sequentially they will need to drop the price2 variable first.

  3. This example failed to execute properly. I've saved the error log so it could be sent back to you and can add an issue in GitHub. (see example "Example Simulation" in LyX)

We were able to run this example. Please provide further information.

  1. It would be useful to include some simple example files that could be used to test/verify this subcommand/functionality.

We have some on our GitHub site.

  1. It isn't clear how this would affect loops that exist within community/user contributed/developed commands. Is there some recommended refactoring that others could implement that would allow the internals of their program to take advantage of parallel when the loops exist within the main body of the user/community defined command?

We think we provide an example in section "0.3.2 Parallelizing a loop".

  1. Does parallel have any performance effect on programs and/or scripts that generate a non-trivial number of graphs? ...
  • We are focused on data analysis.

  • See example of sequential consistency. What could drive differences between serial and parallel.

  • McCoach et al. (2018): Minimum amount of benchmarking.

  1. Similarly, someone recently gave a talk at Juliacon 2018 about performance benchmarks between Julia, Stata, R, and maybe Python ...
  • That is a nice idea. Perhaps in the future we could try to build something like that. The problem is that, since we are already looking at embarrassingly parallel tasks, which most other languages also support, the comparison is trivial.
  1. It would probably be good to put the project website in parentheses so interested readers can go directly to the site instead of searching for it.

  2. This example will fail because the global $size is not defined prior to being referenced in the example.

Good catch.

  1. I think the second row in the table may be more confusing than helpful...

We modified the table, but still have issues doing an ANOVA. DISCUSS.

  1. If the simulation is uninteresting ...

Removed

  1. I think the conclusion is nice, short, and to the point...
  • Mention the programs and wiki page.

Added content around:

  • What is the default behavior when a user ...
  • ado uninstall

Otherwise looks great. I'll accept the PR