simpeg/aurora

[JOSS Review] Suggestions and improvements for software paper

Closed this issue · 11 comments

(This issue is part of the review process for a submission to JOSS.)

Overall, the software paper (in the version linked here) is well-written and structured and provides a good starting point for assessing the scope of aurora and how to use it. However, there are a number of typos, ambiguities and inconsistencies that I feel hinder readability and understanding.

First, regarding the content:

  • L 87: "results are similar": What does this mean? If they are "only" similar, where does the difference come from? Can you link to the results of this test?

Typos and errors that should be corrected:

  • L 21-22: Any reason why "Tabular Data" and "Processing" is capitalised?
  • L 29: missing comma after "orthogonal"
  • L 33: superfluous "involves" or "requires"
  • L 34: superfluous comma after "timestamps"
  • L 36: missing end of sentence
  • L 60: superfluous period after "parameters"
  • L 71: "the flow of Fig. 2" -> "the flow described in Fig. 2" (or equivalent wording)
  • Fig. 2 caption: "transfer functions can similarly be exported to the most common…", right?
  • L 80: "ongoing … is ongoing"
  • L 95: "Aurora'a" -> "Aurora's"
  • L 103: "to co-develop" -> "to be co-developed"
  • L 120: reference should be in parentheses, period missing

Other (editorial) remarks, questions, suggestions:

  • L 46: "a sort of continuity"; what is a "sort of" continuity? (Specify or rephrase)
  • Fig. 1: Is the screenshot from the config file distorted?
  • Fig. 2: It's nice to include example plots here, but I think these are far too small. If you want to keep the small size, I would suggest to strongly simplify the figures or even replace them by sketches. If you want the actual output, I would suggest letting the flow go from top to bottom and use larger figures (and ideally avoid overlapping labels etc). Also, the regression figure is lacking labels altogether.
  • Fig. 2 & 4: These have strongly varying font sizes, which should be adjusted to be more consistent across the whole paper.
  • L 61: the "(supporting the R…)" remark is, I imagine, mysterious for people who don't know about FAIR. Perhaps add a reference? Currently, this seems more like a footnote and might as well be removed; if you want to highlight reproducibility aspects of Aurora, perhaps make this a full sentence?
  • L 73/74: Perhaps rephrase the first two sentences to improve readability? (avoid "This… This…")
  • L 74/75: "from the EarthScope Schultz (2010)" – I think this would be clearer if you would directly say "EMScope dataset".
  • Fig. 3: Any chance to embed this as Markdown Code with Python Syntax Highlighting? I would find this preferable over a screenshot.
  • Fig. 3 caption: Use monotype for code snippets like RunSummary(). Also, was this intended to be a list? The — seem to suggest so … but I'm not sure. Would propose to delete those.
  • Fig. 4 caption: Perhaps it would be good to mention that this is the example output resulting from the code shown above?
  • L 89: "also run via GitHub actions." … perhaps append "to assert functionality" to that sentence?
  • L 102: "(most test data is sampled at 1Hz)", to me (not a domain expert), this remark seems overly specific here.
  • L 115ff: The appendix feels a bit unnecessary. I think the installation instructions are best taken from the repository page. Information about documentation and license should ideally be included into the main text at a suitable point (don't need to be separate sections).
  • L 128: A ## References header should be added before the list of references starts.

And some more subjective suggestions (not affecting approval of the manuscript):

  • L 35: Consider adding a citation for xarray
  • L 44: "internal development" … the "internal" is perhaps not needed?
  • L 55ff: I would suggest to denote code, e.g. class names, with monotype font (KernelDataset, Processing, TransferFunctionKernel) to make clear that you are referring to parts of the aurora code. This should be done consistently throughout the text.

Hi @kkappler could you please have a look?

Inline Responses are here:

  • L 87: "results are similar": What does this mean? If they are "only" similar, where does the difference come from? Can you link to the results of this test?
    • "Similar" here means that the curves are basically interchangable, i.e. to first order one would interpret the same earth structure from them. Understanding the details of why particular curves differ is not a trivial process. A deep dive into that topic should be done, but would be a study on its own, deserving it's own report or publication. The issue of comparing results is further compounded by the fact that the existing TFs that are archived are not archived with the processing parameters that were used to generate the curves. Also, there are processing options in the FORTRAN code that are not available in the current python flow. Note that aurora was not derived directly from FORTRAN, but rather, Gary Egbert, the author of the EMTF FORTRAN code, translated it to matlab, and that matlab version was used as the starting point for aurora. I think this is an important fact to point out, and added a sentence We note that Aurora is two degrees separated from the FORTRAN EMTF, as we used a Matlab implementation of EMTF from Prof. Gary Egbert as an initial framework. to clarify this. [TODO Add Line number]

Typos and errors that should be corrected:

  • L 21-22: Any reason why "Tabular Data" and "Processing" is capitalised?
    • changed so that only the first letter of each bullet is capitalised
  • L 29: missing comma after "orthogonal"
    • comma added
  • L 33: superfluous "involves" or "requires"
    • removed
  • L 34: superfluous comma after "timestamps"
    • removed
  • L 36: missing end of sentence
    • modified to a full sentence
  • L 60: superfluous period after "parameters"
    • removed
  • L 71: "the flow of Fig. 2" -> "the flow described in Fig. 2" (or equivalent wording)
    • changed to "...the flow described by Figure 2..."
  • Fig. 2 caption: "transfer functions can similarly be exported to the most common…", right?
    • changed to "...transfer functions can be exported to the most common..."
  • L 80: "ongoing … is ongoing"
    • fixed
  • L 95: "Aurora'a" -> "Aurora's"
    • fixed
  • L 103: "to co-develop" -> "to be co-developed"
    • fixed
  • L 120: reference should be in parentheses, period missing
    • fixed

Other (editorial) remarks, questions, suggestions:

  • L 46: "a sort of continuity"; what is a "sort of" continuity? (Specify or rephrase)
    • Rephrased. This now reads:
      As a python representation of Egbert's EMTF Remote Reference processing software, Aurora provides a continuity in the MT code space as the languages evolve. We note that Aurora is two degrees separated from the FORTRAN EMTF, as we used a Matlab implementation of EMTF from Prof. Gary Egbert as an initial framework.
  • Fig. 1: Is the screenshot from the config file distorted?
    • There was some vertical exaggeration in the center panel, and some compression in the farthest right. This figure was binding a wide table with a tall json view. To remove distortion this figure was remade into two panels and the caption updated: Upper panel represents the TF Kernel with two inlay boxes representing the dataset (pandas DataFrame) and processing config (JSON). Lower panel illustrates example instances of these structures.
  • Fig. 2: It's nice to include example plots here, but I think these are far too small. If you want to keep the small size, I would suggest to strongly simplify the figures or even replace them by sketches. If you want the actual output, I would suggest letting the flow go from top to bottom and use larger figures (and ideally avoid overlapping labels etc). Also, the regression figure is lacking labels altogether.
    • Simplified figure 2 and added some explanatory text. These images are conceptual -- in reality the time series can have data from more than one station, and the spectrograms are also multivariate (not single channel as shown). The regression is also multivariate, and applied on complex-valued data from the spectrograms, this illustration however conveys the key idea of regression in the presence of outliers and mixed clusters.
    • Struck detailed text from Figure 2, so labels are just 'time', 'frequency', 'period', etc.
  • Fig. 2 & 4: These have strongly varying font sizes, which should be adjusted to be more consistent across the whole paper.
    • Font sizes have been updated.
    • For label boxes in Figure 1, fonts are 14.
    • For label boxes in Figure 2 fonts are size 12.
    • Axes labels and tickmarks for Figure 4 are size 12.
  • L 61: the "(supporting the R…)" remark is, I imagine, mysterious for people who don't know about FAIR. Perhaps add a reference? Currently, this seems more like a footnote and might as well be removed; if you want to highlight reproducibility aspects of Aurora, perhaps make this a full sentence?
    • the remark was struck from the text
  • L 73/74: Perhaps rephrase the first two sentences to improve readability? (avoid "This… This…")
    • rephrased, now reads:
    • This section refers to a Jupyter notebook companion to this paper (archived on GitHub: [process_cas04_mulitple_station](https://github.com/simpeg/aurora/blob/joss/docs/examples/process_cas04_mulitple_station.ipynb)). The companion notebook builds an MTH5 dataset from the EMscope dataset (@schultz2010emscope) and executes data processing -- a condensed version of which is shown in Figure 3. Apparent resistivities are plotted in Figure \ref{compareTFs} along with the EMTF-generated results hosted at EarthScope.
  • L 74/75: "from the EarthScope Schultz (2010)" – I think this would be clearer if you would directly say "EMScope dataset".
    • done
  • Fig. 3: Any chance to embed this as Markdown Code with Python Syntax Highlighting? I would find this preferable over a screenshot.
    • The code snippet is now embedded and the screenshot removed
    • Note this makes Old Figure 4 now Figure 3
  • Fig. 3 caption: Use monotype for code snippets like RunSummary(). Also, was this intended to be a list? The — seem to suggest so … but I'm not sure. Would propose to delete those.
    • RunSummary() is now RunSummary()
    • emdashes were removed
  • Fig. 4 caption: Perhaps it would be good to mention that this is the example output resulting from the code shown above?
    • Added text: The aurora results are from executing the example code snippet. The plotting details are in the Jupyter notebook
  • L 89: "also run via GitHub actions." … perhaps append "to assert functionality" to that sentence?
    • Done
  • L 102: "(most test data is sampled at 1Hz)", to me (not a domain expert), this remark seems overly specific here.
    • Removed
  • L 115ff: The appendix feels a bit unnecessary. I think the installation instructions are best taken from the repository page. Information about documentation and license should ideally be included into the main text at a suitable point (don't need to be separate sections).
    • After introducing the example code snippet, added the text:
    • To run the example you must install aurora, which can be done via conda or pip. Detailed instructions and further documentation can be found on the SimPEG (@cockett2015simpeg) [documentation website](http://simpeg.xyz/aurora/).
    • moved the licence information into the "Statement of Need"
    • deleted the appendix
  • L 128: A ## References header should be added before the list of references starts.
    • done -- used # References for consistency

And some more subjective suggestions (not affecting approval of the manuscript):

  • L 35: Consider adding a citation for xarray
    • Done
  • L 44: "internal development" … the "internal" is perhaps not needed?
    • struck the word internal
  • L 55ff: I would suggest to denote code, e.g. class names, with monotype font (KernelDataset, Processing, TransferFunctionKernel) to make clear that you are referring to parts of the aurora code. This should be done consistently throughout the text.
    • Done, also removed TFK and TFKernel abbreviations, replacing with TransferFunctionKernel

Hi @kkappler how are things going?

Hi @diehlpk

Thanks for checking in. This specific issue 333 has been addressed and is ready to merge pending feedback from blsqr.

Before requesting that feedback, we had thought to wait for the second review and incorporate those changes as well.

What do you think? Should we request blsqrs' review on this issue now, or wait for the second review?

Thank you for implementing the changes, @kkappler!

Could you perhaps upload a revised version of the paper proof PDF here (or merge and then generate a new proof in the JOSS review issue), such that I can inspect the changes in rendered form?

EDIT: Nevermind, found it here.

My issues regarding the software have been addressed and I will update the review checklist accordingly. Thank you, @kkappler!

From my side, the corresponding PR #335 can be merged. Once merged, could you please point me to the most up-to-date branch such that I can continue the review there? Is it #338 or #340 or some other branch?

Thanks @blsqr we have now merged all the review branches into the joss branch. That is the latest branch that relates to this manuscript.

@diehlpk

@blsqr could zou please have a look and check if all your concerns were addressed.

Thank you for implementing the changes, @kkappler!

Could you perhaps upload a revised version of the paper proof PDF here (or merge and then generate a new proof in the JOSS review issue), such that I can inspect the changes in rendered form?

EDIT: Nevermind, found it here.

@blsqr please note that the latest version of the draft manuscript is now merged into the joss branch and can be found here

Thank you, @kkappler!

I have two minor points remaining regarding the paper:

  • Fig. 3 caption ends with The plotting details are in the Jupyter notebook(.: remove superfluous (
  • Last reference, Vozoff K. (1991): proper citation is missing. I've found this book chapter, is that correct? https://library.seg.org/doi/10.1190/1.9781560802686.ch8 If so, please update this reference.

@blsqr Thanks for catching those errors. They have been fixed on joss branch (here).