NOAA-GFDL/MDTF-diagnostics

Correct and condense installation instructions in README

Closed this issue · 4 comments

Issue
The GitHub README file currently contains spelling, grammatical, and formatting errors, gives misguided instructions, and is too long for a new or prospective user to easily navigate or absorb all at once.

Misguided instructions

There are two places in the README where we, arguably, tell users to do the wrong thing:

  • Sections 1.1 and 4: we don't want anyone to commit their personal config files as part of a pull request; the configuration step should unambiguously tell users to modify a copy of default_tests.jsonc, since that file is under version control.

  • Section 2.2, first bullet: Because conda is implemented as a function or alias in the user's shell, it's impossible for two conda installations to coexist. Installing conda in the presence of an existing installation will break the first, as we state in the last paragraph of section 2.1.
    If the user doesn't have write access to their site's conda installation (which is the real issue, not whether conda has been installed in /usr/), the workaround is to use the site's conda executable to install the MDTF conda environments in a location where the user has write access, using the --env_dir flag on the conda_env_setup.sh script.
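
As a rough sketch of what the two corrected instructions could look like (not taken from the README): the first helper copies the version-controlled template to a personal file, and the second only prints the environment-setup command rather than running it. The function names, the personal file name, the src/ locations, and the --all and --conda_root flags are assumptions made for illustration; only default_tests.jsonc, conda_env_setup.sh, and --env_dir are named in this issue.

```shell
#!/bin/sh
# Sketch only: paths and every flag other than --env_dir are assumptions.

# Copy the version-controlled template to a personal config file,
# refusing to overwrite a copy that already exists.
copy_config_template() {
    template="$1"   # e.g. src/default_tests.jsonc (location assumed)
    personal="$2"   # e.g. my_tests.jsonc (hypothetical name)
    if [ -e "$personal" ]; then
        echo "refusing to overwrite $personal" >&2
        return 1
    fi
    cp "$template" "$personal"
}

# Print (do not run) a command that would install the MDTF conda
# environments into a user-writable directory while reusing the
# site's existing conda installation.
print_env_setup_command() {
    conda_root="$1"   # root of the site's conda installation
    env_dir="$2"      # directory where the user has write access
    echo "./src/conda/conda_env_setup.sh --all --conda_root $conda_root --env_dir $env_dir"
}
```

For example, copy_config_template src/default_tests.jsonc my_tests.jsonc leaves the tracked file untouched, and print_env_setup_command /opt/conda "$HOME/mdtf_envs" shows the command a user without write access to the site installation might run, with both paths being placeholders.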

Length

The GitHub README page, in practice, needs to serve an advertising/marketing function of reducing barriers to getting a prospective user to try out the software. If our page is a long, detailed list of installation instructions, the first impression we make is that our software is too complex to use.

If you look at the README page for other metrics packages (e.g., PCMDI metrics, ILAMB, ESMValTool), or more broadly any open-source package (e.g., xarray, NCL, cartopy, ...), it's very brief, taking up at most a few screens of text, and primarily serves to redirect the reader to more detailed information elsewhere. I propose that we do the same thing via links to the ReadTheDocs site.

Even if we don't adopt that policy, the following parts of the README aren't needed to describe the immediate task of installing and testing the package:

  • Creating new branches and pushing them to a personal repo in section 1.1;
  • The ASCII directory tree in section 1.2 is a rather verbose way to say "move directory X inside of directory Y";
  • Justifying the use of conda in section 2: explaining the "why" as well as the "how" should be done in a longer version of the instructions.
  • Instructions for installing individual conda environments in section 2.2;
  • Most of the configuration options in "4. Configure framework paths" (mis-numbered) are unnecessary: in section 1.2 we told the users to put downloaded data in the relative path we used as a default value.
  • Descriptions of the function of individual conda environments in section 4.3 (mis-numbered);
  • The explicit data citations for CM4 and ESM4 -- as mentioned, to be fair we should also cite CAM4, and it surely must be kosher to just link to the model DOI in the "Examples of package output" section rather than writing out the full citation?

Great points, @tsjackson-noaa.

  • We could introduce some git tricks to help users avoid accidentally committing files, for example:
      git update-index --assume-unchanged src/default_settings.jsonc
    I've used this method before, but I wonder if it would end up being too confusing for end users.
  • You make a very good point about the README.md page from an optics point of view. It does telegraph a message of complexity. Not only that, it duplicates information on the ReadTheDocs site and requires us to maintain essentially two copies of some parts of the documentation.
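
To make the git trick above a little more concrete, here is a minimal sketch of how it could be wrapped so that it is also easy to undo. The helper names are hypothetical; the git options themselves (--assume-unchanged, --no-assume-unchanged, and ls-files -v) are standard git.

```shell
#!/bin/sh
# Tell git to stop checking a tracked file for local modifications.
ignore_local_edits() {
    git update-index --assume-unchanged "$1"
}

# Undo the flag so that local changes show up in `git status` again.
track_local_edits() {
    git update-index --no-assume-unchanged "$1"
}

# List files currently flagged assume-unchanged
# (`git ls-files -v` marks them with a lowercase status letter).
list_ignored_edits() {
    git ls-files -v | grep '^[a-z]' | cut -c3-
}
```

A user would run ignore_local_edits src/default_settings.jsonc once after cloning. The flag is local to that clone and is not shared through the repository, so every user would have to set it themselves, which may be part of why this could confuse end users.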

I agree that the Readme file has become unwieldy in our attempts to meet the impossible standard of a short document with tons of details. The most recent package that I've installed from git, aside from MDTF, is mamba. Its Readme is perhaps a bit longer than other pages, but it has a general structure that seems like it would suit our needs:

  • brief description
  • general installation instructions
  • additional features
  • development installation instructions
  • support
  • license

This sounds good to me. My preference here would be to have content that is relatively static and will not likely need to be updated. In addition to this list we also need to include the appropriate Government disclaimers.

Closing this issue, as it's been resolved by @wrongkindofdoctor's work on #188.