rstats-wtf/what-they-forgot

here() to access parent directories?


Say I have my data stored in ~/Data, but my R project and code are stored in ~/dev/sample_model. Is it good practice to use here() to access this data, or should I hard-code absolute paths to it?

I suggest creating a symbolic link or a junction. Assuming that you are in your project directory:

```r
# OS X/Linux
file.symlink("~/Data", "data")

# Windows
Sys.junction("~/Data", "data")
```

Then, you can just use here("data") to refer to your data. You need to provide instructions in your README.md on how to obtain the data and set up the symlinks.
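To make the README instructions easier to follow, the link setup could live in a small script. The following is only a sketch: `link_data()` is a hypothetical helper name, and it assumes the script is run from the project root.

```r
# Hypothetical helper (not part of any package): link an external data
# directory into the project so that here("data") resolves to it.
link_data <- function(target, link = "data") {
  # Expand "~" and fail early if the data directory does not exist
  target <- normalizePath(target, mustWork = TRUE)

  # Nothing to do if the link (or a directory of that name) is already there
  if (file.exists(link)) {
    return(invisible(TRUE))
  }

  if (.Platform$OS.type == "windows") {
    Sys.junction(target, link)   # junctions are only available on Windows
  } else {
    file.symlink(target, link)   # symbolic link on OS X/Linux
  }
}

# Usage, run once from the project root after obtaining the data:
#   link_data("~/Data")
#   here::here("data")
```

Running the script a second time does nothing, so it should be safe to call from a general project setup step.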

@jennybc: Do you think this advice should be part of our documentation?

This sounds like a reasonable idea. To clarify, is this the principle? For a self-contained project, to reference files that (a) live elsewhere and (b) are not installed in a standard place, it is best practice to create a link inside the project.

As an aside, if ~/Data holds data that is used in more than one project, it is a good candidate for conversion to a data package, which could then be versioned, installed, and accessed in a more robust way via library().
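As a rough illustration of the data package route: if the raw files were shipped inside an installed package, they could be located with `system.file()` instead of a path into ~/Data. The package name `sampledata` and the file names below are assumptions for the sketch, not an existing package.

```r
# Hypothetical wrapper: find a file shipped inside an installed data package.
# "sampledata" is an assumed package name used only for illustration.
data_path <- function(..., package = "sampledata") {
  # mustWork = TRUE raises an error instead of silently returning ""
  system.file(..., package = package, mustWork = TRUE)
}

# Usage, assuming the package stores raw files under inst/extdata:
#   read.csv(data_path("extdata", "measurements.csv"))
```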

Confirming. I'd even create links to directories in standard places; in that case we can put the link itself under version control (OS X/Linux only).

This principle also holds for other ways to access data (web APIs, databases, ...); accessing the data through a "link" (URL, connection string, ...) feels better than copying the data into your project.
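One possible way to apply the same idea to URLs or other locations is to keep the "link" outside the code, for example in an environment variable. This is a sketch only: the variable name `SAMPLE_MODEL_DATA` and the helper `data_location()` are made up for illustration.

```r
# Hypothetical helper: resolve where the data lives without copying it.
# SAMPLE_MODEL_DATA is an assumed variable name, not an established convention.
data_location <- function(default = "data") {
  loc <- Sys.getenv("SAMPLE_MODEL_DATA", unset = "")
  # Fall back to the in-project link when the variable is not set
  if (nzchar(loc)) loc else default
}

# Usage: the same call works for a local directory or a URL prefix,
# since read.csv() accepts both file paths and URLs.
#   read.csv(file.path(data_location(), "measurements.csv"))
```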

Data packages have their advantages, but I find packages that simplify access to external resources more useful than packages that contain a copy of the data.