inbo/n2khab-monitoring

Split this repository and revise names

florisvdh opened this issue · 3 comments

This is a follow-up to the discussion in issue #2, the related discussion in PR #21, and conversations with several collaborators.

It is still open for discussion, but the current plan is as below. The main points, splitting the repo and renaming the package, will be implemented in the near future, probably at a moment when no pull requests are pending.

package repository + name

  • while it has some advantages, keeping the package and the preprocessing workflow together has already confused several people. To prevent this, the package will move to its own repository.
  • the package retains its scope, that is: functions useful to a lot of projects dealing with Natura 2000 habitat (and RIBs) in Flanders, but not meant for highly project-specific functionality.
  • the package's name (currently n2khabutils) will be shortened to n2khab. The 'n2k' prefix follows the style of existing n2k packages at inbo.
  • existing code to reproduce the textdata delivered with the package will be put into a build-ignored folder inside the package repo.

related n2khab repositories

  • what will be left in the 'n2khab-inputs' repo, at that stage, are two quite different things:
    1. preprocessing of data sources, again useful to a lot of projects dealing with Natura 2000 habitat in Flanders (not involving very project-specific stuff)
    2. information on intended and existing workflows in monitoring design & analysis and how this is to be organised as repositories. It provides task definitions in the context of monitoring (functionality.md + issues), which also involve work on the n2khab package and the preprocessing workflow, often side by side with work on other repos.
  • the above will be split into two repositories, respectively:
    1. n2khab-preprocessing
    2. n2khab-monitoring --> this one will serve as the specific starting point on monitoring, with information on the several involved repositories + package. It serves both MNE and MHQ, and it will contain nothing more than 'meta-information'. It would best present general information on the related repositories (not the involved tasks) as a website (using the gh-pages branch).

Warning: starting to implement this. Now is a bad time to make commits (even locally).

The following is now ready:

  • the R package has its own repo and is now called n2khab.
  • the repo n2khab-preprocessing, with code for broadly useful preprocessing of data sources for projects dealing with Natura 2000 habitat in Flanders.
  • the shrunken version of the former repo n2khab-inputs: n2khab-monitoring, i.e. this repo. It lays out purposes and workflows regarding several repositories, with a specific interest in Natura 2000 habitat monitoring. It is also the place where the related tasks and discussions are hosted, as they were before in n2khab-inputs.

The README files and other documentation in all three repositories have been updated to reflect each repo's respective contents. Minor updates of remaining obsolete pieces may still occur in the future. Main functionality should be OK.

The n2khab-monitoring repo has kept the complete git history of the former hybrid repo n2khab-inputs. As a convenience, the other two repositories still hold a rewritten (shrunken) git history, as defined by the selected files and folders. However, full reproducibility of older workflows can only be guaranteed with the versions provided by the n2khab-monitoring repo.

In the n2khab-monitoring repo, the discarding of migrated files occurred in commit be6be8e, and some further related updates occurred thereafter.

How to move to the new n2khab R package as a user?

In R, uninstall n2khabutils:

remove.packages("n2khabutils")

Install n2khab:

remotes::install_github("inbo/n2khab", 
                        build_opts = c("--no-resave-data", "--no-manual"))

Have a look at the vignettes to quickly find your way!

help(package = "n2khab")
# vignettes only: browseVignettes("n2khab")

Help, I want to keep contributing!

Of course! Migrating your local git repositories is super easy.

Clone the new repositories n2khab and n2khab-preprocessing in the usual way, with git clone.

Supposing you still have a local n2khab-inputs repo, you have several options for migrating to the n2khab-monitoring repo:

The super-easy way

If you have no 'git-ignored' files to lose (such as data or bookdown reports) and no original work such as unpushed branches, just delete the local n2khab-inputs folder and make a new clone of n2khab-monitoring. Done!
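Before deleting the old folder, it may be worth checking that nothing would be lost. The sketch below is a hypothetical helper (check_before_delete is not part of any of the repositories) built only from standard git commands: git status --short --ignored lists modified, untracked and git-ignored files (ignored ones are prefixed with !!), git log --branches --not --remotes lists commits on local branches that no remote-tracking branch contains, and git stash list shows stashed work.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report what "delete the folder and re-clone" would lose.
# Usage: check_before_delete [path-to-old-n2khab-inputs-repo]
check_before_delete() {
  local repo="${1:-.}"
  local files commits stashes

  # Modified, untracked (??) and git-ignored (!!) files
  files=$(git -C "$repo" status --short --ignored) || return 1
  # Commits on local branches that no remote-tracking branch contains
  commits=$(git -C "$repo" log --oneline --branches --not --remotes) || return 1
  # Stashed changes also exist only in the local repo
  stashes=$(git -C "$repo" stash list) || return 1

  echo "== Uncommitted, untracked or ignored files:"
  echo "${files:-(none)}"
  echo "== Unpushed commits on local branches:"
  echo "${commits:-(none)}"
  echo "== Stashes:"
  echo "${stashes:-(none)}"

  # Exit status 0 only if nothing would be lost
  [ -z "$files$commits$stashes" ]
}
```

Run it from inside the old n2khab-inputs folder as check_before_delete; if it reports nothing, the super-easy way is safe.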

The geeky way

Otherwise, you can decide to stay with your existing local repo. This means discarding obsolete branches, updating the remote reference, updating the local master branch and transferring previously git-ignored files to the repositories where they belong. (A variant of this is to make a new clone of n2khab-monitoring and move some git-ignored files from your old n2khab-inputs folder to the other repos, as in step 3. Less elegant, but it may be more convenient for you.)

It goes as follows:

  1. To avoid confusion, rename the local repo folder n2khab-inputs to n2khab-monitoring (this is a matter of convenience)
  2. The rest can be done with git; in the shell:
# discard local and remote branches except the local master:
git checkout master
git remote remove origin # deletes origin and your obsolete pointers to remote branches
git branch -d <local-branchname-to-delete> # for unmerged branches: either keep them as such, or use the -D flag instead
# going to the new state:
git remote add origin https://github.com/inbo/n2khab-monitoring.git
git fetch origin
git branch -u origin/master # makes master track origin/master again
git pull
  3. Inspect the remaining folders data and src. The files remaining there are no longer ignored by git, but they don't belong here anymore: the current remote master does not have these folders! Except for src/generating_textdata, whose html and other local files you can move to the same folder in your local n2khab repo, everything else should be moved to your local n2khab-preprocessing repo.
  4. Check with git status that everything is clean now. Done!
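To double-check step 2, a sketch like the one below compares the origin URL and the upstream of master with the values used above. check_migration is a hypothetical helper, not part of the repositories; git remote get-url and git rev-parse --abbrev-ref master@{upstream} are standard git commands.

```shell
#!/usr/bin/env bash
# Hypothetical helper: confirm that the local repo now points at
# n2khab-monitoring and that master tracks origin/master.
# Usage: check_migration [path-to-local-repo]
check_migration() {
  local repo="${1:-.}"
  local expected_url="https://github.com/inbo/n2khab-monitoring.git"
  local url upstream

  url=$(git -C "$repo" remote get-url origin) || return 1
  # Prints the upstream as e.g. origin/master; fails if master has none
  upstream=$(git -C "$repo" rev-parse --abbrev-ref 'master@{upstream}') || return 1

  echo "origin URL:         $url"
  echo "upstream of master: $upstream"

  [ "$url" = "$expected_url" ] && [ "$upstream" = "origin/master" ]
}
```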
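Step 3 can also be scripted. The sketch below is hypothetical (move_into and move_leftovers are illustrative names) and assumes that the three local repos are sibling folders named n2khab-monitoring, n2khab and n2khab-preprocessing, and that it is run from inside n2khab-monitoring. It never overwrites anything: entries that already exist at the destination are skipped and reported, so they can be resolved by hand.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for step 3. Assumed layout: ../n2khab and
# ../n2khab-preprocessing next to the current n2khab-monitoring folder.

# Move every entry of <src> into <dest>, never overwriting existing entries.
move_into() {  # move_into <src> <dest>
  local src="$1" dest="$2" entry name
  [ -d "$src" ] || return 0          # nothing to move
  mkdir -p "$dest"
  for entry in "$src"/* "$src"/.[!.]*; do
    [ -e "$entry" ] || continue      # skip an unmatched glob pattern
    name=$(basename "$entry")
    if [ -e "$dest/$name" ]; then
      echo "skipped, already exists: $dest/$name" >&2
    else
      mv "$entry" "$dest/"
    fi
  done
  rmdir "$src" 2>/dev/null || true   # removes <src> only if it is empty now
}

move_leftovers() {  # move_leftovers [old-repo] [n2khab-repo] [preprocessing-repo]
  local old="${1:-.}" pkg="${2:-../n2khab}" prep="${3:-../n2khab-preprocessing}"
  local repo
  for repo in "$pkg" "$prep"; do
    [ -d "$repo" ] || { echo "not found: $repo" >&2; return 1; }
  done

  # src/generating_textdata goes to the same folder in the n2khab repo
  move_into "$old/src/generating_textdata" "$pkg/src/generating_textdata"
  if [ -e "$old/src/generating_textdata" ]; then
    echo "src/generating_textdata not fully moved; resolve it first" >&2
    return 1
  fi

  # everything else in src and data goes to n2khab-preprocessing
  move_into "$old/src" "$prep/src"
  move_into "$old/data" "$prep/data"
}
```

Run it as move_leftovers from inside n2khab-monitoring, then continue with step 4.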

Note to self: how were some things done?

Regarding git, most ideas come from the git filter-branch manual page (see its examples); some inspiration came from GitHub.

  • making a local clone from n2khab-inputs without hardlinked files
git clone --no-hardlinks n2khab-inputs n2khab # repeat for other repos
git remote remove origin # in the newly created repo; to discard all references to the local n2khab-inputs
  • limiting git commit history to subfolder contents (n2khab)
git filter-branch --prune-empty --subdirectory-filter n2khabutils master
git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d
git reflog expire --expire=now --all
git gc --prune=now
  • discarding git commit history related to specific files and folders (n2khab-preprocessing)
git filter-branch -f --index-filter 'git rm -r --cached --ignore-unmatch n2khabutils' master # for folders; repeat for others
git filter-branch -f --index-filter 'git rm --cached --ignore-unmatch src/manage_package.R' master # for files; repeat for others
git filter-branch -f --prune-empty master # -f is needed because the previous runs left a backup in refs/original/
git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d
git reflog expire --expire=now --all
git gc --prune=now
  • replace all occurrences of 'n2khabutils' with 'n2khab' in a directory

find . -type f -exec sed -i 's/n2khabutils/n2khab/g' {} \;

  • update specific paths in files in a directory

find . -type f -exec sed -i 's|\.\./\.\./n2khab/inst|../../inst|g' {} \;

  • find a string in a directory, looking only into R and Rmd files:

find . -type f \( -name "*.R" -o -name "*.Rmd" \) -exec grep -nH --colour 'inbo.github.com/git2rdata' {} \;
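When the rename above is run from a repository's root, find . also descends into the .git folder, so sed -i is run on git's internal files as well. A hedged alternative is sketched below: rename_in_dir is a hypothetical helper that assumes GNU grep, xargs and sed (as on Linux), only edits text files that actually contain the string, and skips .git entirely. Both strings are used as regular expressions, so keep them to plain names such as n2khabutils (no '/').

```shell
#!/usr/bin/env bash
# Hypothetical helper; assumes GNU grep, xargs and sed.
# Usage: rename_in_dir <dir> <old> <new>
rename_in_dir() {
  local dir="$1" old="$2" new="$3"
  # -r: recurse, -l: list matching files, -I: ignore binary files,
  # -Z: NUL-terminate file names; the .git folder is skipped
  grep -rlIZ --exclude-dir=.git -e "$old" -- "$dir" |
    xargs -0 -r sed -i "s/$old/$new/g"
}

# Dry run: list the files that would be changed
# grep -rlI --exclude-dir=.git -e n2khabutils -- .
```

For example, rename_in_dir . n2khabutils n2khab.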
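To check that a removed folder or file is really gone from a rewritten history, something like the hypothetical path_in_history helper below can be used. It relies on git log --all -- <path>, which lists commits on all refs that touch the path. Because --all also covers refs/original/, the backup refs left by git filter-branch keep the path visible until they are deleted with the for-each-ref / update-ref line above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: exit status 0 if any commit reachable from any ref
# (including refs/original/ backups) still touches <path>, 1 otherwise.
# Usage: path_in_history <repo> <path>
path_in_history() {
  local repo="$1" target="$2"
  [ -n "$(git -C "$repo" log --all --oneline -- "$target")" ]
}

# Example: path_in_history . n2khabutils && echo "still in history"
```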