nfdi4plants/nfdi4plants.knowledgebase

[guide] handling ARCs across clones / machines

Opened this issue · 1 comment

Handling ARCs across machines (laptop, desktop, server, HPC, etc.) is not intuitive.

We need a practical guide / recommendation, e.g.:

  1. DataHUB as the "original" clone with all data
  2. Laptop / desktop for small data: creating the ARC, structuring and annotating metadata, writing scripts, etc.
  3. Clone to server / HPC (ideally via git), where the large data is stored for computations
  • Needs to cover working with Git LFS and git / ARC Commander handling.
  • Reminders about ARC syncing (keeping clones up to date)
  • Notes on working with branches
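The three-tier setup above can be sketched with plain git (ARC Commander's `arc sync` wraps similar operations under the hood; the DataHUB URL and ARC name below are placeholders, not real paths):

```shell
# 1. DataHUB holds the complete "original" clone (the git remote).
DATAHUB_URL="https://git.nfdi4plants.org/<user>/<my-arc>.git"

# 2. Laptop / desktop: clone, structure the ARC, annotate metadata, add scripts.
git clone "$DATAHUB_URL" my-arc
cd my-arc
git add .
git commit -m "Add metadata annotation and analysis scripts"
git push origin main

# 3. Server / HPC: clone the same remote where the large data lives.
git clone "$DATAHUB_URL" my-arc
```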

I think the above is a great basis for recommendations!
I have worked on an ARC across machines, and, as suggested, using git pull worked well to fetch the changes made on the respective other machine, so pulling regularly could also be a good general recommendation.
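The pull-before-work habit on each machine could look like this (a sketch, assuming the default branch is called main):

```shell
# Before starting work on any machine, fetch what the other machines pushed
git pull origin main

# ... work on the ARC ...

# When done, publish your changes so the other clones can pull them
git add .
git commit -m "Update assay annotations"
git push origin main
```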

However, this was before I added the raw data to the ARC. Now I am wondering how to, for example, synchronize my laptop without pulling all those large files. If we find a solution, this would also be an important recommendation.