data-lessons/librarycarpentry

Figure out a workflow for getting DOIs for regular LibCarp releases {META}

Closed this issue · 21 comments

moved here because it makes more sense!

I like the idea of creating (say) an annual release of Library Carpentry, with a DOI, et cetera. And I've used the Zenodo-GitHub link to do this with projects in the past. Looking at my Zenodo account, I've been making a release for each lesson, so this is possible, though I'd rather do one big bundle, as Programming Historian does: https://zenodo.org/record/30935#.V8VN9I78_6h. Does anyone have any thoughts on this? What I guess I'm proposing is that we do a "version x" release of all the GitHub repos combined (in some way) on a semi-regular basis. This will need to be managed to ensure: 1) it is done at regular intervals, 2) everyone involved is credited correctly, and 3) the metadata is put somewhere public for reference.

cc @jt14den

This sounds reasonable and I'd be happy to help. I've only used Zenodo on a per repository basis via the GitHub integration. How did you bundle the PH site? We might want to use semantic versioning to give flexibility. Maybe a semiannual release would work?

This is a good idea. Software Carpentry has done DOIs for each lesson, and I think some of the motivation there was to be able to give authors credit for the lessons they've contributed to, and potentially more than one reference if they've contributed to multiple lessons. It is also easier to get a DOI for an individual repository. https://guides.github.com/activities/citable-code/

If you wanted to bundle them though, you could probably have an aggregating GitHub repo, like we do for the ecology lessons https://github.com/datacarpentry/ecology-workshop/ (although that particular formatting still needs some work).

Thanks both. I'd rather a bundle as it is easier to manage. Credit for contributions - for now at least - feels to me best handled through other mechanisms (indeed, we are working on this at the moment! #8 )

My idea was to just collect the relevant repos - git pull et cetera - and zip them up, then add metadata on Zenodo. We lose some of the project that way (all the issues), but I'm not sure the Zenodo-GitHub repo versioning captures that anyway (?)
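A minimal sketch of the "collect and zip" idea, assuming the lesson repos have already been cloned or pulled locally. The function name `bundle_lessons` and the directory names in the usage note are hypothetical, and this is not taken from any existing release script. It skips each repo's `.git` directory, so history is not included, and issues are not part of a clone in any case.

```python
import zipfile
from pathlib import Path


def bundle_lessons(lesson_dirs, archive_path):
    """Zip several already-cloned lesson directories into one archive.

    Each lesson ends up under its own top-level folder in the archive.
    Anything inside a .git directory is skipped, so only the lesson
    content (not the repository history) is bundled.
    """
    archive_path = Path(archive_path)
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as bundle:
        for lesson_dir in map(Path, lesson_dirs):
            for path in sorted(lesson_dir.rglob("*")):
                relative = path.relative_to(lesson_dir)
                if ".git" in relative.parts or not path.is_file():
                    continue
                # Store the file as <lesson name>/<path inside the lesson>.
                arcname = (Path(lesson_dir.name) / relative).as_posix()
                bundle.write(path, arcname)
    return archive_path
```

Something like `bundle_lessons(["library-git", "library-shell"], "librarycarpentry-bundle.zip")` would then give a single file to upload to Zenodo by hand, with the metadata added there.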

@drjwbaker and @weaverbel: @gvwilson popped into our Zoom on the second day afternoon PDT and we had a short chat about how great the sprint was going. We also talked about the value of making releases and getting DOIs after a sprint to capture contributions and give credit. I noticed that there's an active repo in SWC for semi-automating releases to Zenodo https://github.com/swcarpentry/swc-releases. How about we fork that work over and see if we can get it working for us? I'd be happy to help get this set up.

I'm all for giving credit where credit is due, all for versioning with DOIs, and love the Github/Zenodo functionality. Let's do it!

@drjwbaker I'm working on this -- want to include Library Carpentry in the swc-releases workflow. I pinged the repo to see how we can incorporate LC into that workflow.

@jt14den Thanks Tim. Do report back as and when.

@drjwbaker and @weaverbel: I've altered swc-releases for library carpentry and tested the workflow in my own fork and Zenodo account. The scripts work and do things in two stages:

  1. Creates a deposit per chosen lesson set, with authors, metadata from the repo, acquires a DOI and uploads a zip in Zenodo (these are currently in a draft state in my account, but all elements are there -- I will delete these in favor of recreating these in a librarycarpentry account in Zenodo). Releases are dated: so 2017.06 will be our first release.
  2. Creates branches, builds lessons and makes submodules of each lesson based on the deposited version - You can see what they look like here: http://www.tim-dennis.com/swc-releases/2017.06/ (notice the version and DOIs top-right at the lesson level -- don't worry they aren't registered, just pre-reserved). This approach lets us reference past lessons like SWC: https://software-carpentry.org/lessons/previous/
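To make stage 1 a bit more concrete, here is a rough sketch of the kind of metadata a deposit might carry. It is not the swc-releases code. The field names (`upload_type`, `creators`, `prereserve_doi`, and so on) follow my understanding of Zenodo's deposit metadata and should be checked against the current Zenodo documentation; the function name and the example values are hypothetical. It only builds the payload and makes no network calls.

```python
def build_deposit_metadata(lesson_title, release, authors, description):
    """Build a Zenodo-style metadata payload for one lesson deposit.

    Assumption: the field names mirror Zenodo's deposit metadata as I
    understand it; verify them before relying on this.
    """
    # Names are passed through as given in AUTHORS ("firstname lastname").
    creators = [{"name": author} for author in authors]
    return {
        "metadata": {
            "title": f"{lesson_title} ({release})",
            "upload_type": "lesson",
            "description": description,
            "version": release,
            "creators": creators,
            # Ask for a DOI to be reserved before the deposit is published,
            # so it can be shown on the built lesson pages.
            "prereserve_doi": True,
        }
    }


payload = build_deposit_metadata(
    "Library Carpentry: Example Lesson",  # hypothetical title
    "2017.06",
    ["Ada Example", "Bo Sample"],         # hypothetical authors
    "Dated release of the lesson.",
)
```

Keeping the deposit in draft until someone has checked it matches the manual publishing step proposed in the next steps.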

Next Steps (following the pattern established by SWC)

  1. I suggest we fork the swc-releases into https://github.com/librarycarpentry -- I'll need to get added as a member on that organization to do that.
  2. We need to create an account in Zenodo with a username of: librarycarpentry. I will need an application API key created for the script to function. I'm happy to set this up and share around the logins. We also can link the account to a Library Carpentry community (I'm currently sitting on that namespace), but can recreate under librarycarpentry.
  3. I need to know what lessons are ready for publishing! I tested on library-data-intro, library-openrefine, library-git, and library-shell.
  4. We'll need to rework the AUTHORS file to reflect the contributors to the lesson and what the script expects (firstname lastname). For my testing, I've done this in my account https://github.com/jt14den?tab=repositories - You can see what it looks like here: https://github.com/jt14den/library-data-intro/blob/gh-pages/AUTHORS (got this information from git shortlog -ns -- notice it also includes authors that worked on the template -- this is the same way it works on SWC). I can submit PRs for this.
  5. Once I run the script, it is suggested we actually publish them in Zenodo manually (all lessons will be in draft mode). This will give us the opportunity to make sure they look right, etc. We can also work iteratively -- correcting elements sourced from the script.
  6. In some period (6 mos.) we repeat! One good aspect of using the script is that @twitwi has been working on it and we can benefit from his work (I wonder how Zenodo supporting DOI versioning will change any of the publish assumptions).

What do you think? The published version will look something like: https://zenodo.org/record/278222#.WUhlJBMrJTY
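On step 4, a minimal sketch of turning `git shortlog -ns` output into the "firstname lastname" lines an AUTHORS file would hold. It assumes the usual shortlog summary format of a commit count followed by the author name on each line. The function name and the sample names are hypothetical, and this is not the authors.sh script from swc-releases.

```python
def authors_from_shortlog(shortlog_text):
    """Extract author names from `git shortlog -ns` output.

    Each non-empty line is expected to look like "<count><whitespace><name>".
    Names are returned in the order they appear, without duplicates.
    """
    authors = []
    for line in shortlog_text.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # skip blank or unexpected lines
        name = parts[1].strip()
        if name not in authors:
            authors.append(name)
    return authors


sample = "    12\tAda Example\n     3\tBo Sample\n     1\tAda Example\n"
print("\n".join(authors_from_shortlog(sample)))
```

This only removes exact duplicates; the same person committing under two spellings of their name would still need reconciling by hand or with a mailmap.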

👍 - this is great.

Marvelous work @jt14den. Thank you so much. I have added you as an owner. A few things:

  1. I think you mean https://github.com/data-lessons as that is where the lessons are (though I've added you to https://github.com/librarycarpentry so you can do website work if you want to :) )
  2. Okay.
  3. Core and Beta? http://librarycarpentry.github.io/
  4. I will ask all the lesson maintainers to check the automated output (if people didn't make edits it won't get captured by this, right?). This could be a bottleneck.
  5. Okay.
  6. Okay.

About filling the authors, there is a bash script in the swc-releases repository (authors.sh) that tries to help manage AUTHORS across many repositories. It enriches/relies on an obfuscated global mail-map file https://github.com/swcarpentry/swc-releases/blob/gh-pages/all-mailmap

The script is less documented than the rest of the process but it can prove helpful.

Also, I recently made it remove people that contributed only to the style repository swcarpentry/swc-releases@6c7c1a7 (this new feature has not been used for a release yet, I used it to experiment with generating a shorter bibtex).

Thanks for your input @twitwi!

@drjwbaker Thanks! I'm cycling back to this now. A couple of comments related to your numbered responses above:

  1. If we are keeping the website in https://github.com/librarycarpentry, I suggest we publish the lessons there. The releases won't need to be in the same GH organization as the lessons. The submodules will be only references to the branches that are the published version of the lessons in the data-lessons repos.
  2. cool
  3. good by me
  4. I'll check out @twitwi's script. I missed it totally. I'll also check out the version that removes the authors who only contributed to the styles repo. I can make PRs back to the repos if all works as anticipated -- unless maintainers have already updated their AUTHORS file.

Thanks @twitwi for the scripts!

I should be working on this this weekend! Hope to have it done shortly.

On 1., to be totally clear: doing this won't require us to move the lessons from https://github.com/data-lessons, correct? That would be a BIG job..

@drjwbaker no need to move lessons. git submodules are a way to have external repositories show up as a sub-directory of another git repo. it's like a symbolic link in unix.
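To illustrate why nothing has to move: a submodule is recorded in the parent repo as little more than a path, a URL and (optionally) a branch, in a `.gitmodules` file. The sketch below just formats such an entry. The function name is hypothetical, and the path, URL and branch in the example are assumptions for illustration, not the values the release script uses.

```python
def gitmodules_entry(path, url, branch):
    """Format one .gitmodules entry pointing at an external lesson repo."""
    return (
        f'[submodule "{path}"]\n'
        f"\tpath = {path}\n"
        f"\turl = {url}\n"
        f"\tbranch = {branch}\n"
    )


# Hypothetical example: a release folder in the releases repo that
# points at a dated branch of a lesson living in data-lessons.
entry = gitmodules_entry(
    "2017.07/library-git",
    "https://github.com/data-lessons/library-git",
    "2017.07",
)
print(entry)
```

The lesson itself stays in its own repository; the releases repo only stores this pointer plus the commit it refers to.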

Almost there with making things happen. Reconciling authors was a bit of a bear, but should be easier next release. The next step is to run the script to make branches in lesson repos and submodules. Maintainers will then need to review authors on that branch and give corrections (I'll ping them on the issues you created). Since the releases are dated/named based on yyyy-mm (2017-06) and looking at the calendar, I'm wondering if we should change the release to July (2017-07) and not June?

Okay. Thanks for your hard work @jt14den. Yes, July release sounds better.

Cycling back to this. I know you've been busy @jt14den. Is this in your plan for the Autumn?

SWC and DC do releases, and @twitwi has a system that should work for LC lessons too. Blog post on the last SWC release https://software-carpentry.org/blog/2017/08/release-2017.08.html

Also, before the first release of new lessons Data Carpentry has been doing an Issue Bonanza and Bug BBQ http://www.datacarpentry.org/blog/lesson-release/ so that could be something to consider, although I know LC has already had some lesson hackathons.

A LC lesson release is something we could discuss on the October calls.

No worries, these things take time! Thanks for taking the lead @jt14den.

If at some point you need help with how the authors script works, let me know.