- Date and time: 3rd October 2018, 10:00 - 12:00
- Location: Drosophila Connectomics Group - Jefferis lab, Department of Zoology, University of Cambridge, UK
- Please bring your laptop or mobile device if you'd like to follow the practical part
Morning session (Sergio and Mark):
- Background and motivation
- What is version control? What is Git? What is GitHub?
- How can you use Git and GitHub? How can they be useful for you?
- Practical session: working with Git and GitHub
Afternoon session (Mark and Anne): markdown, pages and wikis; creating good README files; issue tracking; software licenses
- When repeating / reviewing previous work, researchers greatly benefit from having access to detailed documentation of the methods used.
- Keeping track of the different versions of your project is one more way of being more reproducible.
- Some of the manual approaches to version control have clear limitations:
(http://phdcomics.com/comics/archive_print.php?comicid=1531)
- The scientific community is beginning to consider the value of peer reviewing computer code - see Nature Methods August 2018 editorial Easing the burden of code review:
An increasing share of modern research relies on analytical code and software. In turn, a good deal of irreproducible research can be attributed to computational tools that are difficult to decipher, use or recreate. Through the concerted efforts of computational researchers and stricter guidelines from publishers, the culture of scientific software is now more open and geared toward dissemination than ever [...]
- GitHub is becoming the go-to site when it comes to releasing / sharing the code associated with a manuscript or the scripts developed within a project.
Version control is the management of changes (a.k.a. revisions) to any type of information
- Simple versioning: adding v1.0, v1.1, v1.2, v2.0 ... to file names
- Basic tools: Google Drive, Dropbox ...
- Advanced tools: Git
The first version control systems were created by groups writing software and code. Fortunately they can now be used not only for computer code but for any type of files 😄
(adapted from http://lhzuigao.com/309note.html)
Advantages of distributed (right) over centralised (left) version control systems include:
- If the central repository (server) crashes, it can be restored from any of the local repositories created e.g. by the researcher, collaborator or group leader.
- Each person can make changes to their local repository offline, then integrate those changes into the central repository (server) once they are back online.
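The recovery scenario described above can be tried on a single machine. This is only a minimal sketch, assuming Git is installed: a local bare repository stands in for the server, and the directory names (central.git, researcher, collaborator) and the identity are placeholders invented for the illustration.

```shell
# Work in a throwaway directory so nothing else on your machine is touched
workdir=$(mktemp -d)
cd "$workdir"

# A bare repository plays the role of the central server (hypothetical name)
git init --quiet --bare central.git

# The researcher clones it, commits offline, then pushes when "online"
git clone --quiet central.git researcher
cd researcher
git config user.name "Example Researcher"        # placeholder identity
git config user.email "researcher@example.org"   # placeholder identity
echo "analysis notes" > notes.txt
git add notes.txt
git commit --quiet -m "Add analysis notes"
git push --quiet origin HEAD
cd ..

# A collaborator clones the same repository and receives the full history
git clone --quiet central.git collaborator

# Simulate a server crash, then rebuild the server from a local clone
rm -rf central.git
git clone --quiet --bare collaborator central.git
git --git-dir=central.git log --oneline
```

The last command lists the commit again, this time from the rebuilt "server": every clone carries the complete history, not just the latest files.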
Git is a distributed version control system that keeps track of, and lets you compare, the history of changes made to your scripts and files. It allows groups of people to work on the same documents at the same time without stepping on each other's toes. It was created by Linus Torvalds in 2005 for the development of the Linux project. It is free and open source and helps you with:
- Creating repositories to host your projects using the command-line
- Tracking changes in the files and folders within your repositories
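As a rough illustration of these two points, the sketch below creates a repository from the command line and lets Git report what changed. It assumes Git is installed; the names my_project and results.txt and the identity are placeholders chosen for the example.

```shell
# Work in a throwaway directory
workdir=$(mktemp -d)
cd "$workdir"

# Create a repository to host a project (hypothetical name)
git init --quiet my_project
cd my_project
git config user.name "Example User"         # placeholder identity
git config user.email "user@example.org"    # placeholder identity

# A new file starts out untracked
echo "first draft" > results.txt
git status --short          # results.txt is listed with "??" (untracked)

# Record a first version
git add results.txt
git commit --quiet -m "Add first draft of results"

# Change the file and let Git report what is different
echo "second draft" >> results.txt
git status --short          # results.txt is now listed with "M" (modified)
git --no-pager diff         # shows the line that was added
```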
GitHub is a platform to share and showcase your work online with collaborators and a wider audience. It is a tool to help you build projects that are collaborative, well documented, and version-controlled. It provides you with:
- A place to host and backup your repositories online
- A nice web interface to your repositories
- A strategy to collaborate with colleagues
Versions in Git and GitHub are identified by a revision identifier (a hash), e.g. 60363b1, and each recorded version is known as a commit. Each revision is associated with a timestamp and the person making the change. Revisions can be compared, restored and, with some types of files, merged.
There are other version control tools similar to Git, e.g. svn. There are also other online platforms similar to GitHub for sharing and collaborating on code, e.g. GitLab.
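To make the idea of revisions concrete, here is a small sketch, assuming Git is installed, that creates two revisions of one file, lists them with their identifier, author and date, compares them, and restores the older one. The repository name demo, the file script.txt and the identity are placeholders.

```shell
# Work in a throwaway directory
workdir=$(mktemp -d)
cd "$workdir"
git init --quiet demo
cd demo
git config user.name "Example User"         # placeholder identity
git config user.email "user@example.org"    # placeholder identity

# First revision
echo "version 1" > script.txt
git add script.txt
git commit --quiet -m "First version"
first=$(git rev-parse --short HEAD)   # abbreviated identifier of the first commit

# Second revision
echo "version 2" > script.txt
git commit --quiet -a -m "Second version"

# Each revision has an identifier, an author, a date and a message
git --no-pager log --format='%h  %an  %ad  %s'

# Compare the two revisions of the file
git --no-pager diff "$first" HEAD -- script.txt

# Restore the file as it was in the first revision
git checkout --quiet "$first" -- script.txt
cat script.txt
```

The identifiers printed on your machine will differ from 60363b1, since they are derived from the content, author and time of each commit.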
The interfaces to Git and GitHub are:
- Via the command-line using git
- Directly online
- GitHub Desktop (available for Mac and Windows)
(https://programminghistorian.org/lessons/getting-started-with-github-desktop)
For this workshop, we will use Git commands and GitHub's online interface.
- To host and share your research outputs and software
  - Laboratories sharing their own research, e.g. the Jefferis or Balasubramanian groups
  - Software source code, e.g. BWA, an aligner of DNA sequences to reference genomes, and PIDGIN, an algorithm to predict protein targets for drug-like molecules
  - Or even to share research slides - see Bérénice Batut
- To create websites using GitHub Pages
  - Personal research websites, e.g. Mike Love site
  - Courses and activities, e.g.
- To share the contents of a book, e.g. Bioinformatics Data Skills or Happy Git and GitHub for the R user
- To write your PhD thesis, e.g. A reasoning framework for C4 photosynthesis research based on high-throughput analysis
Communication is key as most projects have both experimental and computational leaders
-
Building from the classical ways of sharing - conversations/meetings, email, Dropbox, shared folders ... we want to build an environment where:
- Computational colleagues can share code, figures and tables. Review others work and get credit from their collaborative work
- Experimental colleagues can follow computational developments, access results and learn methods of data analysis
-
And ideally avoiding situations like ...
(http://phdcomics.com/comics.php?f=1689)
- Perhaps a happier lifetime for a research project:
(https://github.com/semacu/20170703_GitHubintheLab_CRUK-CI)
If you want to start creating repositories in GitHub, you first need to open an account:
- Public repositories are free, and can be browsed and downloaded by anyone
- Private repositories have associated costs - see pricing of plans. The developer plan costs $7/month but it is free if you are a student or an academic
Alternatively, GitLab uses a different business strategy with free private repositories and cost plans for public ones. There are other alternatives e.g. Bitbucket.
GitHub uses Markdown, a plain-text formatting syntax (bold, italics, checkboxes, lists, etc.), to render pages online (like HTML but easier to write). You can use this syntax in text files (.md), commit messages, issues, and more. Some examples of Markdown syntax are available here.
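For a quick feel of the syntax, the sketch below writes a small Markdown file from the Terminal. The file content is invented for illustration; the checkbox lines use GitHub's task-list extension, so they may not render as checkboxes elsewhere.

```shell
# Work in a throwaway directory
workdir=$(mktemp -d)
cd "$workdir"

# Write a small Markdown file (content is a made-up example)
cat > README.md <<'EOF'
# My analysis project

Some **bold** text, some *italic* text and a [link](https://github.com).

- A bullet point
- Another bullet point

- [x] A completed task
- [ ] A pending task
EOF

# Plain text on disk; GitHub renders it as formatted text once committed
cat README.md
```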
We have four possible tutorials:
- Create a GitHub account (+)
- Create your first repository (+)
- Explore your first repository and GitHub account (++)
- Making changes using Git in the command-line (+++)
- Go to https://github.com
- Fill in your Username, Email and Password. Then click on the green button "Sign up for GitHub".
- Choose your personal plan page. Select "Free plan" and then click on "Continue".
- Tailor your experience page. Choose the boxes that apply to you and click on "Submit". Otherwise, just go to "skip this step".
- You have created a GitHub account! 😄
- If you are not already signed in, sign in to GitHub using the Username/Email and Password created before.
- Click on the top-right "avatar icon" and select "Your profile". Have a quick browse through your page.
- Click on the top-right "+" icon and select "New repository". If you are asked to verify your email address, look for the email that GitHub has just sent to the address you provided before and click on "Verify email address".
- Create a new repository page. Fill in a "Repository name", e.g. my_first_repository or my_analysis_script. For now choose "Public" and select the box to initialize this repository with a README. Finally, click on "Create repository".
- You created your first repository! 🚀
- Click on README.md and go to the right pencil "Edit this file". Type anything to change the file, e.g. "GitHub is fun!".
- Scroll down. Enter a commit message, e.g. "My first update", and select the radio button "Commit directly to the master branch". Then click on "Commit changes". Voilà!
- To view your history of commits for README.md, click on README.md and then on the "History" button on the right.
- Alternatively, to view your history of commits for your first repository, click on the name of your repository and select the tab depicting a small clock and the number of commits next to it.
Bonus points (5 min):
- Try to create a new file
- In your new repository, have a look at the "Settings" tab, explore "Collaborators" and try to add the person sitting next to you.
- Click on your top-right "avatar" icon and select "Settings".
- Explore the tabs "Profile", "Account" and "Emails".
Key glossary:
- Repository: it can be thought of as a project folder. A repository contains all of the project files, issues, wikis and more. It also stores the history and versions of each file.
- Commit: equivalent to saving your changes to a file. When you commit you usually include a brief description of the changes you made so you can identify versions later if you want to undo a change.
- Branch: an identical copy of a project at a particular point in time kept separate from the 'master' branch (primary copy). This keeps your code in the 'master' branch safe while you make changes and experiment with code on the new branch. You can merge your new branch back into the 'master' branch when you want to publish your changes.
- Master: the default branch in your repository.
- Collaborator: someone with read and write privileges to a repository as approved by the repository owner.
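The branch entry in this glossary can be tried locally. The following is a minimal sketch, assuming Git is installed; the branch name experiment, the file script.txt and the identity are placeholders. It uses `git checkout -` to return to the previous branch, so it works whatever the default branch is called on your installation.

```shell
# Work in a throwaway directory
workdir=$(mktemp -d)
cd "$workdir"
git init --quiet branch_demo
cd branch_demo
git config user.name "Example User"         # placeholder identity
git config user.email "user@example.org"    # placeholder identity

# First commit on the default branch
echo "stable code" > script.txt
git add script.txt
git commit --quiet -m "Initial commit"

# Create a separate branch and experiment there
git checkout --quiet -b experiment
echo "experimental change" >> script.txt
git commit --quiet -a -m "Try an experimental change"

# Switch back to the branch you were on before
git checkout --quiet -
cat script.txt                   # the default branch still has only the stable code

# Merge the experiment once you are happy with it
git merge --quiet experiment
git --no-pager log --oneline     # both commits are now on the default branch
```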
- (If on Mac), go to Finder -> Applications -> Utilities -> Terminal and type `git --version`.
  - If you get as output something like `git version 2.5.4 (Apple Git-61)`, then Git is already installed -> Jump to the next section.
  - If you get something like `git: command not found`, keep reading.
- To install Git on Mac, follow one of the next strategies:
  - When running one of the following commands `git --version`, `git config` or `xcode-select --install` you may be offered to install developer command line tools. Accept the offer and follow with "Install".
  - Go to https://git-scm.com/downloads and download git. Double click on the downloaded executable and follow instructions.
  - If you have `homebrew` installed, type the following in the Terminal: `brew install git`.
Example:
cd ~/Desktop
git config --global user.name "semacu"
git config --global user.email "sermarcue@gmail.com"
Remember to change "semacu" and "sermarcue@gmail.com" to the username and email you used when creating the GitHub account above.
Check:
git config --list
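# The commands above use --global, which sets your identity for every
# repository on this computer. As a sketch only (config_demo and the identity
# are placeholders), the same settings without --global apply to one repository:
workdir=$(mktemp -d)
cd "$workdir"
git init --quiet config_demo
cd config_demo
git config user.name "Example User"         # stored in this repository only
git config user.email "user@example.org"    # stored in this repository only

# Check a single value, then list everything set for this repository
git config user.name
git config --local --list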
git clone https://github.com/semacu/my_first_repository.git
cd my_first_repository
ls -lh
Your first repository created using GitHub (my_first_repository) is now a local repository located in your Desktop folder. Remember what we discussed earlier about Git being a distributed version control system.
cd ~/Desktop/my_first_repository
git remote set-url origin https://semacu@github.com/semacu/my_first_repository.git
Check:
git remote -v
- On your Desktop, use Finder to go to the cloned folder and open `README.md` with your favourite text editor, e.g. TextEdit.
- Change `README.md`, e.g. add a new line "This is my second line of script" and save changes.
- Now, go back to the Terminal and check how changes are tracked by Git:
cd ~/Desktop/my_first_repository
git status
The status of `README.md` is modified but the changes are not staged (red).
Staging:
git add README.md
git status
The status of `README.md` is modified and now the changes are staged (green) and ready to commit.
Committing:
git commit -a -m "My second update"
git status
git push origin master
Now check that your change to `README.md` made it to your online GitHub repository.
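If you would like to see the three states (modified, staged, committed) without touching your GitHub repository, the sketch below reproduces them in a throwaway local repository. It assumes Git is installed; the repository name staging_demo and the identity are placeholders, and no push is involved.

```shell
# Work in a throwaway directory
workdir=$(mktemp -d)
cd "$workdir"
git init --quiet staging_demo
cd staging_demo
git config user.name "Example User"         # placeholder identity
git config user.email "user@example.org"    # placeholder identity

# Starting point: one committed file
echo "first line" > README.md
git add README.md
git commit --quiet -m "My first update"

# Modified but not staged
echo "This is my second line of script" >> README.md
git status --short              # " M README.md"
git --no-pager diff             # changes not yet staged

# Staged and ready to commit
git add README.md
git status --short              # "M  README.md"
git --no-pager diff --staged    # what the next commit will contain

# Committed
git commit --quiet -m "My second update"
git status --short              # prints nothing: the working tree is clean
```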
Bonus points (5 min):
- Make another change to `README.md` using the online GitHub repository and pull the change to your local repository (Hint: use `git pull`).
Key glossary:
- Clone: a copy of an online repository on your local computer so you can make edits on your own personal copy without having to be online. You can sync changes between your clone and the remote copy (GitHub) when you are online.
- Remote: a version of your project repository that is hosted on the Internet or network somewhere (e.g. copy of your project on GitHub vs. on your local computer).
- Stage and commit:
(https://git-scm.com/book/en/v2/Getting-Started-Git-Basics)
- Push: sends the recent commit history from your local repository up to GitHub.
- Pull: grabs any changes from the remote GitHub repository and merges them into your local repository.
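Clone, remote, push and pull can be rehearsed offline. In the sketch below, which assumes Git is installed, a local bare repository named remote.git stands in for GitHub and two clones, laptop and desktop, stand in for two computers; all names and the identity are placeholders.

```shell
# Work in a throwaway directory
workdir=$(mktemp -d)
cd "$workdir"

# A bare repository stands in for the remote (GitHub) copy
git init --quiet --bare remote.git

# First clone: make a commit and push it to the remote
git clone --quiet remote.git laptop
cd laptop
git config user.name "Example User"         # placeholder identity
git config user.email "user@example.org"    # placeholder identity
echo "first line" > README.md
git add README.md
git commit --quiet -m "My first update"
git push --quiet -u origin HEAD             # -u remembers where to push and pull
cd ..

# Second clone: receives the history, adds a change and pushes it
git clone --quiet remote.git desktop
cd desktop
git config user.name "Example User"         # placeholder identity
git config user.email "user@example.org"    # placeholder identity
echo "edited elsewhere" >> README.md
git commit --quiet -a -m "Edit from another computer"
git push --quiet origin HEAD
cd ../laptop

# Back in the first clone: pull brings in the change made elsewhere
git pull --quiet
cat README.md
```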
- Next steps for computational reproducibility, going back to the Nature Methods August 2018 editorial Easing the burden of code review:
[...] Yet, even in the era of Git repositories, peer reviewing code can be frustrating and time consuming [...] Computational tools are complex objects that depend on many components to run. Dependencies include the operating system, programming language, external code libraries, configuration settings and run parameters. Reproducing these conditions is made even harder by the fact that components typically exist in multiple versions. Many come with their own prerequisites, creating a maddening rabbit hole of dependencies on dependencies [...]
In other words, a next step is being able to execute code directly online (in the cloud). Two new resources are beginning to make a difference in this area - check them out 😉
- Code Ocean: Nature Methods, Nature Biotechnology and Nature Machine Intelligence have launched a trial to facilitate the peer review of computational methods and to improve their reproducibility
- Binder
Many thanks for your attention! Enjoy Git and GitHub!
Feedback: please complete the following short survey
Any later questions about this workshop or the materials? Just email: sermarcue@gmail.com or mark.fernandes@cruk.cam.ac.uk
Blogs:
Books:
Courses:
- A Friendly Introduction to GitHub
- Software Carpentry: Version Control with Git
- Resources to learn Git
- GitHub On Demand Training
- A quick introduction to Git and GitHub
Help:
- GitHub Help
Papers:
- Nature Methods 2018 editorial, Easing the burden of code review
- Perkel 2018:
- Silver 2018 Microsoft’s purchase of GitHub leaves some scientists uneasy
- Russell et al. 2018 A large-scale analysis of bioinformatics code on GitHub
- Perez-Riverol et al. 2016 Ten Simple Rules for Taking Advantage of Git and GitHub
- Perkel 2016 Democratic databases: science on GitHub
- Markowetz 2015 Five selfish reasons to work reproducibly
Videos:
Websites: