Keith Hughitt 2019-05-15
- Overview
- Intro to version control systems (VCS)
- Why is VCS useful?
- Git Basics
- Installation
- Seven most useful git commands to know
- Creating a new repo (git init)
- Adding files to a repo (git add)
- Checking a repo's status (git status)
- Saving changes (git commit)
- Pushing your changes to a remote repo (git push)
- Pulling changes made to a remote repo (git pull)
- Downloading a copy of a remote repo (git clone)
- GitHub Basics
- Overview
- Why use GitHub?
- Single-user Workflow
- Multi-user Workflow
- Setting up the master and forked repos
- Making changes
- Accepting changes
- Other multi-user workflows
- Beyond just code
- Further reading
The goal of this tutorial is to familiarize the user with the basics of version control systems (VCS), Git and GitHub.
Of course, there are already numerous tutorials which do this and do a much better job than I could hope to do, e.g.:
- Git - gittutorial Documentation
- Try Git: Code School
- Set Up Git · GitHub Help
- GitHub For Beginners: Don't Get Scared, Get Started
I would encourage people to check these out as well.
Here I am just going to try and cover enough to get people started and hopefully interested enough to try it out and learn more.
Version control systems (VCS) are software tools used to track changes to a collection of files and directories and to aid in collaborative development. VCS is most widely used in the context of software development for tracking changes to code, but it can also be used to track changes to other types of work such as manuscripts, data, etc.
Some popular examples include:
- Concurrent Versions System (CVS)
- Subversion (SVN)
- Git
- Mercurial
Although the big picture is generally the same for each of these, and using any of them is going to be better than using none, there are some differences in the philosophy and function of each.
CVS and SVN were developed first, and are centralized version control systems. This means that there is a master codebase, and client hosts which "check out" pieces of this code to make changes.
Newer VCS, including the latter two listed above (Git and Mercurial), follow a different approach called distributed VCS (dVCS). In this model no central repository is required: every client has a complete copy of the repository.
Both approaches have their advantages and disadvantages. The focus in this tutorial, however, is on one of the dVCS: git.
Some of the main uses for VCS include:
- Tracking changes (imagine not having an undo button in Word...)
- Backing up code or other files (Mirroring on GitHub, etc.)
- Experimentation (branches)
- Collaboration
In order to make use of Git and GitHub, you must first download and install the Git client. Below, we focus on the command-line git command. Depending on your OS, there may also be a graphical interface (GUI) to Git that you can use. Many modern integrated development environments (IDEs), for example RStudio, also include functionality for interacting with VCS tools, including Git.
Download and install Git from git-scm.com.
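After installing, a quick way to confirm that the command-line client is available is to ask it for its version. The exact version string will differ from machine to machine.

```shell
# Print the installed Git version.
# A "command not found" error here usually means Git is not on your PATH.
git --version
```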
This is 99% of what you need to know to use Git:
- git init
- git add
- git status
- git commit
- git push
- git pull
- git clone
To create a new git repository, simply enter the root directory of the project you want to turn into a repo and run git init:
$ mkdir test
$ cd test
$ git init
Initialized empty Git repository in /home/username/test/.git/
$
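As a small sanity check, the sketch below repeats the steps above in a throwaway location and then looks at what git init created. The directory name test matches the example above; the temporary parent directory is only an assumption to keep the sketch self-contained.

```shell
# Work in a throwaway directory so nothing important is touched.
workdir=$(mktemp -d)
mkdir "$workdir/test"
cd "$workdir/test"

git init -q      # -q suppresses the "Initialized empty Git repository" message

# Git keeps all of its bookkeeping in a hidden .git directory.
ls -a

# Ask Git for the top-level directory of the repo we are in.
git rev-parse --show-toplevel
```

Deleting the .git directory turns the folder back into an ordinary directory, which is why it is worth knowing where it lives.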
To start tracking a new file, stage it with git add:
$ touch foo.txt
$ git add foo.txt
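To see the difference between adding one file and adding everything, here is a minimal sketch in a scratch repo. The second file name, bar.txt, is made up for illustration; git ls-files lists the files Git currently has staged or tracked.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q

touch foo.txt bar.txt

git add foo.txt                 # stage a single file
staged_one=$(git ls-files)      # files Git currently knows about
echo "$staged_one"              # prints: foo.txt

git add .                       # stage everything under the current directory
staged_all=$(git ls-files)
echo "$staged_all"              # prints bar.txt and foo.txt, one per line
```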
It's always a good idea to check the status of a repo with git status before making a commit:
$ touch newfile
$ echo 'Hello World' > newfile
$ git status
On branch master
Initial commit
Untracked files:
(use "git add <file>..." to include in what will be committed)
newfile
nothing added to commit but untracked files present (use "git add" to track)
$ git add .
$ git status
On branch master
Initial commit
Changes to be committed:
(use "git rm --cached <file>..." to unstage)
new file: newfile
$
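For a more compact view of the same information, git status accepts a --short flag. The sketch below reruns the example above in a scratch repo and captures the one-line summaries before and after staging.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q

echo 'Hello World' > newfile

before=$(git status --short)
echo "$before"      # "?? newfile": the file is untracked

git add newfile

after=$(git status --short)
echo "$after"       # "A  newfile": the file is staged as a new file
```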
Once you have done something interesting, commit it!
$ git commit -m 'Important change #1'
[master (root-commit) 9c5205a] Important change #1
1 file changed, 1 insertion(+)
create mode 100644 newfile
$
Here, the -m parameter is used to specify a commit "message" to associate with the changes you've made.
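The sketch below puts the pieces together in a scratch repo and then reads the commit back with git log. The name and email are placeholders: Git may refuse to commit if no identity is configured, and you would normally set yours once with git config --global user.name and git config --global user.email.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q

# Placeholder identity, stored only in this scratch repo.
git config user.name  "Example User"
git config user.email "user@example.com"

echo 'Hello World' > newfile
git add newfile
git commit -q -m 'Important change #1'

# One line per commit: abbreviated hash followed by the message.
git log --oneline

# Just the message of the most recent commit.
subject=$(git log -1 --format=%s)
echo "$subject"
```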
Note that when you use the command git commit -m, only the changes that you have staged (using git add) will be included in the commit. In order to include all changes made to files already tracked in the repo, you can use git commit -am. This stages every modified or deleted file that was previously added using git add and includes it in the commit; brand-new files still need an explicit git add.
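The behaviour of -am is easy to check in a scratch repo. In the sketch below the file names and identity are made up; one file is already tracked and modified, the other has never been added.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name  "Example User"       # placeholder identity
git config user.email "user@example.com"

echo 'v1' > tracked.txt
git add tracked.txt
git commit -q -m 'Add tracked.txt'

echo 'v2' > tracked.txt        # modify a file Git already tracks
echo 'new' > untracked.txt     # create a file Git has never seen

git commit -q -am 'Update tracked.txt'

# tracked.txt was committed; untracked.txt was left alone.
remaining=$(git status --short)
echo "$remaining"              # prints: ?? untracked.txt
```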
So, to recap:
- When you want to add a new file, use git add <filename> or git add .
- When you want to save changes made to one or more existing files in the repo, use git add <changed_file1> <changed_file2> ... followed by git commit -m "message", or git commit -am "message" to include all modified files.
Once you have committed some changes, you may want to sync them with a remote repository such as GitHub. This is done using the git push command.
$ git push -u origin master
Counting objects: 3, done.
Writing objects: 100% (3/3), 232 bytes | 0 bytes/s, done.
Total 3 (delta 0), reused 0 (delta 0)
To git@github.com:khughitt/test-repo.git
* [new branch] master -> master
Branch master set up to track remote branch master from origin.
$
Note that for this to work, you must first create a remote repo and add a reference to it. We will come back to this part later...
If your repo is hosted on GitHub and you connect to it over SSH, the first time you push changes from a given computer you will also need to add a public SSH key for that computer to your GitHub account.
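To make "create a remote repo and add a reference to it" concrete without needing a GitHub account, the sketch below uses a local bare repository as a stand-in for the remote. That substitution is an assumption for illustration; with GitHub, the argument to git remote add would be the SSH or HTTPS address of your repo.

```shell
base=$(mktemp -d)

# Stand-in for the repo you would create on GitHub: a local bare repo.
git init -q --bare "$base/remote.git"

git init -q "$base/test"
cd "$base/test"
git config user.name  "Example User"       # placeholder identity
git config user.email "user@example.com"

echo 'Hello World' > newfile
git add newfile
git commit -q -m 'Important change #1'

# Add a reference to the remote under the conventional name "origin".
git remote add origin "$base/remote.git"

# The default branch is "master" in this tutorial but may be "main" on newer setups.
branch=$(git symbolic-ref --short HEAD)

# -u records origin/<branch> as the upstream, so later pushes can be plain "git push".
git push -q -u origin "$branch"

git remote -v
```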
Once you start to collaborate with other people, you will need a way to sync your repo when other people have made changes to the shared repo.
This is done using the git pull command.
$ git pull
remote: Counting objects: 4, done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 3 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (3/3), done.
From github.com:khughitt/test-repo
9c5205a..1336440 master -> origin/master
Updating 9c5205a..1336440
Fast-forward
README.md | 3 +++
1 file changed, 3 insertions(+)
create mode 100644 README.md
$
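The same exchange can be rehearsed locally. In this sketch two working copies (the names alice and bob are made up) share a local bare repo that stands in for the GitHub repo: one pushes a new README.md, the other picks it up with git pull.

```shell
base=$(mktemp -d)
git init -q --bare "$base/shared.git"      # stand-in for the shared GitHub repo

# First collaborator creates the initial commit and pushes it.
git init -q "$base/alice"
cd "$base/alice"
git config user.name  "Alice Example"      # placeholder identity
git config user.email "alice@example.com"
echo 'Hello World' > newfile
git add newfile
git commit -q -m 'Important change #1'
git remote add origin "$base/shared.git"
branch=$(git symbolic-ref --short HEAD)    # "master" or "main", depending on setup
git push -q -u origin "$branch"

# Second collaborator clones the shared repo.
git clone -q "$base/shared.git" "$base/bob"

# Alice adds a README and pushes again.
echo '# test-repo' > README.md
git add README.md
git commit -q -m 'Add README'
git push -q

# Bob syncs: git pull fetches the new commit and fast-forwards his branch.
cd "$base/bob"
git pull -q
ls
```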
Finally, you may come across code or other files hosted in an online repo (usually on GitHub) that you wish to download and possibly make changes to. The command to do so is git clone:
$ git clone https://github.com/khughitt/labnote
Cloning into 'labnote'...
remote: Counting objects: 763, done.
remote: Total 763 (delta 0), reused 0 (delta 0), pack-reused 763
Receiving objects: 100% (763/763), 1.90 MiB | 3.95 MiB/s, done.
Resolving deltas: 100% (382/382), done.
The above command downloads the khughitt/labnote repository from GitHub over HTTPS, and stores it on your local machine. By default, it will be saved in your current working directory in a directory with the same name as the repo (here, labnote).
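git clone also accepts an optional second argument naming the directory to clone into. The sketch below uses a small local repository as the source so that it runs without network access; with GitHub you would pass the HTTPS URL instead. The directory name my-copy is hypothetical.

```shell
base=$(mktemp -d)

# Build a small source repo to clone from (stand-in for a GitHub URL).
git init -q "$base/source"
cd "$base/source"
git config user.name  "Example User"       # placeholder identity
git config user.email "user@example.com"
echo 'notes' > README.md
git add README.md
git commit -q -m 'Add README'

cd "$base"

# Second argument: the directory to clone into, instead of the default name.
git clone -q "$base/source" my-copy

cd my-copy
git remote -v        # "origin" points back at the place we cloned from
git log --oneline    # the commit history came along with the files
```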
GitHub is a free online hosting service for git repositories. It hosts mostly open source code, although private repositories are also available.
- Backup your code
- Share your code
- Collaboration
- Online code viewing/editing
- Browsable commit history
- Integrates with other services
- Renders Markdown
- Host websites (e.g. NLM 2018 REPRODUCIBILITY WORKSHOP)
- Host HTML5 presentations
- Host R packages (devtools::install_github)
- Host Python packages (pip)
- Repo statistics
For small projects or scripts that you would like to track and/or share on GitHub, the process is very simple:
1. Create a repo on GitHub
2. Follow steps to clone repo and add repo as an upstream remote
3. Hackity-hack (keep it atomic)
4. git commit
5. git push
6. Repeat steps 3-5.
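One reading of "keep it atomic" is that each commit should contain a single logical change. Under that assumption, the sketch below commits two unrelated edits separately rather than together. The file names and messages are invented for illustration, and the final git push is left out because no remote is set up here.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name  "Example User"       # placeholder identity
git config user.email "user@example.com"

# Two unrelated edits made in the same sitting (hypothetical files).
echo 'summary statistics' > analysis.R
echo 'plotting code' > figures.R

# Stage and commit them separately so each commit is one logical change.
git add analysis.R
git commit -q -m 'Add summary statistics'

git add figures.R
git commit -q -m 'Add plotting script'

git log --oneline    # two small commits instead of one mixed commit
```

Small commits like these are easier to describe in a message and easier to undo individually later.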
It is also not a bad idea to add a README.md to the repo with some notes to yourself or others (same as README.txt.)
The process for collaborating with other users on a project using Git and GitHub is similar to the single-user workflow described above, with a couple of additional steps along the way.
- Create a repo on GitHub (do this once)
- Fork the master repo (each user does this)
- Follow steps to clone the forked repo and add repo as an upstream remote (each user does this)
Next, once a repo has been created and each user has their own fork of that repo, the process each user follows to make changes is the same:
- If master repo has changed, use git pull to merge changes into fork.
- Make changes
- git commit
- git push
- Submit a pull request
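The command-line half of this cycle can be rehearsed locally. In the sketch below, local bare repos stand in for the master repo and the fork on GitHub (an assumption for illustration), and the remote name upstream is a common convention rather than something Git requires. Forking and opening the pull request happen in the GitHub web interface and are not simulated.

```shell
base=$(mktemp -d)

# Master repo, seeded with one commit by a maintainer.
git init -q "$base/seed"
cd "$base/seed"
git config user.name  "Maintainer"          # placeholder identity
git config user.email "maintainer@example.com"
echo 'v1' > README.md
git add README.md
git commit -q -m 'Initial commit'
branch=$(git symbolic-ref --short HEAD)     # "master" or "main", depending on setup
git clone -q --bare "$base/seed" "$base/upstream.git"

# Forking on GitHub makes a server-side copy; a bare clone plays that role here.
git clone -q --bare "$base/upstream.git" "$base/fork.git"

# The contributor clones the fork and registers the master repo as "upstream".
git clone -q "$base/fork.git" "$base/work"
cd "$base/work"
git config user.name  "Contributor"         # placeholder identity
git config user.email "contributor@example.com"
git remote add upstream "$base/upstream.git"

# Meanwhile, the master repo changes.
cd "$base/seed"
echo 'v2' > README.md
git commit -q -am 'Update README'
git push -q "$base/upstream.git" "$branch"

# Merge the master repo's changes into the local copy of the fork.
cd "$base/work"
git pull -q upstream "$branch"

# Make a change, commit it, and push it to the fork.
echo 'my change' > contribution.txt
git add contribution.txt
git commit -q -m 'Add contribution'
git push -q origin "$branch"

# From here, the pull request is opened from the fork on GitHub.
git log --oneline
```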
Once a pull request (PR) has been submitted, it will appear on the master repo. The PR will list all of the commits made, files changed, and any information the user submitting the PR provided about the PR.
If this all looks good, then any user who has privileges to the master repo can "accept" the PR, and the changes will (usually) be automatically merged into the master repo.
There are other workflows that can be used for collaboration on GitHub -- the above just illustrates one of these which I am partial to.
For larger efforts, you can also create teams on GitHub so that an entire team owns or manages a repo instead of a single user.
One of the nice things about Git and GitHub is that you are not limited to using it for just code. Some other useful things it can be used for include:
- Documents
- Websites
- Images
- Etc.
If you want to learn more, there are a lot of other great tutorials on Git and Github as they pertain to science. Here are just a few examples to help get you started:
- A Quick Introduction to Version Control with Git and Github (PLOS Bio)
- Ten Simple Rules for Taking Advantage of Git and Github (PLOS Comp Bio)
- Making Reproducible Research Enjoyable
- Electronic lab notebook - The stupidest thing...
- Git can facilitate greater reproducibility and increased transparency in science
- GitHub for Academics: the open-source way to host, create and curate knowledge
- Version control for scientific research
- A quick introduction to Git and GitHub (Data Carpentry for Biologists)
Note: This tutorial was adapted from an earlier version originally presented at a UMD bioinformatics club meeting in January, 2014.