Primary language: Shell. License: MIT.

Getting Started with git and GitHub

Please use the issues to post requests for more FAQ!

For a video tutorial that should (hopefully) get you from git newbie to being able to submit a pull request, please follow this YouTube link. The GitHub help pages are also very good.

FAQ


What is Git? And GitHub?

git is a versioning system, like svn but better. It allows you to work offline, committing changes to a local "clone" of the repository, and then pushing them to the remote repository when you get back to wifi. 

GitHub is a web service that hosts remote git repositories and enables collaboration via some nice tools. Repositories (or "repos" as they are known on GitHub) can be either public, enabling any of your colleagues to provide feedback or contribute to your project, or private, in case you need to make blind datasets or something. The LSST DESC has an "organization" on GitHub to keep its repos together in one place. It's nice. Here's the LSST DESC Organization homepage and here's an example of a repository that you can browse around in.

You will need an account on GitHub: follow this link and fill in the form, including your full name so that your collaborators can find you easily.

You will also need the unix command git to work on your local machine. 


Slow down. What is a "versioning system"?

Ah, sorry. Imagine you are working on a document, and you want to save your old versions in case you want to go back to one of them if your plans change, or if your computer breaks down. You'd end up with a series of files called, for example, ms.v1.tex, ms.v2.tex, ms.v3.tex, ms.final.tex, ms.final2.tex, ms.submitted.tex and so on. A versioning system is a computer program that does this for you. It allows you to work on one file, ms.tex, while keeping track of all your old versions. It allows you to go back to them if you want. It also handles that situation where your collaborator makes some changes and sends you ms.v1.pjm.tex, after you have moved on to ms.v2.tex: it merges the two files together for you. Let's compare some basic usage of the git versioning system with your old way of doing things.

| Manual versioning | Git |
| --- | --- |
| mkdir old | git init |
| cp ms.tex ms.v1.tex; mv ms.v1.tex old/ | git add ms.tex; git commit -m "Initial version" ms.tex |
| edit: ms.v1.tex; save as: ms.v2.tex | edit: ms.tex; save |
| cp ms.v1.tex old/ | git commit -m "Finished introduction" ms.tex |
| edit: ms.v2.tex; save as: ms.v3.tex | edit: ms.tex; save |
| cp ms.v1.tex old/ | git commit -m "Added references" ms.tex |
| save ms.v1.pjm.tex from email; edit: ms.v2.tex and ms.v1.pjm.tex; save as: ms.v3.tex | git remote add phil http://phil.com/paper.git; git pull phil master |
| cp ms.v2.tex ms.v1.pjm.tex old/ | |
| edit: ms.v3.tex; save as: ms.final.tex | edit: ms.tex; save |

and so on. The magic command that saves you manually combining two files is "git pull phil master". What must have happened here is that your colleague Phil has obtained a copy (a "clone") of the "repository" that you made with "git init". (Before it was just a folder: git init turns it into a repository, with a hidden directory called .git that contains all your old versions - or rather, the differences between old versions - that git can use to reconstruct your past work). Then, Phil has put it on his webserver, so that you can access it remotely. The "git remote add" command links the two repositories (yours and Phil's) together, so that you can each pull in the edits that the other makes. ("master" is the name of the "branch" of the repository that Phil was working in - we'll come back to branches in a second.)
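
As a rough sketch of the git column above, the following script plays both parts in a throwaway directory. The local folder phil-paper stands in for Phil's webserver (an assumption, so that the example runs offline), and the file contents are made up.

```shell
set -e
work=$(mktemp -d)                      # scratch area; nothing real is touched
cd "$work"

# Your side: turn a folder into a repository and archive the first version.
git init -q paper
cd paper
git config user.name "Your Name"
git config user.email "your.name@emailprovider.com"
printf 'Introduction\n' > ms.tex
git add ms.tex
git commit -q -m "Initial version" ms.tex
git branch -M master                   # this guide assumes the branch is called master

# Phil's side: he clones your repository and commits his own edits.
cd "$work"
git clone -q paper phil-paper
cd phil-paper
git config user.name "Phil"
git config user.email "phil@phil.com"
printf 'Method\n' >> ms.tex
git commit -q -m "Wrote method" ms.tex

# Back on your side: link the two repositories and pull in Phil's work.
cd "$work/paper"
git remote add phil "$work/phil-paper"
git pull -q --no-rebase phil master    # --no-rebase: combine the work by merging
git log --oneline                      # one line per commit, newest first
```

The --no-rebase flag only makes explicit that the pull should merge; some newer versions of git ask you to state this when the two histories have diverged.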

With git (and other versioning systems), the act of archiving your old version is called "committing your changes." It's good to do this often, so that you have more options as to which version to go back to if you need to (because you don't have to worry about out of control file proliferation any more, right?). When you do a git commit you get to make a comment at the same time, to summarize in a few words what you did in this editing round. These comments are summarized for you when you do a "git log". The output of this command looks something like this:

commit 95d6aad841215ce21472f68ef766ead9eabec1e7
Merge: 06abaac 4853e0a
Author: Your Name <your.name@emailprovider.com>
Date:   Thu Jul 2 09:24:28 2015 -0700

    Merge branch 'master' of phil.com:paper

commit 6c371b736abfb6fead8e15b378ead66675a313f0
Author: Your Name <your.name@emailprovider.com>
Date:   Thu Jul 2 10:45:05 2015 -0700

    Added references

commit 3c431b7236cdfb612ad8e15b378ead66675a32245
Author: Phil <phil@phil.com>
Date:   Thu Jul 2 10:17:32 2015 -0700

    Wrote method, results, discussion and conclusions

commit 6f43fe926fbb23d5c7bfc94ed0f7204387aef918
Author: Your Name <your.name@emailprovider.com>
Date:   Thu Jul 2 10:00:05 2015 -0700

    Finished introduction

commit 3264125999c663ac696f7338fc1252be5551a018
Author: Your Name <your.name@emailprovider.com>
Date:   Thu Jul 2 09:59:47 2015 -0700

    Initial version

Those horrendous hexadecimal strings are "commit IDs" - they are what you need to revert to an old version of your document. Actually, you don't need the whole string: any prefix long enough to be unique in the repository will do, and the first 7 characters are usually plenty. Suppose you want to go back and work on your old version (the one where you added the references but before you merged in the rubbish that Phil wrote). Here's what you would do:

| Manual versioning | Git |
| --- | --- |
| history | git log |
| mkdir rewording; cd rewording; cp old/ms.v2.reworded.tex . | git checkout 6c371b7 -b rewording |
| edit: ms.v2.reworded.tex; save as: ms.v2.reworded.v2.tex | edit: ms.tex; save |
| mkdir old; cp ms.v2.reworded.tex old/ | git commit -m "Better text than Phils" ms.tex |

Instead of making a new folder (called, eg, "rewording") and working on a reworded version in it, with git you would make a new branch of the repository (called "rewording") and work on ms.tex there. The command for moving between branches (like changing directories) is "git checkout". The initial branch is called "master" - good practice is to use master for the current, best, working version, and all other branches for experimenting. 
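
Here is a small sketch of that branching step, run in a throwaway directory. The file contents and commit messages are invented for illustration, and the commit ID is looked up with git rev-parse rather than typed by hand. It uses the argument order from the git documentation, git checkout -b <branch> <commit>.

```shell
set -e
work=$(mktemp -d)                      # scratch area; nothing real is touched
cd "$work"
git init -q paper
cd paper
git config user.name "Your Name"
git config user.email "your.name@emailprovider.com"

printf 'Introduction\n' > ms.tex
git add ms.tex
git commit -q -m "Finished introduction"
git branch -M master                   # this guide assumes the branch is called master
printf 'References\n' >> ms.tex
git commit -q -am "Added references"
old=$(git rev-parse --short HEAD)      # abbreviated ID of the version to go back to
printf 'Rubbish\n' >> ms.tex
git commit -q -am "Merged in Phil's text"

# Start a branch called "rewording" at the old commit and work there.
git checkout -q -b rewording "$old"
printf 'Better text\n' >> ms.tex
git commit -q -am "Better text than Phil's"

git status --short --branch            # reports which branch you are on
git checkout -q master                 # like cd-ing back to the main directory
```

After the last command ms.tex looks like the master version again; the reworded text is still safe on the rewording branch.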

Now, suppose you want to submit your version of the document to a journal. You talk to Phil, and persuade him that your text is better - not by emailing him your version, but by pushing your new branch to his repository (assuming he gave you permission). Then you can carry on editing in the master branch - which is like going back to your main directory:

| Manual versioning | Git |
| --- | --- |
| cd ../ | git checkout master |
| pwd | git status |
| cd rewording | git checkout rewording |
| pwd | git status |
| email ms.v2.reworded.v2.tex to Phil | git push phil rewording |
| discuss, agree | discuss, agree |
| cd ../; cp rewording/ms.v2.reworded.v2.tex ms.final2.tex | git checkout master; git merge rewording |
| edit: ms.final2.tex; save as: ms.submitted.tex | edit: ms.tex; save |
| cp ms.final2.tex old/ | git commit -m "Formatted for journal" ms.tex |

Hopefully this shows something of how git makes keeping track of your changes much simpler. You only ever edit one file, and you only have to do minimal manual editing to merge changes from multiple collaborators ("conflicts" between different versions of the same files do arise, but only when the same lines of the file have been edited, and so they are usually easy to fix - certainly much easier than merging two versions by hand in an editor). Branches take a bit of getting used to: a git checkout can make your current working directory look very different, unlike any other unix command you use! But thinking of it as being like "cd" is helpful. The "git status" command is incredibly useful: it tells you which files have been modified since the last commit, whether there are any files that have not yet been added to the repository, and whether any files have been deleted since the last commit, as well as which branch you are on.

As you might have guessed, git pull is actually a shortcut for two commands run one after the other: git fetch (to get any new commits from the remote repository) and git merge (to merge the remote branch into the current local one). Unlike when doing things by hand, it's actually quite hard to overwrite files and lose work. Git will refuse to pull in other people's changes if they would overwrite edits that you have not yet committed, and it will not let you push your changes to a remote repository until you have first pulled its changes in and merged them. And finding old versions by your commented history is much easier than trying to remember the meaning of your own filenames!
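
To see the two halves of a pull separately, here is a sketch that runs git fetch and git merge by hand. As before, a local folder stands in for Phil's remote repository (an assumption for the sake of a runnable example).

```shell
set -e
work=$(mktemp -d)                      # scratch area; nothing real is touched
cd "$work"
git init -q paper
cd paper
git config user.name "Your Name"
git config user.email "your.name@emailprovider.com"
printf 'Introduction\n' > ms.tex
git add ms.tex
git commit -q -m "Initial version"
git branch -M master                   # this guide assumes the branch is called master

# Phil's clone, with one extra commit.
git clone -q "$work/paper" "$work/phil-paper"
cd "$work/phil-paper"
git config user.name "Phil"
git config user.email "phil@phil.com"
printf 'Method\n' >> ms.tex
git commit -q -am "Wrote method"

cd "$work/paper"
git remote add phil "$work/phil-paper"
git fetch -q phil                      # step 1: download Phil's new commits
before=$(cat ms.tex)                   # your working file is still untouched here
git merge -q phil/master               # step 2: merge them into the current branch
cat ms.tex
```

Between the two steps you can inspect what arrived (for example with git log phil/master) before deciding to merge it.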


Who am I? And how did I get here?

Your name should be written on your "profile" page, which you can reach by going to the GitHub home page and clicking on the little icon in the very top right hand corner of the page. It's a good idea to enter your full name (and preferably some other public details about yourself) so that people can find you and communicate with you on GitHub.

You are here because git and GitHub are incredibly useful research tools, that are well worth your time learning.


How do I contribute to a project on GitHub?

If you have been given write access to a GitHub repository, you can "clone" it to your local machine and start work. If you have not, you can still contribute by making a "fork" (there's a button for this in the top righthand corner of the GitHub page for each repository). This will make a copy of the repository in your GitHub account, that is linked to the "base repo" - you can then clone from your fork to get the project onto your local machine.

To clone a repo, look down the right hand sidebar of its GitHub page. You should see "http clone URL" and a clipboard icon next to it. Under this there is the "SSH" option - select this, and then click on the clipboard. You now have the address of the remote repo in your clipboard. Go to your terminal, and cd to the place where you want your copy of the repo to live (it has its own folder). Then do "git clone <paste>" and hit return.

When you first do this, it will fail. Read the message! Git error messages are almost always very helpful. This one says that your ssh keys need to be set, so let's do that. Go to your profile (the very top right hand corner of the GitHub window, there should be a picture of you) and choose "settings". In the resulting list is an entry called "SSH Keys" in the left hand side bar. Go here and paste in your public SSH key. This enables GitHub to let you upload files to its server over SSH without typing your GitHub password all the time. If you don't know what an SSH key is, the help links on the SSH keys page you are on are pretty helpful.

Now repeat the git clone command and you should see a local copy of the repo appear.
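
If you want to practice cloning without touching GitHub at all, the sketch below clones from a local stand-in. Everything here is hypothetical: reponame.git plays the part of the address on your clipboard, which for a real GitHub repo would be an SSH address rather than a local path.

```shell
set -e
work=$(mktemp -d)                      # scratch area; nothing real is touched
cd "$work"

# Build a stand-in for a repository that already exists on GitHub.
git init -q seed
cd seed
git config user.name "Repo Owner"
git config user.email "owner@example.com"
printf 'Project notes\n' > README.md
git add README.md
git commit -q -m "Initial version"
git branch -M master
cd "$work"
git clone -q --bare seed reponame.git

# What you would actually type: git clone <paste>
git clone -q "$work/reponame.git"
cd reponame                            # the clone lives in its own folder
git remote -v                          # the address you cloned from is named "origin"
```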


How do I get the latest version of the repository?

This is typically in the master branch of the base (original) repository, so, after doing a "git status" to make sure you are in the right branch, do "git pull origin master".

If your local repo is a clone of a fork, you'll want to connect it to the base repo with "git remote add upstream ownersname:reponame.git", and then you can pull in changes from the base repo with "git pull upstream master". Don't forget to do "git status" before you pull.
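
A minimal sketch of the "status, then pull" habit, using a local bare repository called hub.git as a stand-in for the repo on GitHub (an assumption so the example runs offline). The upstream case works the same way with a different remote name.

```shell
set -e
work=$(mktemp -d)                      # scratch area; nothing real is touched
cd "$work"
git init -q seed
cd seed
git config user.name "A Colleague"
git config user.email "colleague@example.com"
printf 'v1\n' > README.md
git add README.md
git commit -q -m "Initial version"
git branch -M master
cd "$work"
git clone -q --bare seed hub.git       # stand-in for the repository on GitHub
git clone -q hub.git mine              # your clone; the remote is named "origin"

# A colleague pushes a new commit to the shared repository.
cd "$work/seed"
printf 'v2\n' >> README.md
git commit -q -am "Update README"
git push -q "$work/hub.git" master

# You: check where you are, then pull the latest version.
cd "$work/mine"
git status --short --branch
git pull -q --no-rebase origin master
cat README.md
```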


How do I commit my edits?

Git has a commit command, just like svn. Mostly you will use it as follows: git commit -am "comment"

The '-a' commits all changes to files that git is already tracking (brand new files need a 'git add' first). You can see what you are about to commit by doing 'git status'. In fact, you should do a 'git status' before doing anything - it shows you which branch you are on, which files have been added, deleted, modified and so on.

After committing, your edits still only exist in your clone of the repository. To share them with other people you can push them to any other remote repository you have push access to - most commonly, the remote repository at GitHub. When you cloned the repo to your machine, git set up the GitHub repo as your default remote, with the name "origin". After you have committed your changes, you should then do 'git push origin master' - which means "push my work to the master branch of the remote repository origin".

Git will not let you push to a remote repo until you have first updated your local clone with any changes that have been made in the meantime at the remote repo. If you get an error that says as much, do a 'git pull origin master' to pull down the changes from the master branch of the remote repo (named "origin"). 

To see all the remotes that you have access to, type 'git remote -v'.
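
Put together, a typical edit-commit-push round might look like the sketch below. The bare repository hub.git is a local stand-in for GitHub, and the file and commit message are made up.

```shell
set -e
work=$(mktemp -d)                      # scratch area; nothing real is touched
cd "$work"
git init -q seed
cd seed
git config user.name "Repo Owner"
git config user.email "owner@example.com"
printf 'First paragraph\n' > README.md
git add README.md
git commit -q -m "Initial version"
git branch -M master
cd "$work"
git clone -q --bare seed hub.git       # stand-in for the repository on GitHub
git clone -q hub.git mine              # cloning sets up the remote called "origin"

cd mine
git config user.name "Your Name"
git config user.email "your.name@emailprovider.com"
printf 'New paragraph\n' >> README.md
git status --short                     # see what is about to be committed
git commit -q -am "Add a paragraph"    # -a: include every modified tracked file
git push -q origin master              # share the commit with the remote "origin"
git remote -v                          # list the remotes this clone knows about
```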


I git pulled and now I have a conflict. What do I do?

Fix it. The error message tells you which files contain the conflict. Open them in an editor and search for the string '<<<<<<<'. Just like in svn, the portion of code between this string and the '=======' mark is your local version, while the portion below it and above the '>>>>>>>' string is the remote version. Edit the file so it is correct, deleting the three marker lines as you go. Then, to resolve the conflict in <filename>, you 'git add <filename>' before you then git commit to save your changes. You will also want to push your change to the remote branch on, for example, a hosting service like GitHub.

If you find yourself fixing complicated conflicts often, you may want to learn how to use a mergetool to compare the differences. A more involved tutorial can be found here.
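
The sketch below manufactures a conflict on purpose so you can see the markers and the fix. To keep it short it merges a local branch called phil instead of pulling from a remote (an assumption; the markers and the resolution steps are the same either way). Your local version appears between '<<<<<<<' and '=======', and the other version between '=======' and '>>>>>>>'.

```shell
set -e
work=$(mktemp -d)                      # scratch area; nothing real is touched
cd "$work"
git init -q paper
cd paper
git config user.name "Your Name"
git config user.email "your.name@emailprovider.com"
printf 'Title: Draft\n' > ms.tex
git add ms.tex
git commit -q -m "Initial version"
git branch -M master

# Two branches edit the same line, which is what produces a conflict.
git checkout -q -b phil
printf 'Title: Phils title\n' > ms.tex
git commit -q -am "Phil's title"
git checkout -q master
printf 'Title: My title\n' > ms.tex
git commit -q -am "My title"

if ! git merge phil; then              # the merge stops and reports the conflict
    cat ms.tex                         # both versions, separated by the markers
fi

# Fix it: edit the file until it is correct (here we simply write the final text).
printf 'Title: Our title\n' > ms.tex
git add ms.tex                         # tell git the conflict is resolved
git commit -q -m "Merge Phil's title"  # finish off the merge
```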


I want to delete a file. How do I do that?

Just rm it as usual, and then do 'git status'. You'll see that git understands file deletion: when you commit all your changes, git will stop tracking that file. You'll still be able to access old versions of that file in the repository, though.
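
A quick sketch of the whole round trip, with made-up file names, in a throwaway directory:

```shell
set -e
work=$(mktemp -d)                      # scratch area; nothing real is touched
cd "$work"
git init -q paper
cd paper
git config user.name "Your Name"
git config user.email "your.name@emailprovider.com"
printf 'notes\n' > notes.txt
printf 'text\n' > ms.tex
git add notes.txt ms.tex
git commit -q -m "Initial version"

rm notes.txt                           # delete the file as usual
git status --short                     # git lists notes.txt as deleted
git commit -q -am "Remove notes"       # -a records the deletion too
git show HEAD~1:notes.txt              # the old version is still in the history
```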


I made some edits that I don't like and want to go back to the original file. What do I do?

If you haven't committed your edits you can just git checkout -- <file> and you will get back the original file. Be warned that your edits to this file will be lost (the file will be overwritten).
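
For example, in a throwaway repository with an invented ms.tex (note the two plain hyphens before the file name):

```shell
set -e
work=$(mktemp -d)                      # scratch area; nothing real is touched
cd "$work"
git init -q paper
cd paper
git config user.name "Your Name"
git config user.email "your.name@emailprovider.com"
printf 'Good text\n' > ms.tex
git add ms.tex
git commit -q -m "Initial version"

printf 'Edits I regret\n' >> ms.tex
git status --short                     # ms.tex shows up as modified
git checkout -- ms.tex                 # throw away the uncommitted edits for good
cat ms.tex
```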


What's the best way to make a new repository?

You can make repos on your own GitHub home page, with the big green "New repository" button. If you are in a GitHub organization, you need to be given admin access to be able to create repos there. Here's the LSST DESC GitHub organization if you want to see what an organization looks like.

To turn one of your existing folders into a git repository, just do "git init" and then start git add'ing files. If you later want to push this to GitHub, you'll still need to start a repo on the GitHub site - just don't initialize it with a README or anything, just start it and then pick up its address (the thing that ends with ".git"). Then, on the command line, add a link to this new remote repository with "git remote add origin <address>". Then you can push to it as normal. More instructions here.

When you start a brand new repo on GitHub (rather than pushing an existing folder), it's best to initialize it with a README (so you can tell people what the project is about) and a license file (so everyone is clear about what you are happy for people to copy and re-use) but you don't have to. A .gitignore is useful though - it tells git to ignore certain files and filetypes, so that they don't clutter up your git status messages. Once the repo has been started, you can then clone it to your local machine.
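
Here is a sketch of turning an existing folder into a repository and pushing it to a new, empty remote. The bare repository newrepo.git is a local stand-in for the empty repo you would create on the GitHub site, and the folder, file names and ignore pattern are all hypothetical.

```shell
set -e
work=$(mktemp -d)                      # scratch area; nothing real is touched
cd "$work"
git init -q --bare newrepo.git         # stand-in for the empty repo made on GitHub

# An existing folder with some files in it.
mkdir myproject
cd myproject
printf 'What this project is about\n' > README.md
printf 'scratch output\n' > run.log

git init -q                            # turn the folder into a repository
git config user.name "Your Name"
git config user.email "your.name@emailprovider.com"
printf '*.log\n' > .gitignore          # keep log files out of git status
git add README.md .gitignore
git commit -q -m "Initial version"
git branch -M master                   # this guide assumes the branch is called master
git remote add origin "$work/newrepo.git"
git push -q origin master
git status --short                     # prints nothing: run.log is ignored
```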

In the repo's settings, at the bottom of the righthand sidebar, you can add collaborators (giving them read, write or admin access), and turn on the wiki associated with the repo, if you want.


How do I push and pull without having to type my password all the time?

You can give GitHub your public SSH key instead. See the instructions above.


What is a GitHub "issue"?

To watch the video, click here.

When coding, many issues arise that need to be addressed: bugs, new features that you want, questions you have about the documentation and so on. When you have identified an issue, you usually want to do two things: 1) make a note of it so you can deal with it later and 2) tell your collaborators about it. GitHub issues do both.

To start a new issue, go to the circle with an exclamation point inside it in the repo's right hand sidebar (right under "code" and above "Pull requests"). Then, hit the big green "New issue" button, give it a title (like the subject line of an email, summarizing the issue) and if necessary, a short description of what needs to be done - and when you hit submit, the issue is added to the repo's list, and a notification email is sent to everyone who is "watching" the repo. This is a Good Thing: you want to be able to keep up with your projects!

You can try making issues on this very repo. To "watch" a repository, and hence follow (all) its issues, click on the "Watch" button in the top right hand corner of the repo's page.

Any other GitHub user can watch your repo (and hence follow its issues), as long as it is public rather than private. They can also submit issues. This is a Good Thing: it provides a means for anyone to give you feedback about your project, and lets everyone know what you are working on so they can avoid wasting their time duplicating effort.

Private repos also have issue lists attached to them, but only the people in that repo's collaborator list can see them. To adjust the private/public nature of a repo, and adjust its collaborator list, go to the repo's "settings" via the spanner/screwdriver icon in the right hand sidebar.


Argh! How do I stop getting all these GitHub notification emails?!

Issues are a great way to communicate: they keep topics well separated, and allow the repo's project to be tracked well. However, the flood of notification emails that using GitHub produces (one for every comment on every issue thread) can seem overwhelming. Below are some tips for how to follow repos effectively.

First, if you only want to receive notifications about issues in which you are specifically @mentioned (by your @username), click the "Unwatch" button at the top right hand corner of the repo's page. "Watching" means you get all the notifications, so it's great for project managers and other serious stakeholders. "Unwatching" is often a good choice for developers.

When watching a repo, you can still manage the notifications you see in your Settings. Filtering your email is an effective strategy: you can label/redirect GitHub messages not only by sender or repo name, but also by whether you are @mentioned (by your @username) in the message.

All of the above works best if your team uses the @mention feature well. A good rule of thumb is that you should assume that only the people who are @mention-ed in an issue will get an email notification. Following this rule will enable everyone to filter GitHub's emails with less concern about missing something. Note that in an organization, you can @mention teams as well as people - and that the auto-complete is pretty intelligent (just start typing the team name after the '@' sign).

One last thing: because GitHub issues are usually well-separated by topic, you can very often skim and archive their notification emails quickly. This can be very satisfying if you love rapidly clearing away emails so you don't have to look at them any more.


What is a "Pull Request"?

Suppose you see something that needs fixing in a repo's code. Here's a good way to go about fixing it:

1) Make a branch to contain the fixed code, with something like "git checkout -b betterlayout".
2) Edit the code, commit your changes, and push them with "git push origin betterlayout". This makes a corresponding branch, called "betterlayout", on the remote repo "origin".
3) Go to the repo's page on GitHub. It will probably prompt you to "submit a pull request" - if it doesn't, select the "betterlayout" branch from the "branch:" menu next to the repo name.
4) Click on the button to start a pull request. An issue-like form will appear, where you can edit the title of the pull request (eg "Better LaTeX Layout?") and provide some comment on what you have done and why.
5) Submit the pull request with the button at the bottom of the form.

This will notify the repo's owner, and everyone else who is watching the repo, that you have made some changes and would like them to be merged into the code. The owner will then review your changes - notice how all the commits that have been made in the "betterlayout" branch are tracked automatically in the pull request thread.

As you can see, a pull request is a request for your changes to be pulled into another branch of the repository, typically the master branch. You often see repos with READMEs that say "pull requests welcome!" This is because they provide a mechanism for anyone to add value to your project by making improvements and then asking you to accept them! As owner, you don't have to accept any pull request, but usually they are a Good Thing. And you always get to review them first anyway.

Notice that you can submit a pull request from any branch, including a branch of a "fork" of the repository - if you don't have push access to the base repository, just fork it, edit it, and submit a pull request from there. Just keep reading the messages closely to see what is going on.
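
The command-line half of a pull request (making the branch, committing and pushing it) can be sketched as follows. The bare repository hub.git is a local stand-in for the repo on GitHub, and the edit itself is invented; the pull request form is then filled in on the GitHub website.

```shell
set -e
work=$(mktemp -d)                      # scratch area; nothing real is touched
cd "$work"
git init -q seed
cd seed
git config user.name "Repo Owner"
git config user.email "owner@example.com"
printf 'Old layout\n' > README.md
git add README.md
git commit -q -m "Initial version"
git branch -M master
cd "$work"
git clone -q --bare seed hub.git       # stand-in for the repository on GitHub
git clone -q hub.git mine

cd mine
git config user.name "Your Name"
git config user.email "your.name@emailprovider.com"
git checkout -q -b betterlayout        # a branch to hold the fix
printf 'Better layout\n' >> README.md
git commit -q -am "Improve layout"     # commit the fix on that branch
git push -q origin betterlayout        # create the same branch on "origin"
git branch -r                          # the remote branches git knows about
```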


What's the difference between a "Fork" and a "Branch"?

A fork is a clone of the repository, in a different GitHub user's account. It comes with a master branch, and can have multiple additional branches just like any other repo. One key feature of a forked repo is you can push commits to it, even if you do not have push access to the base repo.  Another is that GitHub knows that the fork is connected to the base repo - and it makes it easy for you to submit a pull request from eg the master branch of the forked repo to the master branch of the base repo. 

As soon as you fork a repository, have in mind that it is continually diverging from the base repo - because even if you are not editing the code, someone else might be! To keep your forked repo up to date, you'll need to pull in changes from the base repo from time to time. Here's what you do:

1) Clone your fork with "git clone yourname:thereponame.git" as usual. This makes a local copy of the repo, and attaches the name "origin" to the remote fork at GitHub.
2) Connect your local clone to the base repo, with "git remote add upstream ownersname:thereponame.git". To see which remotes you have defined, do "git remote -v".
3) Pull in updates with eg "git pull upstream master" (which merges commits made to the master branch of the owner's repository - the base repo - into your current branch). Don't forget to do a "git status" to make sure you are in the right branch before pulling!
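
Those three steps can be tried out offline with the sketch below. The bare repositories base.git and fork.git are local stand-ins for the owner's repo and your fork on GitHub (an assumption made so that the example runs anywhere).

```shell
set -e
work=$(mktemp -d)                      # scratch area; nothing real is touched
cd "$work"
git init -q seed
cd seed
git config user.name "Repo Owner"
git config user.email "owner@example.com"
printf 'v1\n' > README.md
git add README.md
git commit -q -m "Initial version"
git branch -M master
cd "$work"
git clone -q --bare seed base.git      # stand-in for ownersname:thereponame.git
git clone -q --bare base.git fork.git  # stand-in for yourname:thereponame.git

# Someone commits to the base repo after you forked it.
cd "$work/seed"
printf 'v2\n' >> README.md
git commit -q -am "Update README"
git push -q "$work/base.git" master

cd "$work"
git clone -q fork.git thereponame      # 1) clone your fork; it becomes "origin"
cd thereponame
git remote add upstream "$work/base.git"   # 2) connect the clone to the base repo
git remote -v
git status --short --branch            #    check the branch before pulling
git pull -q --no-rebase upstream master    # 3) merge the base repo's master into it
```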


I'm told that I have a "conflict." What should I do?

Fix it. When you try to git pull (or merge) in changes from a remote repository, and a file has been edited on the same line as the local copy you just committed, git will complain about there being a conflict, and leaves the file in a state where a) you can see both versions of the file (containing your edits, and the other ones), and b) it won't compile. It is now your job to edit the file until it is correct. Use your editor to search for the string <<<<<<< - this marks the beginning of your version of the edited section. The other version starts with a ======= mark, and ends with a >>>>>>>. You'll only need to edit these sections, removing the marker lines as you go. Once you have done this (and have checked that the code is correct), you need to then tell git that the file has been corrected with git add <file>, before doing a git commit to finish off. You can then push your commits as usual.

Try not to feel hard done by: conflicts are relatively rare, and a natural consequence of collaborative coding. Sometimes you will fix conflicts, sometimes your collaborators will - it evens out in the end. You can avoid conflicts by making your commits atomic (that is, small and indivisible), pulling often, and restricting the length of your lines to 72 characters (to make it easier for git to merge line by line).


I don't seem to be able to push. What should I do?

Sometimes, after trying to git push, you get an error message. You should read this carefully: most of the time it's because the remote repo you are pushing to has changed, and you just need to pull, and fix any conflicts, before you push.
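
The sketch below reproduces the most common case: a collaborator pushes first, your push is rejected, and a pull followed by a second push sorts it out. The bare repository hub.git is a local stand-in for GitHub, and the two clones and their files are hypothetical.

```shell
set -e
work=$(mktemp -d)                      # scratch area; nothing real is touched
cd "$work"
git init -q seed
cd seed
git config user.name "Repo Owner"
git config user.email "owner@example.com"
printf 'Shared file\n' > README.md
git add README.md
git commit -q -m "Initial version"
git branch -M master
cd "$work"
git clone -q --bare seed hub.git       # stand-in for the repository on GitHub
git clone -q hub.git mine
git clone -q hub.git theirs

# A collaborator pushes first.
cd "$work/theirs"
git config user.name "A Colleague"
git config user.email "colleague@example.com"
printf 'Their line\n' > theirs.txt
git add theirs.txt
git commit -q -m "Their work"
git push -q origin master

# Your push is rejected, because the remote has a commit you do not have yet.
cd "$work/mine"
git config user.name "Your Name"
git config user.email "your.name@emailprovider.com"
printf 'My line\n' > mine.txt
git add mine.txt
git commit -q -m "My work"
if ! git push -q origin master; then   # read the error message git prints here
    git pull -q --no-rebase --no-edit origin master   # pull (and fix any conflicts)
    git push -q origin master                         # then push again
fi
```

Here the two commits touch different files, so the pull merges cleanly and no conflict needs fixing.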

Note: There is a way to over-ride this error message. DO NOT USE IT. If you were to do a so-called "force-push," you would be forcing the remote version of the repository to look exactly like your local copy, including the commit history. This could include deleting files that are on the remote repo, but not pulled to your local copy, that someone else is working on. Force-push should only be used if you really know what you're doing, and are the project leader and repo admin. If you think you need to force push, open an issue and discuss it with your collaborators first.


Where can I find out more?
