The idea of this repo is to teach you a few basic concepts when using git, with a real github repository and the whole workflow.
This is a WIP! - NOT YET COMPLETE.
There might be errors in the tutorial, and some issues with the text. If you find any, please send a pull request or a patch to fix them. Thanks!
I have some ideas on how to improve this tutorial and make it even more real world, you can read those ideas in the TODO.md.
As I see it, there is no right workflow. Some of them are better suited for some teams, some are better for others. This is one of the workflows I like and it seems to me that is the one that provides great benefits for most of the cases.
But please, feel free to change it to suit your needs.
- To fork a project
- Use the command line interface to add a remote (
upstream
) repo - To work on your own local fork, while updating the
upstream
- Produce patches to be shared back
- The Pull Request
- How to merge patches and how to merge a pull request
- Working on different branches
- Merging and rebasing
First, get a github account, they're cheap (free) if you're just using them for open source projects. For this tutorial, it's ok to just use a free account.
Once you have your account ready, you need to decide a path: this tutorial will offer always two explanations: using the github client (for mac and windows, qwhen there's a difference) and only using the CLI git client, which is included by default for mac os x and shipped with the windows client for github. Bottom line: if you're under windows, and unless you really know what you're doing, for now it would be best if you just install the github windows client. It will install everything required to complete this tutorial. If you're on mac os x or linux, you can use just the command line if you feel comfortable, or a GUI client, like GitX (L).
There are some things that are either too cumbersome to do in the current GUI clients, so we will use the CLI client for some parts.
Ok, now that you're ready, it's time to fork your first project, this particular
project. So go ahead and point your browser to the repo url, sign in with
your account and fork the project to your account, using the Fork
button on
the top right. If you belong to an organization, then Github will ask you to
which account you want to fork it, for our case, just pick your username and
you're done.
Now you should have a new fork of the project, under your name. The url is the
same but instead of funkaster
it should say your username.
Git is a decentralized version control system, this means there is no real "central" repo, and every copy of the repo is usually named a "fork". Every fork is a complete clone of the original repo, with all its history. This might be a good thing or not, depending on what you expect, but most of the time it's a good thing: you can work offline, you don't need to worry about blocking other developers, or making history dirty.
This also means, if you come from a centralized VC system, you might need to change a little bit your mentality: it's a very good thing in git to commit as often as possible, and make your commits as atomically as possible. Even if this means splitting changes in one file in several commits. That is something I highly recommend and I'll show you how to do it.
Back to forks. Forks in github are actually remote clones: clones of one repo that reside in the github's servers. You then need a local clone of that clone in your computer in order to make changes. So finally, what resides in your computer is a clone of a remote clone, which in turn is also a clone from some repo. That last part is only true if you're working on a fork. If you're just working on a project of your own, then what you have is just a local clone of a remote repo. In the end, it doesn't really matter.
Now that the picture is more or less clear, you have to clone your recent fork of the main project in order to start working. If you have the github client, then it's really easy, just select the project from your list and make a local clone.
If you want to use the command line, then on the main page of your fork, click
the ssh tab next to the url of the repo (it should change to something like
git@github.com:yourusername/realworldgit.git
), then go to your terminal, cd to
some place safe (like ~/projects, or whatever...) and clone the repo:
git clone git@github.com:yourusername/realworldgit.git
If you got this error: Permission denied (publickey), look this Generating SSH Keys.
For the github client, just open the client, refresh the list of projects under the github section and click the clone button (it appears only when you hover your mouse over the project name). After that, github should place a local clone in your computer in a special place under your home directory. For windows, the default is in the Documents directory.
Whichever way you did it, you should end with a local clone, which you can start hacking into :) - Good job!
Before start hacking, we need to tell our local clone to keep an eye on the main project (the original project from which you forked). To do this, you need to go to the shell/terminal and cd into the project. One good thing about the shell installed by github, is that it will open by default on the right directory. So you should just do something like this:
NOTE: through here on, the $
character represents the prompt in your shell.
$ cd realworldgit # or wherever you placed your local clone
$ git remote add upstream git://github.com/funkaster/realworldgit.git
$ git remote -v
origin git@github.com:yourusername/realworldgit.git (fetch)
origin git@github.com:yourusername/realworldgit.git (push)
upstream git://github.com/funkaster/realworldgit.git (fetch)
upstream git://github.com/funkaster/realworldgit.git (push)
The last command shows that you have the two remotes, the first is origin
, which
represents your fork and the upstream, which represents the original clone.
The URL I used there is the url given in the main page of the project,
when you click in the Git Read-Only
tab. It's always a good idea to have
upstream as read-only unless you really know what you're doing :)
Ok, now you have the upstream ready. We will see soon how to update your local clone and how to update your remote clone (the fork).
Now we're finally ready. Your first job is to add your name to the list of git
learners. In the root of the project, there is a file named learners.txt
. Add
your name to the end of the file, following the convention in that file:
Your Name - your.email@domain.com
You can skip your email if you want, or add a comment. It's not really relevant :) After you add your name there, we can start the simplest git flow: the commit.
NOTE: if you intend to perform these steps in the github client, don't execute the commands in this section, because you will leave your repo in a clean state and you won't be able to commit from the github client unless you make another change.
Go back the your terminal/shell and make sure you're on the root of your repo.
$ git status
...
$ git add learners.txt
$ git commit -m "adding myself!"
First things first. git status
will tell you what's happening on your
clone. In this case it should show that there are unstaged changes in your
clone, in particular in the learners.txt file. If you're using the github
client, it should show GITHUB_CLIENT.
What you need to know, is that git have different states that will help you organize your code. The first one is the working directory. This state is what you see reflected in your filesystem. The next state is the staging area, this is the intermediate step where you can group things that you're about to commit. Finally, there's the git directory (or the repository), which is the actual stored data.
It works like this: you make changes in your working directory, you pick the changes you want and add them to the stating area and finally you commit your changes to the repository. If you can understand that flow, you have understood big part of how git is supposed to work.
So what we did with the last two commands is to modify the learners.txt file, add that file to the staging area (with all the changes we did) and commit that file to the repository.
Every commit needs a message. In this case, we passed the message in the
command line, with the -m
argument. If we do not pass that argument, git will
open your evironment editor (vi, emacs, nano, etc.) and ask you to write the
comment.
Keep the comments clean. There are guidelines, but again, git is very flexible. I follow this rule:
- First line max. 50 chars. Be concise, try to explain what you're doing. "fix bug" is not concise, it's dumb :)
- If I need to write something else (90% of the times you don't need to), skip a line and then write whatever you need, always staying at max 72 char width.
(Reference: http://tbaggery.com/2008/04/19/a-note-about-git-commit-messages.html)
For the commit message, if needed I try to use Markdown syntax: they look good on github.
Although we're going to do the same as before, for simple commits the workflow is highly simplified with the github client. Even if you intend to use only the github client, I recommend you read the CLI version to understand what we're doing.
NOTE: if you already did the CLI version, you might not be able to perform the following steps, because you already commited your changes and now your repo should be clean.
If you go to the github client, and click in your local clone (the blue arrow next to the repo name), github will show you that you have 1 file "to be commited". If you followed the instructions above, this means that you have 1 file that has not been staged.
If the changes are small, github client will show you the diff together with a commit pane to the right. Go ahead and write a commit message following the best practices described in the previous sections and click the commit button.
Easy, right? Sure it is :)
While tempting, if you plan to be really serious about using git, I highly recommend using the CLI as much as you can. There are several things that are not yet able to be done in the GUI. There are, however, other clients that are more powerful, but those are outside the scope of this tutorial. Having said that, I encourage you to go ahead and explore them all if you can/want, but my personal opinion is that using the cli will give you enough control and flexibility.
In my day-to-day work, I use a mix of GitX(L) and the cli. The only reason I use GitX(L) is to be able to make commits by hunks easier (more on that later), but for pushing, pulling, merging, rebasing, etc. I relly solely on the cli.
Ok, now you have your first commit ready, but if you go to your fork page, you
won't see the log for your first commit, why? As I mentioned before, git
is a
distributed VCS, and as such, changes occuring in your local clone are
independent of what is in the fork (the remote clone). But fear not, you can
synchronize them both in a safe manner: git provides the commands push
and
pull
to easily send your changes to a remote clone and to fetch the changes
from the remote repo and merge them into your changes. So let's do that.
Go back to your shell/terminal. Again, don't execute this code if you intend to follow the github client version:
$ git push origin master
What this command is doing is: push the changes that are in my repo, in the
current branch to the branch master
of the origin
repo. By default, when you
clone a repo git creates a default remote branch, the origin
. We saw that when
we executed the git remote -v
command when adding the upstream remote.
Hopefully, everything went well. You can now reload your project page in github and you should see that the last commit there is the one that you have just created. Well done!
In your github client, in the realworld repo, just click the sync
icon. This does
a few more things that the cli version. But we will skip over them for now.
Now, you want to add your name to the official repo, so other learners can see that you know how to commit and push changes to your remote clone. There are two roads: one is to do it the non-github-way and the github-way :)
First, the non-github way, as it is harder. What we are going to do, is to
compare our repo with the original one, the one that we prepared before, named
upstream
. While comparing, we will tell git to prepare a patch so we can
send it to the maintainer of that project and he can merge the changes. We
will have the opportunity to do that ourselves, because it's a part of the flow
that we need to learn, but that we'll leave that out for now.
Ok, so first get need to make sure we have an updated copy of the upstream repo.
$ git fetch upstream
What that will do is to fetch all the changes that we don't have from the upstream repo and store them locally. That process will not change anything in our repository, however, we need that information to create the patch and to update our copy of the repo.
Now the patch is prepared.
$ git format-patch upstream/master
That will extract all the commits that are in our current branch (master) but
not in the master branch in the upstream repo. In our case, we know it's only
one commit, so we will end with one new file named something like
0001-what-you-wrote-on-your-commit.patch
. We can now safely send this file to
the maintainer of the upstream repo so he can merge it (not without testing it
first!).
But this is a very simple case, most of the time you will send a patch with more than one commit. In that case you can just concatenate the patches in one file or do something more interesting, like squashing all your commits in a single commit. We will also see that technique later.
NOTE: to concatenate the patches from the shell/terminal, you can use the
cat
utility, like so:
cat *.patch > single.patch
That will take all the files with a .patch
extension and join them in the
single.patch
file. That is one way to do it, but make sure first that there
are no unwanted files with a .patch
extension before running this, otherwise
your concatenated patch might be corrupted.
The github way is really simple and is one of the facts that has made github so
popular: the pull request. Just open your browser and go to your fork, and click
the Pull Request
button. That will take you to the Pull Request (PR) page and
you will be able to select which branch from your fork will be merged in which
branch of the upstream repo. In this case is master from your fork into master
of the upstream repo.
In that interface, you can write a title for your pull request. Please try to be as descriptive as possible, without writing an essay on the title :) Usually you want to follow the same principles than in the commit message. However, the body of the PR can be very extensive if needed. The PR is, most of the time, translated in a discussion between the maintener(s) and the contributor. Allowing for code-review and checking the code quality before merging the changes.
In that same interface you can also see what files are you changing and how is actually changing. You might want to review that before clicking the send button to see that you're not issuing a PR for something that you don't want to merge.
If everything looks good, then go ahead and click the Send Pull Request
button. That should take a few moments and github should redirect you to the
original repo page, in the Pull Request
section.
Congratulations! you just made your first Pull Request!
After issueing the PR, the maintainer needs to review your code, he can comment it and if he decides to do so (and if it doesn't introduces conflicts) he can merge the PR from the github interface. This is very powerful and has allowed many Open Source projects to collaborate with different contributors from all around the world.
NOTE: if you actually sent a pull request, I'll do my best to try to merge them as they arrive, but don't hold your breath, it might take one day, it might take more than that, depending on my load.
Ok, it's time to update your fork. I'm going to assume that the maintainer has merged your pull request and there are some other changes from the original repo that you might want to fetch.
Unfortunately for this (AFAIK) the github client won't work, so we need to go back to our shell/terminal.
Here, we have two options. We can use the pull
command, which would make
something very similary to a fetch
and then a merge
, or we can execute both
commands ourselves. I will opt for the latter, just to be verbose. Please note,
that just doing fetch
will not make any changes in your repo, it will only
take whatever is new in upstream
and store it in the "git database". The
merge
command is the one that's actually modifying your repo and taking the
changes in upstream/master
, that is, the branch master
in the remote
upstream
and applying the changes on top of the HEAD of your latest change.
ED-NOTE It would be interesting at this point to add references to some internal parts of git. Reader: you might want to check the git book.
$ git fetch upstream
$ git merge upstream/master
Being able to create and switch branches with ease is one of the biggest advantages in git. Branches are something you should use all the time to keep your work clean.
Let me explain a little bit what are branches, but before that I would like to explain just briefly how git stores the data.
As opposed to other VCS, git stores its data as snapshots of the current state of your project. It's like git it's taking a picture of the state at the moment of your commit and storing that in a mini-filesystem (Git Book). Those snapshots (the commits) are identified by a hash, a 40 characters string that uniquely identifies a commit in the repo. A branch, in the git parlance can be thought of different points of view for the repo. However, in the real sense, they are actually pointers to a specific commit, that means they are actually a window to a particular point in time for your repo. But why are they called branches when they act more like pointers? Because they can actually branch (split) the changesets of your repo at some point. Let me explain this with a diagram.
NOTE we need a better diagram.
Let's assume you have your repo at some state, and you're only using one branch,
the default master
branch, then your repo looks like this.
[ master ]
↓
[ 343ffd ]
Here, 343ffd
is the imaginary last commit in your repo. Now let's assume you
create a new branch, called awesome
, which will hold a very awesome feature in
your project.
[ master ]
↓
[ 343ffd ]
↑
[ awesome ]
Now the awesome branch and the master branch point to the same commit. How does
git know which branch are you currently watching? Because there's a special
branch used by git that points to the one being used, and it's called the
HEAD
. Usually when you create a branch in git, it will move the HEAD
pointer
to that branch, so in our scenario, it would look somethinglike this.
[ master ]
↓
[ 343ffd ]
↑
[ awesome ]
↑
[ HEAD ]
If you change branches, using git checkout <branch_name>
will switch the
HEAD
pointer, and will also reflect any changes in the working directory.
Now, continuing with the assumptions, let's assume you are in the awesome branch and you make a new commit, adding the new feature. Your repo would look like this.
[ master ]
↓
[ 343ffd ] ← [ abd24a ]
↑
[ awesome ]
↑
[ HEAD ]
Both, the awesome
pointer and the HEAD
one are now shifted and pointing to
the last commit. The master
pointer, however is still on the old state.
To continue our story, you now got a patch from a contributor and need to apply
it, but you don't want to do that in the awesome
branch, because it has
nothing to do with that feature and one thing you do know is that you want to
keep things clean. So in order to apply the patch, you need to switch to the
master branch and apply it there.
$ git checkout master
$ ...
So, what does your repo looks like now?
[ master ]
↓
↙ [ 823bfd ]
[ 343ffd ]
↖ [ abd24a ]
↑
[ awesome ]
↑
[ HEAD ]
This is the split I talked about. At this point, you have master
and awesome
that point to different states, but share a common ancestor, 343ffd
. You can
extrapolate this to more branches and spinning off branches from other branches.
What are branches used for? You can actually used them for whatever you want. My
workflow usually is: work on master
and if you need to fix a bug, create a
branch, work there and then merge the branch back to master. Same with small
features.
If more than one developer are working on the same feature, they may want to work on the same branch. You can do that by publishing your branch on a remote repo and then letting the other developers about that branch. We will cover that later in the tutorial.
ED-NOTE need to remember to actually add that example.
Enough talk, this is real world git, so let's get our hands on it.
First, we're going to create a new branch.
$ git checkout -b awesome master
Switched to a new branch 'master'
The command to create a new branch, is the same command used to switch branches:
checkout. It is also used for other things, but for now, we can
live with that definition. The -b
option tells git to create a new branch, and
the optional master
argument, tells git to use that pointer as the
reference. If you omit the last argument, it will use whatever branch HEAD is
pointing to. As a bonus, git also changes HEAD to point to the newly created
branch.
$ git branch
* awesome
master
The branch
command lists the current local branches, and it shows you which
one HEAD is pointing to, by placing an asterisk ('*') before the name. In this
case, HEAD is pointing to awesome
, as we expected to.
Ok, now let's add a new "feature". Inside the branch_test
directory, create a
new file, with the following convention.
- The file should be named as your github username and have a .txt extension
- Inside the file, you can put whatever you want as long as it is less than 256 bytes. Please try to avoid profanity :)
Now that you have the file, we can stage it and commit the changes to the repo, just as we did before.
$ git add branch_test/yourusername.txt
$ git commit
(edit the commit message, save and exit the editor)
1 file changed, 1 insertion (+)
create mode 100644 branch_test/yourusername.txt
Great. Now we can run some more tests if necessary or just check that everything is ok. If applies, you could even build a binary and send it to QA. If you find out that everything is ok, then you are now ready to merge your changes into the master branch.
$ git checkout master
$ git branch
awesome
* master
As you can see, the asterisk marks the current branch (the branch to which HEAD is pointing to) and we can safely bring over the changes from awesome to master. But before doing so, you might want to make sure you have a fairly recent version of master, in order to minimize the chances of a conflict.
$ git fetch upstream
$ git merge upstream/master
Now, can we merge our awesome branch? Not yet. At this point, I would like to open a small discussion around a very powerfull feature of git, the rebase. Rebase is a very, very powerfull command in git, but also very complex (you know what they say, "with great power, comes great responsibility"). Basically what rebase allows you to do is to modify (rewrite) history from your repo. You usually want to do this because:
- You screw up, and need to fix some things.
- Because you want to keep a clean history.
Number 2 is the reason I want to introduce here, for number 1 there are tons and tons of discussions on the internet about that. According to what Linus says, what we want is clean & history. So that means for you, a very important rule:
ONLY USE REBASE ON CODE THAT HAS NOT SEEN THE LIGHT
Why? because if you publish your code (push it to your fork, for instance) and then run rebase, you will alter your history tree, and what will happen is that when you push again, the changes you will need to push will modify the version that was published, potentially producing lots of headaches. It is ok to run rebase on your local fork, as long as you are running it on un-published code, or unless you really know what you're doing.
But let's explain what does rebase do. To do that, let's go back to our diagrams for your repo.
[ master ]
↓
↙ [ 823bfd ] ← [ aba665 ]
[ 343ffd ]
↖ [ abd24a ]
↑
[ awesome ]
↑
[ HEAD ]
Let's assume now, that with the state in the diagram, you want to bring your
changes from the awesome branch to master. Once in the master branch, you would
run the git rebase awesome
command (do not run it yet!).
What rebase does, is to rewind your changes all the way until a common ancestor in the tree, apply whatever changes you have in awesome and then it applies the rest of the changes. What happens is that your tree will look like this.
[ HEAD ]
↓
[ master ]
↓
[ 343ffd ] ← [ abd24a ] ← [ 00bf32 ] ← [ 882ac3 ]
↑
[ awesome ]
Truth be told, I've simplified a lot how things work, but you get the idea :),
which is that the tree got modified: 823bfd
(now 00bf32
) was previously
pointing to 343ffd
as a parent, now is pointing to the last commit in awesome,
which is abd24a
. Since we rewrote history, commits also change their hash. You
can always read the whole (unrated & unedited) story in the
official documentation.
Now, let's try to apply that to our repo.
$ git checkout awesome
$ git log --oneline -2 # shows only the last two commits
0dff283 adding my branch-test file
c0ac58b adds branch_test directory # this will be the common ancester
$ git checkout master
$ git log --oneline -2
c0ac58b adds branch_test directory # see? right here
1c96575 minor edit in text
$ git rebase awesome
...
$ git log --oneline -2
0dff283 adding my branch-test file
c0ac58b adds branch_test directory
So what happens here, is that it since our case was a bit simpler, we didn't actually have to rewind, because we were already sitting in the point where the two branches had split. What we can see however, is that master now shares the same history than the awesome branch. At this point, we can run some more tests, build another binary for QA and if all goes well, delete the awesome branch.
$ git branch -d awesome
The command branch -d
will remove a branch, only if it was merged. This can be
considered a "safe" delete, since no change will be lost. If you really want to
delete an unmerged branch, use the -D
flag.
Now you might want to send another pull request to publish your changes, or another patch.