If you are joining or collaborating with the lab, this is a good place to start learning our operating procedures for Git and GitHub.
GitHub and its associated tools are an excellent way to share code (but not data), work collaboratively on programming-heavy projects, and track your own progress as you work on a project over (potentially) several years. The overarching goal of using these services is to make our code more useful, reproducible, and open. Using Git and GitHub is also becoming the de facto way to publish code alongside a paper in which programming plays a key part, from simple data munging to modelling efforts. Finally, GitHub is an excellent way to document your progress as a programmer, which can be extremely useful when applying to future programming-oriented jobs.
For the lab, our goals will vary depending on the use case. Below are some examples, ordered from the simplest use case to the more complex, interactive ways of using GitHub. None of these uses is exclusive, and you will likely use GitHub for a combination of them.
Simplest use cases
- Sharing code with the lab. After you have finished a project, dissertation, or paper, post your code to GitHub as a permanent repository for your work. The code can be entirely complete, with no anticipated changes, but it should be fully interpretable by someone else in the lab. Ideally, this use case only involves commenting your code well and adding a readme file with explicit links or instructions for accessing the data necessary for the project.
- Publishing code. Ideally, when you have finished a project where programming is an essential part of your results (most projects will be this way), you publish the code. GitHub is an excellent place to publish code, but the standards for publishing code publicly are naturally higher than those for sharing it only within our lab. GitHub has its own guide for publishing citable code here. There is also increasing attention to code as a product that needs to be cited, as in this paper about code in Trends in Ecology and Evolution. There are many more papers and guidelines for publishing code in hydrology, ecology, and affiliated fields. You should check with your journal and look to your field in general for guidelines.
- Version control. Like version control in Microsoft Word or with Dropbox, Git and GitHub are excellent version control tools. They allow you to track your code as it changes and help you diagnose problems as you add new lines of analysis. Using version control from the start of a project is highly encouraged, but you can add it to a project at any time. A good guide for this approach is here.
- Collaborative coding. The version control component of Git can be integrated into a team workflow that enables much more collaborative and simultaneous project development. Using GitHub as a team workspace can be intimidating because of all the extra layers required to properly manage team projects, but it can also be rewarding and save a lot of headaches down the road. Using GitHub in a team really only requires two people and the desire to learn, and there are lots of resources on the web that can help you get started, starting here.
Most complex use case
This document and repository will not be an extensive guide to using GitHub. However, the three primary tools you will need to learn to successfully work with GitHub are listed below along with some links to tutorials.
- Git. Git is the underlying version control program that GitHub works with. Before any of your code can be pushed up to your GitHub repository, it must first be committed to a local repository. A project where you use Git and GitHub will therefore have both local version control storage and cloud (GitHub) storage. Git is relatively simple and there are tons of tutorials online. For R users, Dr. Jenny Bryan's webpage is incredibly helpful.
- GitHub. GitHub is the web repository for your code and so much more. The best way to learn how to use GitHub is to follow their guides shown here.
- Markdown. Markdown is a lightweight markup language that GitHub renders live into nicely formatted pages. This document is written in Markdown, and there are lots of guides for using Markdown, starting here. Bonus! If you program in R and use R Markdown, your code can be integrated with Markdown text in a way that makes reading through entire coding projects much easier.
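The Git item above notes that code has to be committed to a local repository before it can be pushed to GitHub. The commands below are a minimal sketch of that commit-then-push sequence, not the lab's prescribed workflow. Every name in it (the `demo_dir` variable, the `project` folder, the "Lab Member" identity, the commit message) is a hypothetical placeholder, and a local bare repository stands in for the GitHub remote so the example can run offline.

```shell
# Sketch only: all paths and names below are hypothetical placeholders.
# A throwaway directory keeps the demo away from your real projects.
demo_dir="$(mktemp -d)"

# Assumption: a local bare repository stands in for the GitHub remote.
git init --quiet --bare "$demo_dir/remote.git"

# 1. Create the project and turn it into a local Git repository.
git init --quiet "$demo_dir/project"
cd "$demo_dir/project"

# Identity for this demo repository only (usually set once with --global).
git config user.name "Lab Member"
git config user.email "lab.member@example.com"

# 2. Add a file, stage it, and commit it to the LOCAL repository.
echo "# Example analysis" > README.md
git add README.md
git commit --quiet -m "Add README"

# 3. Connect the remote and push the committed history to it.
git remote add origin "$demo_dir/remote.git"
git push --quiet -u origin HEAD

# List the commits that now exist both locally and on the remote.
git log --oneline
```

With a real project, the `git remote add origin` step would point at the URL GitHub shows for your repository instead of a local path.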