Prof: Dr. Thomas Parchman; SFB 209; tparchman@unr.edu
Co-instructor: Trevor Faske; tfaske@nevada.unr.edu
Workshop hours: 9:00 AM-12:00 PM, August 6 and August 9-12
Modern biology, and other fields of science, are increasingly shaped by data sets that are orders of magnitude larger than those life scientists have traditionally been trained to work with. For example, major recent advances in DNA sequencing technology have made it affordable to generate data spanning billions of DNA sequences in an extremely short period of time. Similar leaps in data acquisition technology are transforming other scientific disciplines as well, including but not limited to geography, communications, economics, chemistry, and physics. Spreadsheet software and graphical user interface statistical analysis packages (e.g., Excel, Statistica, JMP) are poorly suited to the now common scale of data. The ability to manipulate, process, and analyze large data sets with basic programming and data science skills should accelerate the research productivity and success of graduate students in this era.
This short module will introduce students to basic computational tools, focusing on Unix and Python, to support data proficiency. We will start at the most basic level, assuming no or limited previous experience, and aim to build enough familiarity for students to feel enabled and motivated to learn more on their own. Our overall goal is to provide an entry point for students to gain further expertise in simple programming and efficient manipulation and analysis of large-scale data sets. While we may handle some genomic data for several programming exercises, the tools we will introduce during this workshop are not specific to genomics, and will be of value to research in any scientific field. Continued development of proficiency with these and similar skill sets will improve the efficacy and quality of graduate research and productivity.
By the end of the module, students should have learned enough to feel enabled and motivated to learn more Unix and Python. Thus, this module will serve to introduce students to programming, to prepare them for more in-depth courses, and, most importantly, to provide a foundation for ongoing independent learning.
Specific Student Learning Outcomes:
- Students will be able to operate in a Unix computing environment, and will understand the basic use of Unix computing clusters for research.
- Students will be able to write basic programs in Python in order to efficiently manipulate and work with large-scale data.
- Students will have enough exposure to freely available resources for Unix and Python to continue independently learning programming skills.
- Students will be able to use basic Unix and Python skills to manipulate large data sets, and to conduct basic analyses of genome-level DNA sequencing data.
- Computer with Unix operating system: Students with Mac computers already have machines running Unix and are ready to go. Students without Mac computers will have the option of checking out a Mac laptop for the semester, or will need to install Linux or a Linux emulator on their computer.
- Installed text editor with syntax recognition: Students should have installed a text editor that will recognize syntax from code written for Unix, Python, Perl, etc. We suggest BBEdit (for Mac users), Visual Studio Code, or Sublime. All are free and easy to locate, download, and install.
- Supplemental primers, readings, and assignments are provided on the workshop GitHub page.
- Practical Computing for Biologists. Haddock, S.H.D. and Dunn, C.W., 2011. Sunderland, MA, USA: Sinauer Associates. The book is very useful for both Unix and Python.
We will meet from 9:00-12:00 each weekday morning from August 6th to August 12th. At the beginning of each session, we will introduce new concepts and material that will form the basis of the exercises, assignments, or projects we will work through during that session. We will cover questions regarding current or previous material, and then students will spend at least half of each class working on writing code independently or in small groups. Students will get the most out of each session if they review the primers and outlines of concepts ahead of time.
If the module is held remotely this session, we will hold our meetings over Zoom. I will send a link to all participants each morning prior to starting.
The material for each day of the workshop will be organized in separate directories on the workshop GitHub page. Each of these directories will contain the slides that we will use to introduce material, a primer covering example Unix and Python code along with explanations, and a worksheet of programming practice exercises. There are also general directories in the repository with supplementary resources for Unix and Python, including cheat sheets, tutorials, and recommended resources for learning more.
While you can download individual files from GitHub using your preferred web browser, you can also access GitHub from the Unix command line using git. Although git commands can get complicated very quickly, git is a very useful skill to have for reproducibility, tracking changes, and collaboration. We do not cover git in this course, but there are many tutorials online (e.g., http://swcarpentry.github.io/git-novice/).
For this course, downloading individual files might suffice. But if you would like to download the entire repo, you can do so from the command line using the command below:
Hint: make a directory somewhere on your computer for this workshop, and run the command below in that directory.
git clone https://github.com/tparchman/GAIN_summer2021
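The hint and the clone command can be combined into one small shell function. This is only a sketch: the function name `get_workshop` and the default directory `~/GAIN_workshop` are made up for illustration and are not part of the workshop materials. It assumes git is already installed.

```shell
# Hypothetical helper (not part of the workshop materials): make a workshop
# directory, move into it, and clone the repository there.
#   $1 = directory to create (defaults to ~/GAIN_workshop, an example name)
#   $2 = repository to clone (defaults to the workshop repository)
get_workshop() {
    # -p: create parent directories as needed; no error if the directory exists
    mkdir -p "${1:-$HOME/GAIN_workshop}" || return 1
    # the function runs in your current shell, so you stay in this directory afterwards
    cd "${1:-$HOME/GAIN_workshop}" || return 1
    # download the entire repository into a new subdirectory
    git clone "${2:-https://github.com/tparchman/GAIN_summer2021}"
}
```

After pasting the function into a terminal, typing `get_workshop` with no arguments should leave the materials in `~/GAIN_workshop/GAIN_summer2021`. Because the schedule and contents may change, running `git pull` from inside that directory later will fetch any updated files.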
Tentative Workshop Schedule. All contents are subject to change.
| Date | Topic | Assignment |
|---|---|---|
| Aug. 6 | Intro, Unix | Unix assignments |
| Aug. 9 | Python I | python1_practice_scripts.md |
| Aug. 10 | Python II | python2_practice_scripts.md |
| Aug. 11 | Python III | python3_practice_scripts.md |
| Aug. 12 | Python IV | python4_practice_scripts.md |