Programming learning and data analysis resources. Please, contribute and get in touch! See MDmisc notes for other programming and genomics-related notes.
- Awesome
- Cheatsheets
- Command line
- Courses
- Tools
- Code best practices
- Docker
- Cloud
- Git
- Text
- SQL
- Workflows
- Web
- Miscellaneous
-
awesome-javascript-learning - A (not so) tiny list limited to the best JavaScript Learning Resources
-
awesome-cpp - A curated list of awesome C++ (or C) frameworks, libraries, resources, and shiny things.
-
awesome-go - A curated list of awesome Go frameworks, libraries and software, https://awesome-go.com/
-
awesome-kubernetes - A curated list for awesome kubernetes resources. Rendered GitBook
-
awesome-nextflow - A curated list of nextflow based pipelines, presentations, videos, tutorials.
-
awesome-pipeline - A curated list of awesome pipeline toolkits inspired by Awesome Sysadmin
-
awesome-readme - A curated list of awesome READMEs. Elements in beautiful READMEs include, but are not limited to: images, screenshots, GIFs, text formatting, etc.
-
awesome-static-generators - A curated list of static web site generators. Website
-
awesome-storage - A curated list of storage open source tools. Backups, redundancy, sharing, distribution, encryption, etc.
-
awesome-workflow-engines - A curated list of awesome open source workflow engines
-
every-programmer-should-know - A collection of (mostly) technical things every software developer should know
-
List of Data Science Cheatsheets to rule the world, PDFs covering all programming languages, machine learning, deep learning
-
mac-dev-setup - A beginner's guide to setting up a development environment on macOS
-
professional-programming - A collection of learning resources for curious software engineers, about coding and beyond (career development, databases, cloud, and more).
-
the-book-of-secret-knowledge - A collection of inspiring lists, manuals, cheatsheets, blogs, hacks, one-liners, cli/web tools and more.
-
Very large collection of free courses for all programming languages, interactive tutorials, podcasts and screencasts, books. And more, translated in other languages
-
useful-sed - awesome sed tips, techniques, one-liners, tutorials. By Adrian Scheff
-
awesome-reproducible-research - A curated list of reproducible research case studies, projects, tutorials, and media
-
awesome-programming-books - Awesome Programming Books
-
Best-websites-a-programmer-should-visit - Some useful websites for programmers
-
kickstartcoding/cheatsheets - A selection of printable, one-page cheatsheets. HTML/CSS, Bash/Git, Python, JavaScript, Django, and more
-
quick-SQL-cheatsheet - A quick reminder of all SQL queries and examples on how to use them
-
Regular expression, Unix commands, Python quick reference, SQL reference card
-
The 50 Most Popular Linux & Terminal Commands - Full Course for Beginners - a 5 hour video course
-
Bash-Oneliner - A collection of handy Bash One-Liners and terminal tricks for data processing and Linux system maintenance. https://github.com/onceupon/Bash-Oneliner
-
awesome-bash - A curated list of delightful Bash scripts and resources
-
the-art-of-command-line - Master the command line, in one page. Advanced.
-
commandlinefu.com - Master the power of command-line with a list of one-liner gems
-
terminalsare.sexy - A curated list of Terminal frameworks, plugins & resources for command-line interface (CLI) lovers. GitHub
-
ShellCheck finds bugs in your shell scripts
-
Awesome WSL - Windows Subsystem for Linux - detailed guide for working on Linux in Windows
-
Tools for Reproducible Research - NBIS/ELIXIR course. Covers Data management, Project organisation, Git, Conda, Snakemake, Nextflow, R Markdown, Jupyter, Docker, Singularity. GitHub
-
Survival guide for Unix newbies, Settling into Unix, and Shell programming with bash tutorial, by Matt Might
-
The Unix Workbench by Sean Kross. From Bash basics to GitHub, cloud computing. GitHub
-
The Unix shell, Software Carpentry
-
Data Coding 101 – Intro To Bash. Four episodes, video
-
CSE390A - System and Software Tools (taught by Ruth E. Anderson). Well-polished lectures and homework on the Unix/Git ecosystem
-
data-science-at-the-command-line - "Data Science at the Command Line" by Jeroen Janssens, GitHub
-
Command line for data science, with examples, videos
-
fd - A simple, fast and user-friendly alternative to 'find'. Ignores hidden files/folders by default.
-
ripgrep - recursively searches directories for a regex pattern while respecting your gitignore, binary files, hidden directories. Very fast.
-
readme-md-generator - CLI README generator for software packages
-
Mastering Software Development in R, book by Roger Peng
-
Tips for organizing projects, Organizing data in spreadsheets by Karl Broman
-
Mastering-GitHub-Copilot-for-Paired-Programming - A 6-lesson course teaching everything you need to know about harnessing GitHub Copilot and an AI Paired Programming resource.
-
Software Carpentry reading material on software engineering and scientific computing
-
Software development skills for data scientists by Trey Causey
-
ProjectTemplate - an R package for advanced project management, GitHub
-
Seemann, Torsten. “Ten Recommendations for Creating Usable Bioinformatics Command Line Software.” GigaScience 2, no. 1 (December 2013)
-
List, Markus, Peter Ebert, and Felipe Albrecht. “Ten Simple Rules for Developing Usable Software in Computational Biology.” PLoS Computational Biology 13, no. 1 (January 2017)
-
Taschuk, Morgan, and Greg Wilson. “Ten Simple Rules for Making Research Software More Robust.” PLOS Computational Biology 13, no. 4 (April 13, 2017). GitHub
-
"Code and Data for the Social Sciences: A Practitioner’s Guide" book by Matthew Gentzkow and Jesse Shapiro, PDF. https://web.stanford.edu/~gentzkow/research/CodeAndData.pdf
-
Wilson, Greg, D. A. Aruliah, C. Titus Brown, Neil P. Chue Hong, Matt Davis, Richard T. Guy, Steven H. D. Haddock, et al. "Best Practices for Scientific Computing." PLoS Biology 2014
-
Noble, William Stafford. “A Quick Guide to Organizing Computational Biology Projects.” PLoS Computational Biology 5, no. 7 (July 2009) - Computational projects organization, folder structure, command line scripts, version control
-
Awesome-docker - A curated list of Docker resources and projects. GitHub
-
A comprehensive tutorial on getting started with Docker!, by Prakhar Srivastav. GitHub
-
A Docker tutorial for reproducible research by rOpenSci Labs
-
Docker Containers on the Desktop - a collection of Docker files encapsulating various software. GitHub
-
How Docker Can Help You Become A More Effective Data Scientist, by Hamel Husain. Tweet by Jeremy Howard
-
Docker for beginners - Learn to build and deploy your distributed applications easily to the cloud with Docker, by Prakhar Srivastav. Twitter
-
Docker Jumpstart, by Andrew Odewahn
-
Docker Tutorial for Beginners - A Full DevOps Course on How to Run Applications in Containers - video course, 2 h 10 min, with interactive labs, by freeCodeCamp.org
-
Enough Docker to be Dangerous - A minimal Docker tutorial
-
Introduction to Docker - video presentation, 47 min, by Solomon Hykes
-
BioContainers - A community-driven project to create and manage bioinformatics software containers. Conda, Docker/Singularity recipes to build bioinformatics software-centered containers, specifications. mulled and involucro helper tools. Integration with BioConda, Galaxy, PhenoMeNal H2020.
Paper
Veiga Leprevost, Felipe da, Björn A Grüning, Saulo Alves Aflitos, Hannes L Röst, Julian Uszkoreit, Harald Barsnes, Marc Vaudel, et al. “BioContainers: An Open-Source and Community-Driven Framework for Software Standardization.” Edited by Alfonso Valencia. Bioinformatics 33, no. 16 (August 15, 2017): 2580–82. https://doi.org/10.1093/bioinformatics/btx192.
- Rockerverse - Docker/containerization and R. Review of packages and applications for working with R in containers. Links to packages, examples of applications (Bioconductor, Data Science), deployment of R containers on the cloud.
Paper
Nüst, Daniel, Dirk Eddelbuettel, Dom Bennett, Robrecht Cannoodt, Dav Clark, Gergely Daroczi, Mark Edmondson, et al. "The Rockerverse: Packages and Applications for Containerization with R" http://arxiv.org/abs/2001.10641 ArXiv:2001.10641 [Cs], January 28, 2020
-
Boettiger, Carl. “An Introduction to Docker for Reproducible Research.” ACM SIGOPS Operating Systems Review 49, no. 1 (January 20, 2015) - High-level Docker overview. Technical challenges, ways to address them (virtual machines). Docker concept of one pipeline - one image, dockerfiles, image versioning, using with RStudio, reusable modules, example commands, best practices
-
Boettiger, Carl, and Dirk Eddelbuettel. “An Introduction to Rocker: Docker Containers for R.” The R Journal 9, no. 2 (2017) - Rocker. Docker definitions. Command examples. Singularity. rocker-project.org
-
Container Training - lecture notes and videos of various Docker, Kubernetes presentations
-
Docker Containers and Kubernetes Fundamentals – Full Hands-On Course - video course to Learn Docker containers and Kubernetes, 6h. Website, GitHub
-
Kubernetes The Hard Way On VirtualBox - Kubernetes The Hard Way is optimized for learning, which means taking the long route to ensure you understand each task required to bootstrap a Kubernetes cluster.
-
Kubernetes for Dummies - introductory examples, by Steven McGown
-
future-kubernetes - instructions for setting up and using a Kubernetes cluster for running R in parallel using the future package.
-
Kubernetes Course - Full Beginners Tutorial - 3 hour video tutorial on Kubernetes, by Bogdan Stashchuk. GitHub
-
Using Kubernetes and the Future Package to Easily Parallelize R in the Cloud - how to use R on the Kubernetes cluster on Google Cloud. Tweet
-
awesome-cloudrun - A curated list of resources about all things Cloud Run
-
cloud-run-faq - Unofficial FAQ and everything you've been wondering about Google Cloud Run.
-
CloudBank - NSF-funded cloud computing for education, training, and allocation for cloud computing resources.
-
serverless-architecture - 'Serverless Architecture' course at LinkedIn Learning, by Lynn Langit
-
SkyPilot - a framework for easily running machine learning workloads on any cloud through a unified interface.
Paper
Yang, Zongheng, Zhanghao Wu, Michael Luo, Wei-Lin Chiang, Romil Bhardwaj, Woosuk Kwon, Siyuan Zhuang, et al. “SkyPilot: An Intercloud Broker for Sky Computing,” n.d.
-
The Open Science Grid - A national, distributed computing partnership for data-intensive research.
-
The Cancer Genomics Cloud (CGC) - scientific cloud computing by Seven Bridges. Contains many public datasets (TCGA, CCLE, etc.), controlled access supported. Uses AWS. Pipelines are packaged with Docker. Execution instructions are described using Common Workflow Language (CWL).
Paper
Lau, Jessica W., Erik Lehnert, Anurag Sethi, Raunaq Malhotra, Gaurav Kaushik, Zeynep Onder, Nick Groves-Kirkby, et al. “The Cancer Genomics Cloud: Collaborative, Reproducible, and Democratized—A New Paradigm in Large-Scale Computational Research.” Cancer Research 77, no. 21 (November 1, 2017): e3–6. https://doi.org/10.1158/0008-5472.CAN-17-0387.
-
batch-samples - step-by-step tutorials and code samples to learn how to use Batch on Google Cloud.
-
gcp-for-bioinformatics - Google Cloud Platform (GCP) for Bioinformatics, tutorials by Lynn Langit. Youtube playlist, more cloud courses at Lynn's profile.
-
googleComputeEngineR - An R interface to the Google Cloud Compute API, for launching virtual machines.
-
hpc-toolkit - Cloud HPC Toolkit is an open-source software offered by Google Cloud which makes it easy for customers to deploy HPC environments on Google Cloud.
-
AWS Basics for Beginners - Full Course - Video course (5h 27m) introducing topics from AWS basics to advanced cloud computing concepts. Course code and files.
-
The Open Guide to Amazon Web Services - Amazon Web Services — a practical guide
-
Amazon Web Services - Illustrated guide to Amazon EC2, UC Davis, Titus Brown, Video lecture, 2 h 42 min
-
Amazing Guide to using Amazon Web Services (AWS), illustrated step-by-step tutorial
-
The Amazon Genomics CLI is a tool to simplify the processes of deploying the AWS infrastructure required to run genomics workflows in the cloud, to submit those workflows to run, and to monitor the logs and outputs of those workflows.
-
awesome-aws-security - Curated list of links, references, books videos, tutorials (Free or Paid), Exploit, CTFs, Hacking Practices etc. which are related to AWS Security
-
data-science-on-aws - AI and Machine Learning with Kubeflow, Amazon EKS, and SageMaker. Code for the Data Science on AWS book.
-
paws - Paws is a package for Amazon Web Services in R. It provides access to the full suite of AWS services from within R.
-
Video: Amazon AWS Tutorial #1, 22 min, #2, 13 min, #3, 16 min
-
Metrics - An infographics generator with 30+ plugins and 200+ options to display stats about your GitHub account and render them as SVG, Markdown, PDF or JSON!
-
New to Git and GitHub? This Essential Beginners Guide is for you
-
Resources to learn Git - Git handbook, cheatsheets, interactive tutorials
-
Git and GitHub guide, by Karl Broman
-
Git basics by GoLinuxCloud
-
Getting started with Git - a collection of resources on working with Git and GitHub.
-
Happy Git and GitHub for the useR by Jenny Bryan
-
Blischak, John D., Emily R. Davenport, and Greg Wilson. “A Quick Introduction to Version Control with Git and GitHub.” Edited by Francis Ouellette. PLOS Computational Biology 12, no. 1 (January 19, 2016) - An excellent explanation of Git and GitHub. Definitions (Box 1), tutorial
-
Bryan, Jennifer. “Excuse Me, Do You Have a Moment to Talk about Version Control?”
-
Git and GitHub for Beginners - 1-hour video course by Gwen Faraday
-
GitHub Actions for R - 20 min video lecture by Jim Hester, RStudio::conf 2020
-
act - Run your GitHub Actions locally
-
Curated Regular Expression Resources, with videos
-
Tutorial to sed by Bruce Barnett
-
ralger - an R package for web scraping, by M. El F. Ihaddaden. Video, 5min
-
rentrez - Access NCBI databases, including PubMed, from R
-
PubScore - Automatic calculation of literature relevance of genes
-
Adjutant - Pubmed articles analysis, word cloud, topic clustering
- Crisan, Anamaria, Tamara Munzner, and Jennifer L Gardy. “Adjutant: An R-Based Tool to Support Topic Discovery for Systematic and Literature Reviews.” Bioinformatics, August 23, 2018.
-
Where to get Twitter data for academic research, blog post by Justin Littman. Collecting, Analysing and Sharing Twitter Data, blog post by Serah Rono
-
word2vec.r - Julia's implementation of word2vec in R
-
pdftools - Text Extraction, Rendering and Converting of PDF Documents. Documentation
-
Tesseract - Open Source OCR Engine R package
-
sqlime - Online SQLite playground, for debugging and sharing SQL snippets. Web
-
sql-tutorial - SQL in 100 Queries. Website
- GenPipes - Python pipeline framework for multi-step workflows. Over 12 pipelines for RNA sequencing, chromatin immunoprecipitation sequencing, DNA sequencing, methylation sequencing, Hi-C, capture Hi-C, metagenomics, and Pacific Biosciences long-read assembly. Can be run via Docker. Creates executable scripts for PBS, SLURM, Batch, Daemon job schedulers. How to run:
<pipeline>.py -c myConfigurationFile -r myReadSetFile -s 1-X > Commands.txt && bash Commands.txt
where <pipeline> can be any of the 12 available pipelines and X is the step number desired. Commands.txt contains the commands that the system will execute. Input: FASTQ or BAM files.
Paper
Bourgey, Mathieu, Rola Dali, Robert Eveleigh, Kuang Chung Chen, Louis Letourneau, Joel Fillon, Marc Michaud, et al. “GenPipes: An Open-Source Framework for Distributed and Scalable Genomic Analyses.” GigaScience 8, no. 6 (June 1, 2019): giz037. https://doi.org/10.1093/gigascience/giz037.
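The GenPipes invocation above follows a two-phase pattern: first write the commands for a range of steps to a file, then execute that file. The sketch below illustrates only that pattern. It does not call GenPipes: `generate_commands` is a hypothetical stand-in for the pipeline script, and the step range `1-3` and the temporary directory are assumptions made for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in for "<pipeline>.py ... -s FIRST-LAST":
# prints one shell command per requested step instead of running anything.
generate_commands() {
  local first=${1%-*} last=${1#*-}   # split "1-3" into 1 and 3
  local step
  for ((step = first; step <= last; step++)); do
    echo "echo \"running step ${step}\""
  done
}

workdir=$(mktemp -d)
commands_file="$workdir/Commands.txt"

# 1. Generate the commands for steps 1-3 without executing them.
generate_commands 1-3 > "$commands_file"

# 2. Inspect what would be run before committing to it.
cat "$commands_file"

# 3. Execute the generated commands and capture their output.
output=$(bash "$commands_file")
echo "$output"
```

Keeping generation and execution separate means the command file can be reviewed, edited, or handed to a job scheduler before anything runs.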
-
Learn Makefiles With the tastiest examples, by Chase Lambert. GitHub
-
Why Use Make blog post by Mike Bostock
-
A minimal tutorial on make by Karl Broman
-
Learning about Makefiles by Dave Tang
-
Automation and Make by Software Carpentry
-
Makefiles in bioinformatics, one PDF lecture and four exercises
-
GNU Make - A Program for Directing Recompilation book by Stallman, Richard M., and Roland McGrath. 1991
-
Snakemake workflow catalog - a ranked list of Snakemake workflows, GitHub repositories. Tweet
-
Intro to workflows for efficient automated data analysis, using snakemake, by Titus Brown, with video
-
Understanding Snakemake blog post by Vince Buffalo. GitHub repo with examples.
-
Streamlining Data-Intensive Biology With Workflow Systems - community-written review, GitHub
-
learn-wdl - Educational materials for learning WDL, by Lynn Langit
-
dg-wdl-tutorial - Deep Genomics WDL tutorial, by Greg Wilson
-
WDL101 - 1h video workshop on using WDL on Terra. Workshop material
- nf-core - community-curated guidelines for pipeline building using the Nextflow framework. Software bundled with pipelines using Conda, Docker/Singularity, Bioconda, Conda-forge, BioContainers repositories. Software bundles (YAML environment built into Docker container), continuous integration, common structure, documentation, simplicity requirements for pipelines. Extension tools: Flowcraft - A Nextflow pipeline assembler for genomics. Pipeliner - A flexible Nextflow-based framework for the definition of sequencing data processing pipelines. Similar concept - Snakemake-workflows. nf-core tools on Bioconda and PyPI. Available pipelines.
Paper
Ewels, Philip A., Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso, and Sven Nahnsen. “The Nf-Core Framework for Community-Curated Bioinformatics Pipelines.” Nature Biotechnology, February 13, 2020. https://doi.org/10.1038/s41587-020-0439-x.
-
Introduction to Bioinformatics workflows with Nextflow and nf-core - The Carpentries workshop
-
nextflow-gotchas - A collection of unexpected challenges and learnings with nextflow and nf-core.
-
nf-training-public - Nextflow training material, by Seqera Labs. Website
-
magic-of-css - A CSS course to turn you into a magician.
-
Bootstrap CSS Framework - Full Course for Beginners - Learn Bootstrap 5 in this full course for beginners. Bootstrap is the most popular CSS framework. It allows web developers to quickly design and customize responsive mobile-first sites. 2h46m video
-
Bootstrap 4 Tutorials - playlist with short videos introducing the functionality of Bootstrap. Bootstrap Documentation
-
CSS Tutorial - Zero to Hero (Complete Course) - in-depth video course by freeCodeCamp.org. GitHub repository with the associated demo code
-
CSS Tutorial – Full Course for Beginners - FreeCodeCamp 11 hours video course for frontend web developers.
-
Build a website with blogdown in R presentation by Tatjana Kecojevic, GitHub
-
JavaScript for Data Science - detailed instructions by Maya Gans, Toby Hodges, and Greg Wilson
-
d3graphTheory - Interactive webapp meant to be used as graph theory tutorials. Topics include: "Vertices and Edges", "Order and Size of a Graph", "Degree of a Vertex", "Degree Sequence of a Graph", "Graphic Sequence", "Havel-Hakimi Algorithm", "Pigeonhole Principle", "Regular Graph", "Complete Graph", "Bipartite Graph", "Complete Bipartite Graph", "Walk", "Open vs Closed Walks", "Connectivity", "Eulerian Circuit", "Eulerian Trail". GitHub
-
Datavis 2020 - a free online course about how to conceptualize, design, and build interactive data visualizations with Web technologies. Videos, notes
-
Ubuntu on Windows for computational biology - How to install Ubuntu on Windows, by James Lloyd. Twitter
-
Research computing teaching materials developed by School of Medicine Research Computing at the University of Virginia - Cloud, Docker, command line, Git, Python, genomics
-
The Bioconda Team, Björn Grüning, Ryan Dale, Andreas Sjödin, Brad A. Chapman, Jillian Rowe, Christopher H. Tomkins-Tinch, Renan Valieris, and Johannes Köster. “Bioconda: Sustainable and Comprehensive Software Distribution for the Life Sciences.” Nature Methods 15, no. 7 (July 2018) - Conda, Bioconda overview
-
condacolab - Install Conda and friends on Google Colab
-
data_compression_course - A Crash Course on Data Compression.
-
Diátaxis - A systematic framework for technical documentation authoring.
-
netdata.cloud - Real-time performance monitoring, done right! Over 200 zero-configuration-supported systems, hardware, containers, and applications, from Docker/Kubernetes to databases/web servers, and more.
See R_notes and Python_notes repositories for those languages
-
CMake Cookbook - This repository collects sources for the recipes contained in the CMake Cookbook published by Packt and authored by Radovan Bast and Roberto Di Remigio
-
Java Basics – Crash Course - Learn the basics of Java programming in this crash course for beginners. 3h 36m video course
-
Java Programming for Beginners – Full Course - 4h 10m video course, from Hello World to Object Oriented Programming
-
juliatutorial - A tutorial for the Julia language inspired by the Python tutorial. Links to other resources.
-
Learn C Programming with Dr. Chuck (feat. classic book by Kernighan and Ritchie) - video course, 9h38m
-
learngo - 1000+ Hand-Crafted Go Examples, Exercises, and Quizzes
-
linguist - This library is used on GitHub.com to detect blob languages, ignore binary or vendored files, suppress generated files in diffs, and generate language breakdown graphs.
-
How_to_learn_modern_Rust - video and text resources for learning Rust
-
project-based-learning - Curated list of project-based tutorials in various programming languages, general purpose, web, many more.
-
The Rust Programming Language book, Cookin' with Rust cookbook, Why scientists are turning to Rust Nature technology feature
-
Tutorials on Topics in Julia Programming - Mastering Julia for Statistical Computing and more
-
C++ resources, part of the End-to-End Machine Learning library
-
Intermediate C/C++ programming - lecture slides by Ben Langmead
-
Rcpp for everyone - Rcpp for everyone, by Masaki E. Tsuda. GitHub
-
Swift Programming Tutorial – Full Course for Beginners - 7 hours video course