Python toy repository for demonstrating:
- Git
- GitHub
- Social coding
- Packaging code
- Managing virtual environments
- Type annotations
- Docstrings & API docs
- Code linting
- Unit testing & test coverage
- Continuous integration
It can be used as a template for starting new repositories that follow good practices for writing, testing, documenting and building code.
After installing the package, simply import our_package in your Python code or interpreter:
import our_package
# run a function provided by the package
our_package.math.add_integers(3, 4)
Alternatively, import only the module that you are interested in:
from our_package import math as our_math
# run a function provided by the package
our_math.add_integers(3, 4)
Note that we have given the module the alias our_math during import, rather than just referring to it as math in our subsequent code. This is because we want to avoid namespace collisions with Python's built-in math library, which, if imported with import math, would also be imported as math.
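To make the collision concrete, here is a minimal sketch that runs without our_package installed. It builds a hypothetical stand-in module with an add_integers function (an assumption modeled on the calls above, not the package's actual code) and shows what happens when the name math is rebound.

```python
import math  # Python's built-in math library
import types


def _add_integers(x: int, y: int) -> int:
    """Hypothetical stand-in for our_package.math.add_integers."""
    return x + y


# Hypothetical stand-in for the our_package.math module, created on the fly
our_math = types.ModuleType("our_package.math")
our_math.add_integers = _add_integers

# With the alias, both modules are usable side by side
print(math.sqrt(16))                # 4.0
print(our_math.add_integers(3, 4))  # 7

# Binding the package module to the name "math" instead shadows the
# built-in library, so math.sqrt is no longer reachable through that name
math = our_math
print(hasattr(math, "sqrt"))        # False
```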
Finally, you can also just import the function you want to use:
from our_package.math import add_integers
# run imported function
add_integers(3, 4)
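All three import styles give you access to the very same function object; they only differ in which names end up in your namespace. Because our_package may not be installed where you try this, the sketch below uses the standard library's os.path module as a stand-in for our_package.math, and os.path.join as a stand-in for add_integers.

```python
import os
from os import path as our_path  # module alias, analogous to our_math
from os.path import join         # import just the function

# Three different routes to the same function object
print(os.path.join is our_path.join is join)  # True

# Calling it through any of the names gives the same result
print(join("data", "reads.fastq") == os.path.join("data", "reads.fastq"))  # True
```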
Check the API docs to see all of what's in this package!
Note that the steps listed here generally need to be done only once, with two exceptions:
- When starting a new shell or after calling conda deactivate, the Conda environment containing our_package (as set up according to the instructions below) will always need to be activated with:
  conda activate programming-for-life-sciences
- If additional dependencies are added to the project along the way, they should be added to the environment so that their installation will be persistent. See the dedicated section for details on how that can be done.
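If you are unsure whether the environment is currently active, you can check from within Python. This is a minimal sketch that assumes conda activate sets the CONDA_DEFAULT_ENV environment variable to the name of the active environment (as current Conda versions do); the helper names are hypothetical.

```python
import os
from typing import Mapping, Optional

# Environment name used throughout this README
ENV_NAME = "programming-for-life-sciences"


def active_conda_env(environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the active Conda environment as reported by CONDA_DEFAULT_ENV, or None."""
    return environ.get("CONDA_DEFAULT_ENV")


def is_env_active(expected: str = ENV_NAME, environ: Mapping[str, str] = os.environ) -> bool:
    """Check whether the expected environment is the active one (hypothetical helper)."""
    return active_conda_env(environ) == expected


if is_env_active():
    print(f"{ENV_NAME} is active")
else:
    print(f"Active environment is {active_conda_env()!r}; run: conda activate {ENV_NAME}")
```

The environ parameter defaults to the real process environment but can be swapped for a plain dictionary, which keeps the helpers easy to test.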
While detailed, step-by-step instructions are outlined below, these summarized installation instructions may be helpful to get you started quickly!
If you haven't already installed Conda on your system, we recommend installing Miniconda. Alternatively, you can install Anaconda, which comes packed with a boatload of useful tools for bioinformatics.
git clone git@github.com:zavolanlab/programming-for-life-sciences.git
cd programming-for-life-sciences
conda deactivate
conda update -y conda
conda env create -f environment.yml
conda env update --name programming-for-life-sciences --file environment-dev.yml
conda activate programming-for-life-sciences
Before you can install the package, you first need to obtain the repository contents. Navigate to a directory of your choice and clone the repository with:
git clone git@github.com:zavolanlab/programming-for-life-sciences.git
This will create a directory programming-for-life-sciences in your current working directory. If you would prefer a different name (it's quite a mouthful!), you can call that last command with an additional parameter indicating the desired name, like so:
git clone git@github.com:zavolanlab/programming-for-life-sciences.git toy-repo
Enter the new directory cloned from GitHub:
cd programming-for-life-sciences
If you gave the directory a different name in the previous step, you will need to adjust this call accordingly.
Now that we have all the code and related files residing on our local machine and are located in the root directory of the repository, we are ready to install our package our_package so that it can be imported and used in your Python code.
However, in order to contain all software dependencies of a project in an isolated environment, it is highly recommended to set up a virtual environment first. There are several options for doing so, such as the Python-specific virtualenv package. Here, we will use Conda-based virtual environments, as they are somewhat more convenient to manage across projects and, perhaps more importantly, allow adding non-Python dependencies.
Verify your Conda installation by running:
conda info
If it turns out that you do not have Conda installed, check the note in the quick installation section that provides links to Conda installation instructions.
To ensure you start from a clean slate, deactivate any existing environment:
conda deactivate
You may also want to ensure that you are using the latest Conda version:
conda update -y conda
Now it's time to set up your environment and install the package. You can do so either manually or by making use of Conda environment files.
To manually set up a Conda environment and install the package, start with the following command, which instructs Conda to install a barebones environment called programming-for-life-sciences based on a recent Python version:
conda create --name programming-for-life-sciences python=3.8.5
Now we still need to activate the environment:
conda activate programming-for-life-sciences
Finally, we are ready to install the package. You can do this in either of the following ways:
# Install the package in an editable manner; better for development
# Will create files/directories in your current working directory
pip install -e .
# Regular installation; better when simply using a package
# Will create files/directories in your standard Python library path
pip install .
Pip is Python's default package manager: similar to an app store, it knows about software/package repositories and allows you to conveniently install packages from them, taking care of resolving and, if possible, installing dependencies. It is very similar to Conda in that sense, but while Conda has the advantage of supporting software written in any language, not all Python packages are available via Conda, and, importantly, Conda cannot directly be used to install local packages such as our_package. However, within an active Conda environment, Conda will make sure that any packages installed via pip will be private to this environment, ensuring, as in this case, that our_package will not be installed globally.
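To double-check where the package ended up, you can ask Python's import system. A minimal sketch using importlib.util.find_spec (the helper name is hypothetical): with the environment active, sys.prefix should point into the Conda environment. After a regular install, the package location should lie below that prefix, whereas after an editable install (pip install -e .) it typically points into your clone of the repository.

```python
import importlib.util
import sys
from typing import Optional


def package_location(name: str) -> Optional[str]:
    """Return the file a top-level package would be imported from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# Root directory of the Python installation currently in use
print("Python environment:", sys.prefix)
# Prints None if our_package is not installed in this environment
print("our_package found at:", package_location("our_package"))
```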
A Conda environment can also be created from a configuration file. This allows setting up the environment and installing the package (and/or any dependencies) conveniently in one go:
conda env create -f environment.yml
We will still need to activate the environment:
conda activate programming-for-life-sciences
Currently, there are no additional dependencies required for using our_package (but note that there are additional requirements for testing/development). However, as time goes by, additional dependencies are typically added to a project. Here we describe some ways to update your Conda environment so that such added dependencies persist.
You can use Conda to add any available Conda package to your environment. If your environment is already activated, you can simply do:
conda install YOUR_PACKAGE
# Example
conda install requests=2.23.0
If your environment is not activated, you can either activate it first and then call the above command, or you can specify the name of the environment like so:
conda install --name programming-for-life-sciences YOUR_PACKAGE
# Example
conda install --name programming-for-life-sciences requests=2.23.0
If your package is only available via a non-default channel, you can add a channel to your call:
conda install --channel CHANNEL YOUR_PACKAGE
# Example
conda install --channel bioconda samtools=1.11
While the above process will ensure that the package is available in your environment, others won't know that the package is required. Therefore, you should also add any new dependencies to a version-controlled Conda environment file, typically environment.yml. If you do so first, this gives you another way to update your environment:
conda env update --name programming-for-life-sciences --file environment.yml
This will update your environment with any new dependencies, while already available ones are ignored.
Note that if a package listed in the environment file is already installed in your environment, but the versions do not match, this call will replace the available version with the one listed in the environment file.
Using Conda to add software dependencies to your environment is generally the preferred way, even when installing Python packages. However, in cases where no Conda package is available, or where it is more convenient or fitting to use Pip instead, you can add a package to your environment in the following way:
First, ensure that your Conda environment is activated:
conda activate programming-for-life-sciences
Then simply install your package via pip:
pip install YOUR_PACKAGE
# Example
pip install requests==2.23.0
Similar to Conda's environment files, Pip is also able to install dependencies from a file, which, by convention, is typically called requirements.txt. If you are maintaining a pure Python project and do not use Conda (or only use it to manage your Python/Pip virtual environment), you can dispense with the complexities of maintaining a Conda environment file, add your dependencies to requirements.txt instead, and then install/update your active environment with:
pip install -r requirements.txt
The same behavior regarding version conflicts applies as for Conda environment files: package versions listed in requirements.txt will replace installed packages of the same name if the versions do not match.
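The update rule described above (install what is missing, replace mismatched versions, keep the rest) can be sketched in a few lines. This is a simplified model, not how pip or Conda actually resolve dependencies: it assumes every line is either an exact pin (name==version) or a bare name, and it takes the installed versions as a plain dictionary (in a real environment you could look up a version with importlib.metadata.version). The package versions are made up for illustration.

```python
from typing import Dict, List


def parse_requirements(text: str) -> Dict[str, str]:
    """Parse 'name==version' or bare 'name' lines; comments and blank lines are skipped.

    Simplification: version ranges, extras and environment markers are not handled.
    """
    pins: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        name, _, version = line.partition("==")
        pins[name.strip().lower()] = version.strip()
    return pins


def update_plan(required: Dict[str, str], installed: Dict[str, str]) -> Dict[str, List[str]]:
    """Classify each required package as 'install', 'replace' or 'keep'."""
    plan: Dict[str, List[str]] = {"install": [], "replace": [], "keep": []}
    for name, version in sorted(required.items()):
        if name not in installed:
            plan["install"].append(name)
        elif version and installed[name] != version:
            # Pinned version differs from the installed one
            plan["replace"].append(name)
        else:
            # Versions match, or no version was pinned
            plan["keep"].append(name)
    return plan


requirements = """
# hypothetical example file
requests==2.23.0
numpy==1.19.2
pytest==6.1.1
"""
installed = {"requests": "2.23.0", "numpy": "1.18.5"}
print(update_plan(parse_requirements(requirements), installed))
# {'install': ['pytest'], 'replace': ['numpy'], 'keep': ['requests']}
```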
If you want not only to use the package but also to run tests and/or contribute to its development, several additional Python packages are required for code linting and testing.
It is good practice to keep these dependencies in version-controlled environment/requirements files, but separate from essential dependencies. Therefore, this package provides the files environment-dev.yml (Conda) and requirements-dev.txt (Pip) to store package information on any non-essential dependencies.
Note that projects wouldn't normally include both a Conda environment file and a Pip requirements file, as this is confusing and requires both files to be kept in sync. We have simply chosen to do so to make you familiar with both styles, as they are both very common. In this regard, it is also worth pointing out that while Conda is particularly great for many bioinformatics projects where dependencies are often written in different languages, building projects with Conda is typically not as well supported by automated build systems, such as those used for Continuous Integration or for building documentation. For example, the configurations provided in this repository for both Travis CI and the Sphinx documentation engine use Pip rather than Conda for simplicity's sake.
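Keeping two dependency files in sync is easy to get wrong by hand. The sketch below compares the pinned dependencies of a Conda environment file with those of a Pip requirements file. It assumes, for simplicity, that the entries have already been extracted as plain lines (name=version for Conda, name==version for Pip) and that a package has the same name in both ecosystems, which is not always the case. The helper names and pinned versions are hypothetical.

```python
from typing import Dict, List


def parse_pins(lines: List[str], separator: str) -> Dict[str, str]:
    """Map package names to versions, e.g. 'pytest=6.1.1' with separator '='."""
    pins: Dict[str, str] = {}
    for line in lines:
        name, _, version = line.strip().partition(separator)
        pins[name.lower()] = version
    return pins


def out_of_sync(conda_pins: Dict[str, str], pip_pins: Dict[str, str]) -> List[str]:
    """List packages that are missing from one file or pinned to different versions."""
    problems: List[str] = []
    for name in sorted(conda_pins.keys() | pip_pins.keys()):
        if name not in conda_pins:
            problems.append(f"{name}: only in requirements file")
        elif name not in pip_pins:
            problems.append(f"{name}: only in environment file")
        elif conda_pins[name] != pip_pins[name]:
            problems.append(f"{name}: {conda_pins[name]} (Conda) vs {pip_pins[name]} (Pip)")
    return problems


# Hypothetical entries from environment-dev.yml and requirements-dev.txt
conda_dev = ["pytest=6.1.1", "flake8=3.8.4", "coverage=5.3"]
pip_dev = ["pytest==6.1.1", "flake8==3.8.3"]
for problem in out_of_sync(parse_pins(conda_dev, "="), parse_pins(pip_dev, "==")):
    print(problem)
```

A check like this could run in Continuous Integration so that a drift between the two files is caught early.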
To install development/testing dependencies with Conda, run:
conda env update --name programming-for-life-sciences --file environment-dev.yml
To install development/testing dependencies with Pip, first ensure that the Conda environment is activated:
conda activate programming-for-life-sciences
Then run:
pip install -r requirements-dev.txt
This project is a community effort and lives off your contributions, be it in the form of bug reports, feature requests, discussions, fixes, or other code changes. Please refer to our organization's contributing guidelines if you are interested in contributing. Please respect the Code of Conduct for all interactions with the community.
The project follows the semantic versioning scheme. It is currently in beta, so the API may change without further notice.
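To illustrate what this means in practice: under semantic versioning, version numbers have the form MAJOR.MINOR.PATCH, releases that share a major version of 1 or higher should stay backward compatible, and major version 0 signals initial development, where anything may change at any time. A minimal sketch of that rule; the helper names are hypothetical, and pre-release or build suffixes are not handled.

```python
from typing import Tuple

Version = Tuple[int, int, int]


def parse_version(text: str) -> Version:
    """Parse a plain 'MAJOR.MINOR.PATCH' string (no pre-release/build tags)."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def is_safe_upgrade(current: str, candidate: str) -> bool:
    """Return True if semver promises that the candidate keeps the current public API.

    Major version 0 is initial development: anything may change, so only
    an identical version counts as safe.
    """
    old, new = parse_version(current), parse_version(candidate)
    if old[0] == 0 or new[0] == 0:
        return old == new
    return new[0] == old[0] and new >= old


print(is_safe_upgrade("1.2.3", "1.4.0"))  # True
print(is_safe_upgrade("1.2.3", "2.0.0"))  # False
print(is_safe_upgrade("0.1.0", "0.1.1"))  # False
```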
This project is covered by the Apache License 2.0, a copy of which is shipped with this repository.
Feel free to reach out to us with any questions, suggestions or complaints you may have.