
GraphBin: Refined binning of metagenomic contigs using assembly graphs

GraphBin is an NGS data-based metagenomic contig bin refinement tool that uses the contig connectivity information in the assembly graph to bin contigs. It uses the binning result of an existing binning tool and a label propagation algorithm to correct mis-binned contigs and to predict the labels of contigs that were discarded because they are short.
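To make the idea of label propagation concrete, here is a minimal sketch in Python. It is not GraphBin's implementation: it only fills in unlabelled contigs by majority vote among labelled neighbours and does not attempt to correct mis-binned contigs. The function name, the toy graph and the reuse of max_iteration as an iteration cap are assumptions made for illustration.

```python
from collections import Counter


def propagate_labels(edges, initial_labels, max_iteration=100):
    """Give each unlabelled contig the most common label among its labelled neighbours."""
    # Build an undirected adjacency map from the edge list.
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    labels = dict(initial_labels)
    for _ in range(max_iteration):
        updates = {}
        for contig in sorted(neighbours):
            if contig in labels:
                continue  # labels that are already assigned are kept
            votes = Counter(labels[n] for n in neighbours[contig] if n in labels)
            if votes:
                # Ties are broken arbitrarily in this sketch.
                updates[contig] = votes.most_common(1)[0][0]
        if not updates:
            break  # no new labels in this pass, so stop early
        labels.update(updates)
    return labels


# Hypothetical toy graph: contig "c3" was too short for the initial binning tool.
edges = [("c1", "c2"), ("c2", "c3"), ("c4", "c5")]
initial = {"c1": "bin_1", "c2": "bin_1", "c4": "bin_2", "c5": "bin_2"}
refined = propagate_labels(edges, initial)
print(refined["c3"])  # bin_1
```

In this toy graph, c3 is connected only to c2, so it inherits the label bin_1. Contigs with no path to a labelled contig stay unlabelled.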

Note: Due to recent requests from the community, we have added support for long-read assemblies produced by Flye, Canu and Miniasm. GraphBin was originally developed for short-read assemblies and has not been tested extensively on long-read assemblies. Long-read assemblies may have sparsely connected graphs, which can make label propagation less effective, so GraphBin may not improve the binning result.
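One rough way to gauge how sparsely connected a graph is, before running GraphBin, is to count the segment (S) and link (L) records in a GFA file. The sketch below is a standalone helper, not part of GraphBin. It assumes a GFA 1 file and counts graph segments, which are not necessarily the same as contigs. The function name and example records are hypothetical.

```python
def gfa_connectivity(gfa_lines):
    """Return (segment count, link count, isolated segment count) for GFA 1 text lines."""
    segments = set()
    linked = set()
    links = 0
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S" and len(fields) >= 2:
            segments.add(fields[1])  # S records: name is the second field
        elif fields[0] == "L" and len(fields) >= 4:
            links += 1
            linked.update((fields[1], fields[3]))  # L records: from and to segments
    isolated = len(segments - linked)
    return len(segments), links, isolated


# Hypothetical records: three segments, one link, so segment "3" is isolated.
example = [
    "H\tVN:Z:1.0",
    "S\t1\tACGT",
    "S\t2\tGGCA",
    "S\t3\tTTAG",
    "L\t1\t+\t2\t-\t0M",
]
print(gfa_connectivity(example))  # (3, 1, 1)
```

For a real file, pass an open file handle instead of the list. A high share of isolated segments suggests that label propagation has few edges to work with.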

Downloading GraphBin

You can download the latest release of GraphBin from Releases or clone the GraphBin repository to your machine.

git clone https://github.com/Vini2/GraphBin.git

If you have downloaded a release, you will have to extract the files using the following command.

unzip [file_name].zip

Now go into the GraphBin folder using the following command.

cd GraphBin/


GraphBin requires Python 3 (tested on Python 3.6 and 3.7). The dependencies needed to run GraphBin and the related support scripts are installed by either of the installation methods described below.

Installing GraphBin

You can use Conda to set up an environment to run GraphBin, or you can use pip3 to install GraphBin.

Using Conda

You can use Conda to run GraphBin. Conda is included in both Anaconda and Miniconda, so you can download either one.

Once you have installed Conda, make sure you are in the GraphBin folder. Now run the following commands to create a Conda environment and activate it to run GraphBin.

conda env create -f environment.yml
conda activate graphbin

Add GraphBin to your PATH variable.

echo "export PATH=\"\$PATH:$(pwd)\"" >> ~/.bashrc   # adding path to bashrc file to be available on login
export PATH="$PATH:$(pwd)"                          # enabling the PATH for current terminal session

Now you are ready to run GraphBin.

If you want to switch back to your normal environment, run the following command.

conda deactivate

Using pip3

You can install GraphBin globally or per user, depending on your privileges on the system.

Installing as admin

pip3 install .

Installing for the active user

pip3 install . --user

Note for Ubuntu users

If you encounter the error "Failed building wheel for python-igraph" when installing GraphBin, you can install python-igraph as shown in this thread.

Now you are ready to run GraphBin.


Contigs can be assembled using any of the following three assemblers.


SPAdes is an assembler based on the de Bruijn graph approach. metaSPAdes is the dedicated metagenomic assembler of SPAdes. Use metaSPAdes (SPAdes in metagenomics mode) software to assemble reads into contigs.


SGA (String Graph Assembler) is an assembler based on the overlap-layout-consensus (more recently string graph) approach. Use SGA software to assemble reads into contigs.


MEGAHIT is an assembler based on the de Bruijn graph approach. Use MEGAHIT software to assemble reads into contigs.

Once you have obtained the assembly output, you can run GraphBin.

Using GraphBin

You can see the usage options of GraphBin by typing graphbin -h on the command line. For example,

usage: graphbin [-h] [--version] [--graph GRAPH] [--binned BINNED]
                [--output OUTPUT] [--prefix PREFIX]
                [--max_iteration MAX_ITERATION]
                [--diff_threshold DIFF_THRESHOLD] [--assembler ASSEMBLER]
                [--paths PATHS] [--contigs CONTIGS] [--delimiter DELIMITER]

GraphBin Help. GraphBin is a metagenomic contig binning tool that makes use of
the contig connectivity information from the assembly graph to bin contigs. It
utilizes the binning result of an existing binning tool and a label
propagation algorithm to correct mis-binned contigs and predict the labels of
contigs which are discarded due to short length.

optional arguments:
  -h, --help            show this help message and exit
  --graph GRAPH         path to the assembly graph file
  --binned BINNED       path to the .csv file with the initial binning output
                        from an existing tool
  --output OUTPUT       path to the output folder
  --prefix PREFIX       prefix for the output file
  --max_iteration MAX_ITERATION
                        maximum number of iterations for label propagation
                        algorithm. [default: 100]
  --diff_threshold DIFF_THRESHOLD
                        difference threshold for label propagation algorithm.
                        [default: 0.1]
  --assembler ASSEMBLER
                        name of the assembler used (SPAdes, SGA or MEGAHIT).
                        GraphBin supports Flye, Canu and Miniasm long-read
                        assemblies as well.
  --paths PATHS         path to the contigs.paths file, only needed for SPAdes
  --contigs CONTIGS     path to the contigs.fa file.
  --delimiter DELIMITER
                        delimiter for input/output results. Supports a comma
                        (,), a semicolon (;), a tab ($'\t'), a space (" ") and
                        a pipe (|) [default: , (comma)]

max_iteration and diff_threshold parameters are set by default to 100 and 0.1 respectively. However, the user can specify them when running GraphBin.

Input Format

For the SPAdes version, graphbin takes in the following files as inputs (required).

  • Assembly graph file (in .gfa format)
  • Contigs file (in FASTA format)
  • Paths of contigs (in .paths format)
  • Binning output from an existing tool (in .csv format)

For the SGA version, graphbin takes in the following files as inputs (required).

  • Assembly graph file (in .asqg format)
  • Contigs file (in FASTA format)
  • Binning output from an existing tool (in .csv format)

For the MEGAHIT version, graphbin takes in 3 files as inputs (required).

  • Assembly graph file (in .gfa format. To convert fastg to gfa refer here)
  • Contigs file (in FASTA format)
  • Binning output from an existing tool (in .csv format)

Note: Make sure that each contig in the initial binning result belongs to only one bin. GraphBin does not support initial binning results in which a contig is assigned to multiple bins.

Note: You can specify the delimiter for the initial binning result file and the final output file using the delimiter parameter. Enter the following values for the different delimiters: , for a comma, ; for a semicolon, $'\t' for a tab, " " for a space and | for a pipe.

Note: The binning output file should have comma separated values (contig_identifier, bin_identifier) for each contig. The contents of the binning output file should look similar to the example given below. Contigs are named according to their original identifier and bin identifier.

Example metaSPAdes binned input


Example SGA binned input


Example MEGAHIT binned input


GraphBin provides a support script to generate similar files once the initial binning output folder is provided. You can refer to support/README.md for more details.
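The format rules above (two columns per line, one bin per contig, a configurable delimiter) can be checked before running GraphBin. The following is a minimal sketch of such a check. It is not one of GraphBin's support scripts, and the function name and the contig and bin identifiers are hypothetical.

```python
import csv
import io


def read_binning(handle, delimiter=","):
    """Read (contig_identifier, bin_identifier) rows and reject contigs with multiple bins."""
    bins = {}
    for row in csv.reader(handle, delimiter=delimiter):
        if not row:
            continue  # skip blank lines
        if len(row) != 2:
            raise ValueError("expected 2 columns, got %d: %r" % (len(row), row))
        contig, bin_id = row[0].strip(), row[1].strip()
        if contig in bins and bins[contig] != bin_id:
            raise ValueError("contig %s is assigned to more than one bin" % contig)
        bins[contig] = bin_id
    return bins


# Hypothetical contig and bin identifiers, for illustration only.
sample = io.StringIO("contig_1,1\ncontig_2,1\ncontig_3,2\n")
binning = read_binning(sample)
print(binning)
```

For a real file, open it with open(path, newline="") and pass the same delimiter that you give to graphbin.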

Example Usage

graphbin --assembler spades --graph /path/to/graph_file.gfa --paths /path/to/paths_file.paths --binned /path/to/binning_result.csv --output /path/to/output_folder
graphbin --assembler sga --graph /path/to/graph_file.asqg --binned /path/to/binning_result.csv --output /path/to/output_folder
graphbin --assembler megahit --graph /path/to/graph_file.gfa --contigs /path/to/contigs.fa --binned /path/to/binning_result.csv --output /path/to/output_folder
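If you call GraphBin from a Python pipeline, the commands above can be assembled programmatically. The sketch below only builds the argument list from options shown in the help text and does not run anything. The helper name, the file names and the case-insensitive check on the assembler name are assumptions.

```python
import shlex


def graphbin_command(assembler, graph, binned, output, paths=None, contigs=None):
    """Build a graphbin argument list from the options shown in the help text."""
    cmd = ["graphbin", "--assembler", assembler, "--graph", graph,
           "--binned", binned, "--output", output]
    if assembler.lower() == "spades":
        # The help text says the paths file is only needed for SPAdes.
        if paths is None:
            raise ValueError("--paths is needed for SPAdes assemblies")
        cmd += ["--paths", paths]
    if contigs is not None:
        cmd += ["--contigs", contigs]
    return cmd


# Hypothetical file names, for illustration only.
cmd = graphbin_command("spades", "graph.gfa", "binning.csv", "out", paths="contigs.paths")
print(" ".join(shlex.quote(part) for part in cmd))
```

To execute the command, pass the list to subprocess.run(cmd, check=True), assuming graphbin is on your PATH.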

Support Scripts

GraphBin provides support scripts to format an initial binning result and to visualise binning results in the assembly graph. Details about the support scripts and how to execute them are provided in the support/README.md file.

Test Data

The simple datasets used to test GraphBin can be found in the test_data folder. The test data for each of the datasets include the following files.

  • Contigs file
  • Assembly graph file
  • Paths file for the assembly graph (for the datasets assembled using metaSPAdes)
  • Initial binning result from MaxBin 2.0
  • Initial binning result from MetaWatt
  • Initial binning result from MetaBAT 2
  • Initial binning result from SolidBin
  • Initial binning result from BusyBee Web (Not available for metaSPAdes assemblies)
  • Ground truth labelling of contigs from TAXAassign

You can try running GraphBin using these test data files.

Visualization of the Assembly Graph of ESC+metaSPAdes Test Dataset

[Figure: Initial Assembly Graph]

[Figure: TAXAassign Labelling]

[Figure: Original MaxBin Labelling with 2 Mis-binned Contigs]

[Figure: Refined Labels]

[Figure: Final Labelling of GraphBin]


If you use GraphBin in your work, please cite it as:

Vijini Mallawaarachchi, Anuradha Wickramarachchi, Yu Lin. GraphBin: Refined binning of metagenomic contigs using assembly graphs. Bioinformatics, Volume 36, Issue 11, June 2020, Pages 3307–3313, DOI: 10.1093/bioinformatics/btaa180

