Graphical Fragment Assembly (GFA) Format Specification

GFA: Graphical Fragment Assembly (GFA) Format Specification

We are developing the specification of the Graphical Fragment Assembly (GFA) format. Your contribution is welcome. Please open up issues or submit pull requests.

Implementations

GFA 2

GFA 1

GFA 1.1

GFA 1.2

Resources

  • Examples of sequence overlap graphs (assembly graphs) in a variety of formats

GFA 2.0: Graphical Fragment Assembly (GFA2) Format Specification 2.0

Jason Chin, Richard Durbin, and I (Gene Myers) found ourselves together at a workshop in Dagstuhl, Germany, and hammered out an initial proposal for an assembly format. We started with GFA 1 and proceeded to build a more comprehensive design around it. After extensive revision and discussion on GitHub with the GFA group, including Shaun Jackman, Heng Li, and Giorgio Gonnella, we arrived at GFA 2.0. The standard is an evolving effort, and your contribution is welcome. Please open issues or submit pull requests.

The basic reason for having a standard format is that, in general, different development teams build assemblers, visualizers, and editors, because the three tasks are complex and distinct from one another. While these tools should certainly use tailored encodings internally for efficiency, the nexus between the three efforts benefits from a standard encoding format that makes them all interoperable.

Fig. 1

GFA 1.0

GFA 1 was first suggested in a blog post by Heng Li (@lh3) and further developed in a second post.

GFA 1.1

W-lines were suggested by Heng Li (@lh3) as an extension to GFA 1 for representing haplotype information in pangenome graphs.

GFA 1.2

J-lines were suggested by Sergey Nurk and Giulio Formenti as an extension to GFA 1.1 to represent jumps in graphs such as gaps in assembly scaffolds.