
Hifiasm: a haplotype-resolved assembler for accurate HiFi reads

Primary language: C++; License: MIT

Getting Started

# Install hifiasm (requiring g++ and zlib)
git clone https://github.com/chhylp123/hifiasm
cd hifiasm && make

# Run on test data (use -f0 for small datasets)
wget https://github.com/chhylp123/hifiasm/releases/download/v0.7/chr11-2M.fa.gz
./hifiasm -o test -t4 -f0 chr11-2M.fa.gz 2> test.log
awk '/^S/{print ">"$2;print $3}' test.p_ctg.gfa > test.p_ctg.fa  # get primary contigs in FASTA

# Assemble inbred/homozygous genomes (-l0 disables duplication purging)
hifiasm -o CHM13.asm -t32 -l0 CHM13-HiFi.fa.gz 2> CHM13.asm.log
# Assemble heterozygous genomes with built-in duplication purging
hifiasm -o HG002.asm -t32 HG002-file1.fq.gz HG002-file2.fq.gz

# Trio binning assembly (requiring https://github.com/lh3/yak)
yak count -b37 -t16 -o pat.yak <(cat pat_1.fq.gz pat_2.fq.gz) <(cat pat_1.fq.gz pat_2.fq.gz)
yak count -b37 -t16 -o mat.yak <(cat mat_1.fq.gz mat_2.fq.gz) <(cat mat_1.fq.gz mat_2.fq.gz)
hifiasm -o HG002.asm -t32 -1 pat.yak -2 mat.yak HG002-HiFi.fa.gz


Hifiasm is a fast haplotype-resolved de novo assembler for PacBio HiFi reads. It can assemble a human genome in several hours and works with the California redwood genome, one of the most complex genomes sequenced so far. Hifiasm can produce primary/alternate assemblies of quality competitive with the best assemblers. It also introduces a new graph binning algorithm and achieves the best haplotype-resolved assembly given trio data.


A typical hifiasm command line looks like:

hifiasm -o NA12878.asm -t 32 NA12878.fq.gz

where NA12878.fq.gz provides the input reads, -t sets the number of CPU threads to use and -o specifies the prefix of output files. For this example, the primary contigs are written to NA12878.asm.p_ctg.gfa and the alternate contigs to NA12878.asm.a_ctg.gfa. On the first run, hifiasm saves corrected reads and overlaps to disk as NA12878.asm.*.bin. On later runs it reuses the saved results to avoid the time-consuming all-vs-all overlap calculation. You may specify -i to ignore precomputed overlaps and redo overlapping from raw reads.
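Before launching a rerun it can be useful to know whether saved results are already on disk. The sketch below checks for the three .bin files that hifiasm writes for a prefix (prefix.ec.bin, prefix.ovlp.source.bin and prefix.ovlp.reverse.bin). The function name has_cached_overlaps is hypothetical, and treating all three non-empty files as a usable cache is an assumption, not a documented rule.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if all three saved .bin files exist
# and are non-empty for the given output prefix.
# Assumption: hifiasm needs all three files to skip the overlap step.
has_cached_overlaps() {
    prefix=$1
    for suffix in ec.bin ovlp.source.bin ovlp.reverse.bin; do
        [ -s "$prefix.$suffix" ] || return 1
    done
    return 0
}

if has_cached_overlaps NA12878.asm; then
    echo "saved results found; they should be reused unless -i is given"
else
    echo "no complete set of saved results; overlaps will be recomputed"
fi
```

The check only looks at file presence. It cannot tell whether the saved files were produced from the same reads.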

Hifiasm purges haplotig duplications by default. For inbred or homozygous genomes, you may disable purging with option -l0. Old HiFi reads may contain short adapter sequences at the ends of reads; you can specify -z20 to trim 20bp from both ends of each read. For small genomes, use -f0 to disable the initial Bloom filter, which takes 16GB of memory at the beginning. For genomes much larger than human, -f38 or even -f39 is preferred to save memory on k-mer counting.
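The 16GB figure can be reproduced with a little arithmetic, assuming that the value given to -f is the base-2 logarithm of the filter size in bits and that the default is 37. Both points are assumptions here, chosen because 2^37 bits is exactly 16 GiB. The helper name bloom_gib is hypothetical.

```shell
#!/bin/sh
# Sketch: memory taken by a Bloom filter of 2^f bits, in whole GiB.
# Assumption: -f INT means "2^INT bits"; -f0 disables the filter.
bloom_gib() {
    f=$1
    if [ "$f" -eq 0 ]; then
        echo 0
        return
    fi
    # 2^f bits / 8 bits per byte / 2^30 bytes per GiB
    # (integer division, so values of f below 33 print 0)
    echo $(( (1 << f) / 8 / (1 << 30) ))
}

for f in 0 37 38 39; do
    echo "-f$f: $(bloom_gib "$f") GiB"
done
```

Under this assumption each increment of -f doubles the filter itself, so the saving mentioned above applies to the later k-mer counting stage, not to the filter.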

When parental short reads are available, hifiasm can generate a pair of haplotype-resolved assemblies with trio binning. To perform such an assembly, first count k-mers with yak and then run the assembly:

yak count -k31 -b37 -t16 -o pat.yak paternal.fq.gz
yak count -k31 -b37 -t16 -o mat.yak maternal.fq.gz
hifiasm -o NA12878.asm -t 32 -1 pat.yak -2 mat.yak NA12878.fq.gz

Here NA12878.asm.hap1.p_ctg.gfa and NA12878.asm.hap2.p_ctg.gfa give the two haplotype assemblies. In the binning mode, hifiasm does not purge haplotig duplications by default. Because hifiasm reuses saved overlaps, you can generate both primary/alternate assemblies and trio binning assemblies with

hifiasm -o NA12878.asm -t 32 NA12878.fq.gz 2> NA12878.asm.pri.log
hifiasm -o NA12878.asm -t 32 -1 pat.yak -2 mat.yak /dev/null 2> NA12878.asm.trio.log

The second command line will run much faster than the first. You can also dump error corrected reads in FASTA and/or overlaps in PAF with

hifiasm -o NA12878.asm -t 32 --write-paf --write-ec /dev/null

Output files

For non-trio assembly, hifiasm generates the following files:

  1. Haplotype-resolved raw unitig graph in GFA format (prefix.r_utg.gfa). This graph keeps all haplotype information, including somatic mutations and recurrent sequencing errors.
  2. Haplotype-resolved processed unitig graph without small bubbles (prefix.p_utg.gfa). Small bubbles might be caused by somatic mutations or noise in the data and do not represent real haplotype information.
  3. Primary assembly contig graph (prefix.p_ctg.gfa). This graph collapses different haplotypes.
  4. Alternate assembly contig graph (prefix.a_ctg.gfa). This graph consists of all contigs that are discarded from the primary contig graph.

For trio assembly, hifiasm generates the following files:

  1. Haplotype-resolved raw unitig graph in GFA format (prefix.r_utg.gfa). This graph keeps all haplotype information.
  2. Phased paternal/haplotype1 contig graph (prefix.hap1.p_ctg.gfa). This graph keeps the phased paternal/haplotype1 assembly.
  3. Phased maternal/haplotype2 contig graph (prefix.hap2.p_ctg.gfa). This graph keeps the phased maternal/haplotype2 assembly.

Hifiasm writes error corrected reads to the prefix.ec.bin binary file and writes overlaps to prefix.ovlp.source.bin and prefix.ovlp.reverse.bin.
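The contig graphs listed above are GFA files, while many downstream tools expect FASTA. The sketch below wraps the awk one-liner from Getting Started in a small function and applies it to whichever contig graphs exist. The function name gfa_to_fasta and the hand-made demo GFA are illustrative only, and the conversion assumes each S line carries its sequence in the third column.

```shell
#!/bin/sh
# Convert GFA segment (S) lines to FASTA records.
# Reads the files given as arguments, or stdin if none are given.
gfa_to_fasta() {
    awk '/^S/{print ">"$2; print $3}' "$@"
}

# Tiny hand-made GFA, used only to demonstrate the conversion.
demo=$(mktemp)
printf 'S\tctg1\tACGT\nS\tctg2\tGGCC\nL\tctg1\t+\tctg2\t+\t0M\n' > "$demo"
gfa_to_fasta "$demo"
rm -f "$demo"

# With real output, convert each contig graph that is present.
for gfa in prefix.p_ctg.gfa prefix.a_ctg.gfa prefix.hap1.p_ctg.gfa prefix.hap2.p_ctg.gfa; do
    if [ -f "$gfa" ]; then
        gfa_to_fasta "$gfa" > "${gfa%.gfa}.fa"
    fi
done
```

Link (L) lines are dropped, so the FASTA keeps the sequences but not the graph topology.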


The following table shows the statistics of several hifiasm primary assemblies:

Dataset           Size    Cov.  Asm options  CPU time  Wall time  RAM   N50
Mouse (C57/BL6J)  2.6Gb   ×25   -t48 -l0     172.9h    4.8h       76G   21.1Mb
Maize (B73)       2.2Gb   ×22   -t48 -l0     203.2h    5.1h       68G   36.7Mb
Strawberry        0.8Gb   ×36   -t48 -D10    152.7h    3.7h       91G   17.8Mb
Frog              9.5Gb   ×29   -t48         2834.3h   69.0h      463G  9.3Mb
Redwood           35.6Gb  ×28   -t80         3890.3h   65.5h      699G  5.4Mb
Human (CHM13)     3.1Gb   ×32   -t48 -l0     310.7h    8.2h       114G  88.9Mb
Human (HG00733)   3.1Gb   ×33   -t48         269.1h    6.9h       135G  69.9Mb
Human (HG002)     3.1Gb   ×36   -t48         305.4h    7.7h       137G  98.7Mb
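One way to read the CPU time and wall time columns together is to divide one by the other, which gives the average number of threads kept busy during a run. The sketch below does this for two rows of the table. The helper name avg_threads is hypothetical.

```shell
#!/bin/sh
# Average number of busy threads = CPU time / wall time (both in hours).
avg_threads() {
    awk -v cpu="$1" -v wall="$2" 'BEGIN { printf "%.1f\n", cpu / wall }'
}

avg_threads 172.9 4.8     # Mouse, run with -t48
avg_threads 3890.3 65.5   # Redwood, run with -t80
```

For the mouse run this works out to roughly 36 of the 48 requested threads, and for redwood to roughly 59 of 80.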

Hifiasm can assemble a 3.1Gb human genome in several hours or a ~30Gb hexaploid redwood genome in a few days on a single machine. For trio binning assembly:

Dataset                       Cov.  CPU time  Elapsed time  RAM   N50
HG00733, [father], [mother]   ×33   269.1h    6.9h          135G  35.1Mb (paternal), 34.9Mb (maternal)
HG002, [father], [mother]     ×36   305.4h    7.7h          137G  41.0Mb (paternal), 40.8Mb (maternal)
NA12878, [father], [mother]   ×30   180.8h    4.9h          123G  27.7Mb (paternal), 27.0Mb (maternal)
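N50 is the contig length L such that contigs of length L or longer cover at least half of the total assembly. The sketch below computes it from the S lines of a contig GFA with standard tools. It is not necessarily how the figures in the tables were obtained, the function name gfa_n50 is hypothetical, and it assumes each S line carries its full sequence in the third column.

```shell
#!/bin/sh
# N50 of the segments in a GFA file (or stdin if no file is given).
gfa_n50() {
    # One length per segment, sorted from longest to shortest.
    awk '/^S/{print length($3)}' "$@" | sort -rn |
    awk '{ len[NR] = $1; total += $1 }
         END {
             for (i = 1; i <= NR; i++) {
                 run += len[i]
                 # Stop at the first contig where the running sum
                 # reaches half of the total length.
                 if (2 * run >= total) { print len[i]; exit }
             }
         }'
}

# Demo with contigs of length 10, 6, 4 and 2 (total 22).
printf 'S\ta\tAAAAAAAAAA\nS\tb\tCCCCCC\nS\tc\tGGGG\nS\td\tTT\n' | gfa_n50
```

In the demo the two longest contigs together reach 16 of 22 bases, so the reported N50 is 6. On real output the function would be called as gfa_n50 prefix.p_ctg.gfa.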

Except NA12878, the assemblies above were produced by hifiasm v0.7 and can be downloaded at


NA12878 was assembled with a more recent version of hifiasm and is available at


Getting Help

For a detailed description of the options, please see man ./hifiasm.1. The -h option of hifiasm also provides a brief description of the options. If you have further questions, please raise an issue at the issue page.


Limitations

  1. Purging haplotig duplications may introduce misassemblies.