assembly_etc
Repo for organizing assembly pipelines, with scripts that support several sub-pipelines. It assumes you have PacBio HiFi reads and short HiC reads.
The entire contents of the scripts folder are intended to be accessible from a directory on your PATH.
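One way to make the scripts folder reachable is to append it to PATH in your shell startup file. This is a minimal sketch: the clone location `$HOME/assembly_etc` and the variable name `ASSEMBLY_ETC_DIR` are assumptions for illustration, not something the repo defines.

```shell
# Hypothetical clone location; change this to wherever the repo actually lives.
ASSEMBLY_ETC_DIR="${ASSEMBLY_ETC_DIR:-$HOME/assembly_etc}"

# Append the scripts folder to PATH only if it is not already there,
# so sourcing this file repeatedly does not grow PATH.
case ":$PATH:" in
  *":$ASSEMBLY_ETC_DIR/scripts:"*) ;;
  *) PATH="$PATH:$ASSEMBLY_ETC_DIR/scripts"; export PATH ;;
esac
```

Putting the whole folder on PATH, rather than copying individual scripts, keeps scripts that call each other working.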
Subsets of these scripts are grouped in subfolders of the categories folder, but downloading the contents of a single subfolder will not necessarily give you every script it needs. Each category subfolder should have a README.md that details its use and its outputs, though this is still aspirational.
The currently supported assembler is hifiasm, and the preferred HiC super-scaffolder is YAHS. You'll need to install both, along with bioawk_cas, the CAS version of bioawk, which has extensions used in several scripts. Two shortcut scripts invoke bioawk_cas: bawk for use with fastq or fasta files, and cawk for all other uses.
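Before running a pipeline, it can help to confirm that the tools named above are on PATH. The sketch below is not part of the repo: `missing_tools` is a hypothetical helper, and the executable names `hifiasm` and `yahs` are assumed to match how those tools are installed on your system.

```shell
# Print the name of every tool given as an argument that is not on PATH.
# (missing_tools is a hypothetical helper, not a script shipped in this repo.)
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Assumed executable names for the assembler, scaffolder, and bioawk wrappers.
missing=$(missing_tools hifiasm yahs bioawk_cas bawk cawk)

if [ -n "$missing" ]; then
  printf 'Not found on PATH:\n%s\n' "$missing" >&2
else
  echo "All required tools found."
fi
```

If bawk or cawk is reported missing, the scripts folder is probably not on PATH yet.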
To see the recommended assembly directory structure, look at the assembly_dir_structure category.