marbl/CHM13

Bionano Saphyr BNX and CMAP

erikhuck opened this issue · 4 comments

I have a few questions about these files:

  • Does the BNX file contain all the data necessary to create a truly complete Bionano optical genome mapping version of the CHM13 reference genome? That is, will it contain telomere-to-telomere structural variant (SV) information (all the SVs in the genome, from all the chromosomes)?
  • Does the BNX file contain data from the Y chromosome as well?
  • How was the CMAP created? Was it created from the BNX file using Bionano's software (Bionano Solve or Bionano Access)? Can it be used as a reference for aligning other CMAP files from samples processed on Bionano's Saphyr instrument?

Also, I'd be happy to make a pull request that adds the answers to these questions to the README for future researchers.

  • The BNX is the output of a standard Bionano run. I'm not sure what you mean by truly complete. Some regions of the genome have no restriction sites and so would be missing. It is complete in the sense that it was generated from the whole genome.
  • The Y chromosome comes from HG002, so the CHM13 optical map would not contain a Y.
  • It was run through Bionano Solve, but it wasn't anchored to the T2T assembly. You can use it as a reference but, as I said above, it won't be T2T and will have breaks in the assembly. It would be similar to any other human genome mapped by Bionano. I guess it would be possible to anchor the Bionano contigs onto the CHM13 T2T sequence, but why not just use the T2T sequence itself as a reference for other CMAP files?

Does that answer your questions?

Thank you @skoren for your quick reply. I first thought that we had to use the BNX file to create a reference compatible with Bionano data, but it turns out Bionano has a tool to convert NGS sequence data into a compatible reference for optical map samples. So we would have to use the "T2T sequence itself", as you described it, and convert it to a format compatible with Bionano data. Is the T2T sequence (telomere to telomere for all chromosomes, including Y) available in this repository?
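
For anyone wondering what that conversion does conceptually, here is a minimal sketch of in silico labeling: it scans a sequence for an enzyme recognition motif on both strands and reports the label positions. This is not Bionano's tool and does not write a CMAP file. The default motif `CTTAAG` (commonly cited as the DLE-1 recognition sequence) is an assumption, and the function names are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def label_positions(seq: str, motif: str = "CTTAAG") -> list[int]:
    """Return sorted 1-based start positions of the motif on either strand.

    The default motif is an assumption (commonly cited for DLE-1).
    Real tools may report a different offset within the motif.
    """
    seq = seq.upper()
    motif = motif.upper()
    # A palindromic motif equals its reverse complement, so the set dedupes it.
    motifs = {motif, reverse_complement(motif)}
    positions = set()
    for m in motifs:
        start = seq.find(m)
        while start != -1:
            positions.add(start + 1)  # convert 0-based index to 1-based
            start = seq.find(m, start + 1)
    return sorted(positions)


if __name__ == "__main__":
    # Toy sequence with two motif occurrences, for illustration only.
    toy = "AAA" + "CTTAAG" + "GGGG" + "CTTAAG" + "TT"
    print(label_positions(toy))
```

Stretches of sequence with no motif occurrence produce no labels, which is why such regions are absent from an optical map.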

Yes, it's available at NCBI. I just added a link to their record in the README.