BIOCAD-BWA

The second part of https://github.com/npanuhin/BIOCAD

Primary language: Python. License: MIT.

BWA and SAM analysis

This repository is no longer maintained. Please refer to npanuhin/BIOCAD for a continuation of this project.

This repository is not intended to showcase finished work; it exists to store and transfer data.

Analysis status

✅ - Works as intended
⚠ - There are problems, but a solution is possible
❌ - There are problems that make the solution wrong

  • ✅ large01
  • ✅ large02
  • ✅ large03
  • ✅ large04
  • ✅ large05
  • ✅ large06
  • ✅ large07
  • ⚠ large08
  • ✅ large09
  • ⚠ large10
  • ✅ large11
  • ❌ large12
  • ✅ small (BWA⚠)

Additional scripts

This repository also includes C++ implementations of several algorithms, such as the Burrows–Wheeler transform, the Knuth–Morris–Pratt algorithm, and k-mer compression.
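For readers unfamiliar with the Burrows–Wheeler transform that BWA is built on, here is a naive Python sketch. It is not the repository's C++ code: the function names `bwt` and `inverse_bwt` and the `$` end marker are assumptions made for illustration, and sorting all rotations is far too slow for real genomes.

```python
def bwt(text, sentinel="$"):
    """Naive Burrows-Wheeler transform: last column of the sorted rotations."""
    if sentinel in text:
        raise ValueError("text must not contain the sentinel character")
    s = text + sentinel  # the sentinel marks the end of the original string
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(transformed, sentinel="$"):
    """Naive inverse: rebuild the sorted rotation table one column at a time."""
    table = [""] * len(transformed)
    for _ in range(len(transformed)):
        # Prepend the transformed column to every row, then re-sort the rows.
        table = sorted(c + row for c, row in zip(transformed, table))
    # The original string is the row that ends with the sentinel.
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]
```

For example, `bwt("banana")` gives `"annb$aa"`, and `inverse_bwt("annb$aa")` restores `"banana"`. The transform groups equal characters together, which is what makes the indexed text compressible and searchable.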

How BWA works now

  1. BWA indexes the two FASTA sequences
  2. BWA aligns the two sequences
  3. samtools converts the SAM file to a BAM file (currently disabled)
  4. samtools sorts the BAM file (currently disabled)
  5. sam2pairwise converts the SAM file to pairwise format (a txt file) (currently disabled)
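The steps above can be sketched as a small Python driver. This is an illustration under stated assumptions, not the repository's actual script: the helper names `build_pipeline` and `run_pipeline`, the output file names, and the choice of the `bwa mem` algorithm are all hypothetical, and sam2pairwise is assumed to read SAM from standard input and write to standard output.

```python
import subprocess
from pathlib import Path


def build_pipeline(genome1, genome2, out_dir="."):
    """Describe the five steps as data; nothing is executed here."""
    out = Path(out_dir)
    # Output file names are assumptions made for this sketch.
    sam = str(out / "alignment.sam")
    bam = str(out / "alignment.bam")
    sorted_bam = str(out / "alignment.sorted.bam")
    pairwise = str(out / "alignment.pairwise.txt")

    def step(argv, enabled, stdin=None, stdout=None):
        return {"argv": argv, "enabled": enabled, "stdin": stdin, "stdout": stdout}

    return [
        # Step 1: index both FASTA files (bwa mem only needs the first index).
        step(["bwa", "index", genome1], True),
        step(["bwa", "index", genome2], True),
        # Step 2: align the second genome against the first (algorithm assumed).
        step(["bwa", "mem", genome1, genome2], True, stdout=sam),
        # Steps 3-5 are marked "currently disabled" in the list above.
        step(["samtools", "view", "-b", "-o", bam, sam], False),
        step(["samtools", "sort", "-o", sorted_bam, bam], False),
        step(["sam2pairwise"], False, stdin=sam, stdout=pairwise),
    ]


def run_pipeline(steps):
    """Run every enabled step in order, stopping at the first failure."""
    for s in steps:
        if not s["enabled"]:
            continue
        stdin = open(s["stdin"], "rb") if s["stdin"] else None
        stdout = open(s["stdout"], "wb") if s["stdout"] else None
        try:
            subprocess.run(s["argv"], stdin=stdin, stdout=stdout, check=True)
        finally:
            for handle in (stdin, stdout):
                if handle:
                    handle.close()
```

A call such as `run_pipeline(build_pipeline("large01/large_genome1.fasta", "large01/large_genome2.fasta", "large01"))` would run the enabled steps, provided `bwa` is installed. Keeping the steps as data makes it easy to re-enable a disabled step by flipping its `enabled` flag.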

When viewing SAM and pairwise files, disable word wrap in your editor.

Download software:

Or run sudo apt install bwa samtools

Contents

  1. large01/large_genome1.fasta: Rickettsia rickettsii str. Brazil, complete genome
    large01/large_genome2.fasta: Rickettsia rickettsii str. Iowa, complete genome

  2. large02/large_genome1.fasta: Brucella abortus 104M chromosome 1, complete sequence
    large02/large_genome2.fasta: Brucella suis bv. 2 strain Bs143CITA chromosome I, complete sequence

  3. large03/large_genome1.fasta: Brucella abortus 104M chromosome 2, complete sequence
    large03/large_genome2.fasta: Brucella suis bv. 2 strain Bs143CITA chromosome II, complete sequence

  4. large04/large_genome1.fasta: Brucella pinnipedialis B2/94 chromosome 2, complete sequence
    large04/large_genome2.fasta: Brucella melitensis biovar Abortus 2308 chromosome II, complete sequence, strain 2308

  5. large05/large_genome1.fasta: Rickettsia rickettsii str. Iowa, complete sequence
    large05/large_genome2.fasta: Rickettsia prowazekii str. Madrid E, complete genome

  6. large06/large_genome1.fasta: Methanococcus maripaludis C5, complete genome
    large06/large_genome2.fasta: Methanococcus maripaludis X1, complete genome

  7. large07/large_genome1.fasta: Mycobacterium tuberculosis variant africanum GM041182, complete genome
    large07/large_genome2.fasta: Mycobacterium intracellulare ATCC 13950, complete sequence

  8. large08/large_genome1.fasta: Desulfurococcus kamchatkensis 1221n, complete genome
    large08/large_genome2.fasta: Desulfurococcus fermentans DSM 16532, complete genome

  9. large09/large_genome1.fasta: Sulfolobus islandicus M.16.27, complete genome
    large09/large_genome2.fasta: Sulfolobus islandicus REY15A, complete genome

  10. large10/large_genome1.fasta: Rickettsia canadensis str. CA410, complete genome
    large10/large_genome2.fasta: Rickettsia conorii str. Malish 7, complete sequence

  11. large11/large_genome1.fasta: Rickettsia canadensis str. CA410, complete genome
    large11/large_genome2.fasta: Rickettsia sibirica 246 chromosome, whole genome shotgun sequence

  12. large12/large_genome1.fasta: Rickettsia argasii T170-B grat170.contig.0_1, whole genome shotgun sequence
    large12/large_genome2.fasta: Rickettsia endosymbiont of Ixodes pacificus strain Humboldt reip.contig.0_1, whole genome shotgun sequence