modify-fasta

A simple script to modify a reference genome fasta file using a bed file.

dependencies

python packages

software

options

python3.8 modify-fasta.py -h

-h,     --help                          "Show this help message and exit"
-bed    <filename>                      "Input bed file"
-fi     <filename>                      "Input fasta file to bedtools"
-o      --output        <filename>      "The prefix of the output fasta file(s)"
-n      <INT>                           "Number of output fasta files. Must be 1 or 2 [default=1]."
                                             If n=1, only one fasta file will be generated:
                                                  * output.fa will be generated using minor alleles (4th column of the bed file)
                                             If n=2, two fasta files will be generated:
                                                  * output.1.fa will be generated using major alleles (3rd column of the bed file)
                                                  * output.2.fa will be generated using minor alleles (4th column of the bed file)
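
The relationship between -n, the output file names, and the bed columns can be summarised in a few lines of Python. This is only an illustrative sketch, not the code in modify-fasta.py; the function name plan_outputs is hypothetical, and column indices are 0-based (index 2 is the 3rd column, index 3 is the 4th).

```python
def plan_outputs(prefix, n=1):
    """Return (file name, 0-based bed column index) pairs for a given -n value.

    Hypothetical helper that mirrors the help text above; it is not part of
    modify-fasta.py.
    """
    if n == 1:
        # Single output built from the minor alleles (4th bed column).
        return [(f"{prefix}.fa", 3)]
    if n == 2:
        # Two outputs: major alleles (3rd column), then minor alleles (4th column).
        return [(f"{prefix}.1.fa", 2), (f"{prefix}.2.fa", 3)]
    raise ValueError("n must be 1 or 2")


for name, column in plan_outputs("output", 2):
    print(name, column)
```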

run examples

python3.8 modify-fasta.py -bed in.bed -fi in.fasta
python3.8 modify-fasta.py -bed in.bed -fi in.fasta -o outname
python3.8 modify-fasta.py -bed in.bed -fi in.fasta -o outname -n 2

in.bed is a space-delimited text file with four required fields and no header line:

  • chr - the number/name of the chromosome
  • position - the ending position of the SNP
  • allele1 - usually the major allele
  • allele2 - usually the minor allele

example_in.bed

1 10505 A T
1 10506 C G
1 10511 G A
1 10539 C A
1 10542 C T
1 10579 C A
1 10642 G A
1 11008 C G
1 11012 C G
1 11063 T G
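
To make the four-field format concrete, the sketch below parses such lines and substitutes the chosen allele into a sequence. It is a minimal illustration and not the script's actual implementation. The helper names read_bed and apply_alleles, the toy sequence, and the toy positions are all hypothetical, and it assumes that position is 1-based, so position p maps to string index p - 1.

```python
def read_bed(lines):
    """Parse space-delimited 'chr position allele1 allele2' lines (no header)."""
    records = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 4:
            raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
        chrom, pos, allele1, allele2 = fields
        records.append((chrom, int(pos), allele1, allele2))
    return records


def apply_alleles(sequence, records, chrom, use_minor=True):
    """Return a copy of sequence with one allele per SNP written in.

    Assumes 1-based positions; this is an assumption of the sketch.
    """
    seq = list(sequence)
    for rec_chrom, pos, allele1, allele2 in records:
        if rec_chrom != chrom:
            continue
        seq[pos - 1] = allele2 if use_minor else allele1
    return "".join(seq)


# Toy data for illustration only (not taken from example_in.bed).
toy_records = read_bed(["1 3 A T", "1 5 C G"])
print(apply_alleles("GGACCTT", toy_records, "1"))                   # GGTCGTT
print(apply_alleles("GGACCTT", toy_records, "1", use_minor=False))  # GGACCTT
```

For a real genome, the sequence would come from the input fasta file rather than a literal string; the sketch only shows how each bed line is interpreted.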

Citation

Ceballos, Gürün et al. 2021, Current Biology