Rare Diseases Hackathon 2024 (https://www.rarediseaseaihackathon.org/)
one FASTA file for all exons (with 100 bp upstream and 100 bp downstream)
one FASTA file for all introns (with 100 bp upstream and 100 bp downstream)
Then split each multi-FASTA file into individual FASTA files (one per exon / intron):
for exons FASTA: awk '/^>/{s="ALPL_exon"++d".fasta"} {print > s}' ALPL_exons_flank100bp.fasta
for introns FASTA: awk '/^>/{s="ALPL_intron"++d".fasta"} {print > s}' ALPL_introns_flank100bp.fasta
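For environments without awk, the same split can be sketched in Python. This is a minimal sketch and not part of the original workflow: the function name `split_fasta` and its `prefix` / `out_dir` parameters are hypothetical. It mirrors the awk one-liners by starting a new numbered file at every `>` header, and it closes each file before opening the next.

```python
from pathlib import Path


def split_fasta(multi_fasta, prefix, out_dir="."):
    """Write each record of a multi-FASTA file to <prefix><n>.fasta (n starts at 1).

    Hypothetical helper: mirrors the awk one-liners, numbering records in file order.
    Returns the list of written paths.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    out = None
    with open(multi_fasta) as handle:
        for line in handle:
            if line.startswith(">"):
                # A header line starts a new record: close the previous file first.
                if out is not None:
                    out.close()
                path = out_dir / f"{prefix}{len(written) + 1}.fasta"
                out = open(path, "w")
                written.append(path)
            # Lines before the first header (if any) are skipped.
            if out is not None:
                out.write(line)
    if out is not None:
        out.close()
    return written
```

For example, `split_fasta("ALPL_exons_flank100bp.fasta", "ALPL_exon")` would produce ALPL_exon1.fasta, ALPL_exon2.fasta, and so on, matching the awk naming.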
Note: a useful collection of one-liners: https://github.com/stephenturner/oneliners
Create clinvar_results_trimmed.txt with columns [Name, Gene(s), Accession, GRCh38Location, Variant type]
Use parse_clinvar.ipynb to:
- Parse the ClinVar results and filter for: Name = NM_00047, Gene(s) = ALPL, Variant type = single nucleotide variant
- Create a FASTA file per intron / exon variant.
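The filtering step could look roughly like the sketch below. It is not the contents of parse_clinvar.ipynb. It assumes the ClinVar results are a tab-delimited text file with a header row containing the five columns listed above, and that the Name filter is a prefix match on the string given in the notes. The function names `filter_clinvar` and `write_trimmed` are hypothetical.

```python
import csv

# Columns kept in the trimmed file, as listed in the notes.
COLUMNS = ["Name", "Gene(s)", "Accession", "GRCh38Location", "Variant type"]


def filter_clinvar(path, name_prefix="NM_00047", gene="ALPL",
                   variant_type="single nucleotide variant", delimiter="\t"):
    """Return rows matching the three filters, trimmed to COLUMNS.

    Assumptions: tab-delimited input with a header row; Name is matched
    by prefix, Gene(s) and Variant type by exact string equality.
    """
    kept = []
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter=delimiter):
            if (row["Name"].startswith(name_prefix)
                    and row["Gene(s)"] == gene
                    and row["Variant type"] == variant_type):
                kept.append({col: row[col] for col in COLUMNS})
    return kept


def write_trimmed(rows, out_path, delimiter="\t"):
    """Write the filtered rows with a header line."""
    with open(out_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=COLUMNS, delimiter=delimiter)
        writer.writeheader()
        writer.writerows(rows)
```

Exact equality on Gene(s) drops rows that list several genes in that column. Loosen that comparison if those variants should be kept.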