Rare Diseases Hackathon 2024 (https://www.rarediseaseaihackathon.org/)
- one FASTA file for all exons (with 100 bp upstream and 100 bp downstream)
- one FASTA file for all introns (with 100 bp upstream and 100 bp downstream)
Then split each multi-FASTA file into separate files (one FASTA per exon / intron):
for exons FASTA: awk '/^>/{if (s) close(s); s="ALPL_exon" (++d) ".fasta"} {print > s}' ALPL_exons_flank100bp.fasta
for introns FASTA: awk '/^>/{if (s) close(s); s="ALPL_intron" (++d) ".fasta"} {print > s}' ALPL_introns_flank100bp.fasta
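
For anyone who prefers to do the split inside a notebook, the same idea can be sketched in Python. This is a minimal sketch, not part of the original workflow: the function name split_fasta and its parameters are hypothetical, and it assumes each record starts with a '>' header line, as in the awk commands above.

```python
from pathlib import Path


def split_fasta(multi_fasta_text, prefix, out_dir="."):
    """Write each record of a multi-FASTA string to its own numbered file.

    Mirrors the awk one-liner: the N-th '>' header starts the file
    '<prefix>N.fasta'. Returns the list of paths written.
    (split_fasta is a hypothetical helper, not an existing tool.)
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    records = []  # each record is a list of lines, header line first
    for line in multi_fasta_text.splitlines():
        if line.startswith(">"):
            records.append([line])
        elif records:  # ignore any lines that precede the first header
            records[-1].append(line)

    paths = []
    for i, record in enumerate(records, start=1):
        path = out_dir / f"{prefix}{i}.fasta"
        path.write_text("\n".join(record) + "\n")
        paths.append(path)
    return paths
```

Example call, using the file names from these notes: split_fasta(Path("ALPL_exons_flank100bp.fasta").read_text(), "ALPL_exon")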
Note: useful one-liners at https://github.com/stephenturner/oneliners
ClinVar search results for ALPL: https://www.ncbi.nlm.nih.gov/clinvar/?term=ALPL%5Bgene%5D&redir=gene
Create clinvar_results_trimmed.txt with columns [Name, Gene(s), Accession, GRCh38Location, Variant type]
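
One possible way to do the trimming is sketched below. It assumes the ClinVar results were downloaded as a tab-delimited text file whose header row contains the five column names listed above; the function name trim_columns is hypothetical, and this is not necessarily how the trimmed file was actually produced.

```python
import csv
import io

# Columns to keep, in the order given in the notes.
KEEP = ["Name", "Gene(s)", "Accession", "GRCh38Location", "Variant type"]


def trim_columns(tsv_text, keep=KEEP):
    """Return tab-delimited text containing only the `keep` columns.

    Assumes `tsv_text` is tab-delimited with a header row (an assumption
    about the ClinVar download format). Missing values become empty fields.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=keep,
        delimiter="\t",
        extrasaction="ignore",  # silently drop every other column
        lineterminator="\n",
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()
```

The result can then be written to clinvar_results_trimmed.txt with an ordinary file write.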
Use parse_clinvar.ipynb to:
- Parse the ClinVar results, filtering for: Name = NM_00047, Gene(s) = ALPL, Variant type = single nucleotide variant
- Create a FASTA file per intron / exon variant.
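
The filtering step might look roughly like the sketch below. It is not the notebook's actual code: filter_variants is a hypothetical name, the input is assumed to be the tab-delimited clinvar_results_trimmed.txt described above, and "Name = NM_00047" is interpreted here as "Name starts with NM_00047", which is an assumption.

```python
import csv
import io


def filter_variants(tsv_text, name_prefix="NM_00047", gene="ALPL",
                    variant_type="single nucleotide variant"):
    """Return the rows (as dicts) that pass all three filters from the notes.

    Assumption: the Name filter is a prefix match; Gene(s) and
    Variant type are matched exactly.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    kept = []
    for row in reader:
        name = row.get("Name") or ""  # tolerate missing/short rows
        if (name.startswith(name_prefix)
                and row.get("Gene(s)") == gene
                and row.get("Variant type") == variant_type):
            kept.append(row)
    return kept
```

Each returned dict keeps the GRCh38Location value, which the later per-variant FASTA step would need in order to place the variant in an exon or intron.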