/GVF-Variant

Perl scripts to upload database with GVF, 1000Genome and Cosmic file

Primary LanguagePerl

#GVF DB PROJECT

Requirements:

        DBIx::Class
        DBIx::Class::Schema::Loader
        tabix
  • Using DBIx::Class to add the data to the database.
  • Used DBIx::Class::Schema::Loader to add sql schema to the database. This is a mapper, which orders relationships between tables and creates classes based on sql schema.

##Command line example:

  dbicdump -o dump_directory=  -o components='["InflateColumn::DateTime"]' -o preserve_case=1 GVF::DB::Variant dbi:mysql:database=GVF_DB_Variant USER PASSWORD '{ quote_char => "`" }'
  • Used Tabix for the addition of 1000G file (chrY not included). Tabix requires sorting based on location name and genome position. Try doing this by command line alone, but tabix would not accept the file, so had to sort individually based on chromosome, then concatenate the file:

    commands used: example:

  grep '\bchr#\b' 1000G_file |sort -k2n > <file>
  cat [list of chr files] >file
  bgzip file >file.gz
  tabix -p vcf file.gz  # vcf was best option based on file type.

##Directory structure:

  bin/
                Contains all the files needed to add a GVF to the database.  
                GVFParser.pl is the main script used to add data.
                TabixToDB.pl is used by GVFParser, it finds match to 1000k genomes/cosmic files.
                GenomeValadator.pl to check that the file is hg19 build.
                CosmicToGVFParser.pl builds GVF file from Cosmic file v56
  data/
                Has various data, GVF/Cosmic, and fasta file for validator
 
  lib/
                Use and created by DBIx::Class

  mysql/
                Has MySQL schema used this project
                
  tabix/
                Contains the tabix created files.