Pinned Repositories
ArgoWorkflowGenerator
Java-based workflow generator for deploying WES variant detection pipelines on Kubernetes using Argo. Reads WDL files and produces Argo-compatible workflows that specify the compute cluster environment (pod) on which each workflow component should run and the buckets in which the results should be written.
backpropagationneuralnetwork
backTranslationProteinToPutativeDNA
Gives the most probable and least probable DNA sequences for an input protein sequence
BaselineSurveyResponseSimulator
Simulates baseline survey response variables (495 variables) for storing information in EPIC. Written in Java 8. Reads a CSV file of variables and the value range of permitted responses, then uses randomization to (a) induce errors and (b) generate valid responses. Simulates 1M responses to use as prototype data for populating i2b2 deployments in support of the MVP project for VABHS.
caBIO-load-scripts
ETL scripts to populate caBIO database
calculateCodonFrequency
Perl script that reads a file in multi-FASTA format and complements it
caMODloadscripts
ETL scripts to populate caMOD database with MTB data from Jackson Labs
fastqcparser
Python code to compute adapter content in reads, k-mer content, per-base GC content (at a specific position in a read alignment against the reference genome), per-base NC content (at a specific position in a read alignment against the reference genome), per-base sequence quality (across aligned reads), per-base sequence content, per-base quality scores, and per-tile sequence quality
GeneAbundance
rebase-using-kmp
Restriction enzyme cleavage site identifier using the KMP (Knuth-Morris-Pratt) algorithm and REBASE
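A minimal Python sketch of the idea, not the repository's own code: a standard Knuth-Morris-Pratt search applied to recognition sequences. The `SITES` table is a hypothetical two-entry stand-in for a parsed REBASE file, and the function names are assumptions made for illustration.

```python
def build_failure(pattern):
    """fail[i] = length of the longest proper prefix of pattern[:i+1]
    that is also a suffix of it."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_find_all(text, pattern):
    """Return the 0-based start of every (possibly overlapping) match."""
    if not pattern:
        return []
    fail = build_failure(pattern)
    hits = []
    k = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            hits.append(i - k + 1)
            k = fail[k - 1]  # keep going to allow overlapping matches
    return hits


# Hypothetical stand-in for a parsed REBASE file (enzyme -> recognition site).
SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC"}


def find_sites(sequence, sites=SITES):
    """Map each enzyme name to the positions of its recognition site."""
    seq = sequence.upper()
    return {name: kmp_find_all(seq, site) for name, site in sites.items()}
```

This sketch reports where the recognition site starts, not the exact cut position within the site, and it does not handle degenerate (IUPAC) bases.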
lvn3668's Repositories
lvn3668/fastqcparser
Python code to compute adapter content in reads, k-mer content, per-base GC content (at a specific position in a read alignment against the reference genome), per-base NC content (at a specific position in a read alignment against the reference genome), per-base sequence quality (across aligned reads), per-base sequence content, per-base quality scores, and per-tile sequence quality
lvn3668/GeneAbundance
lvn3668/rebase-using-kmp
Restriction enzyme cleavage site identifier using the KMP (Knuth-Morris-Pratt) algorithm and REBASE
lvn3668/backpropagationneuralnetwork
lvn3668/BaselineSurveyResponseSimulator
Simulates baseline survey response variables (495 variables) for storing information in EPIC. Written in Java 8. Reads a CSV file of variables and the value range of permitted responses, then uses randomization to (a) induce errors and (b) generate valid responses. Simulates 1M responses to use as prototype data for populating i2b2 deployments in support of the MVP project for VABHS.
lvn3668/caBIO-load-scripts
ETL scripts to populate caBIO database
lvn3668/caMODloadscripts
ETL scripts to populate caMOD database with MTB data from Jackson Labs
lvn3668/DNAExtractionModule
Code that polls a Beckman Coulter instrument and stores rack information and any protocols initiated on samples.
lvn3668/DNAtoProteintranslation
Translates DNA to protein in 1 frame, 6 frames (forward or reverse strand), or user-defined frames
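A rough Python sketch of frame-based translation, assuming the standard genetic code. The function names and the frame numbering (1 to 3 for the forward strand, -1 to -3 for the reverse strand) are illustrative choices, not taken from the repository.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Complement each base, then reverse the strand."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq, frame=1):
    """Translate one reading frame; '*' marks a stop codon and 'X' marks
    a codon containing anything other than A, C, G, or T."""
    if frame not in (1, 2, 3, -1, -2, -3):
        raise ValueError("frame must be one of 1, 2, 3, -1, -2, -3")
    seq = seq.upper()
    if frame < 0:
        seq = reverse_complement(seq)
    start = abs(frame) - 1
    codons = (seq[i:i + 3] for i in range(start, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame(seq):
    """Translate all three forward and all three reverse frames."""
    return {frame: translate(seq, frame) for frame in (1, 2, 3, -1, -2, -3)}
```

For example, `translate("ATGGCCTAA")` yields `"MA*"`, and `six_frame` returns one protein string per frame keyed by frame number.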
lvn3668/findNmerfrequencies
C++ code to calculate n-mer frequencies (n = 1 to 6) and write them to a file
lvn3668/findPalindromesandInvertedRepeats
Finds palindromes and inverted repeats in DNA sequences based on user-defined inputs
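One way to sketch the palindrome part in Python, assuming "palindrome" means a stretch of DNA equal to its own reverse complement and that the user-defined inputs are a minimum and maximum length. Both are assumptions, and inverted repeats separated by a spacer are not covered here.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Complement each base, then reverse the strand."""
    return seq.translate(COMPLEMENT)[::-1]


def find_palindromes(seq, min_len=4, max_len=12):
    """Return (start, length, subsequence) for every substring that equals
    its own reverse complement, shortest lengths first."""
    seq = seq.upper()
    hits = []
    for length in range(min_len, max_len + 1):
        if length % 2:
            # An odd-length stretch cannot be a reverse-complement
            # palindrome: its middle base would have to pair with itself.
            continue
        for start in range(len(seq) - length + 1):
            sub = seq[start:start + length]
            if sub == reverse_complement(sub):
                hits.append((start, length, sub))
    return hits
```

This brute-force scan checks every window, which is fine for short sequences; a longer input would call for expanding outward from each candidate centre instead.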
lvn3668/gatkparser
Python package to parse GATK output and extract summary statistics at mbq 0, 10, 20, 30 and variant evaluation metrics
lvn3668/genemark
lvn3668/InterferenceEstimation
Java-based implementation of an MLE method using a chi-square test to calculate interference during meiotic crossover (the number of double-strand DNA breaks that don't result in a crossover)
lvn3668/LaminarFlowHoodModule
Module for tracking tubes and aliquots and assigning storage in the freezers. Part of the MVP specimen processing system (VABHS). Prototype.
lvn3668/Microarray
Microarray data analysis using R / BioConductor
lvn3668/naivevariantcaller_ECGR_variantdetection
Python code to detect ECGR mutations. Takes a reference genome and a set of reads as input and finds mutations (1-3 bp in length) supported by more than 5 reads
lvn3668/oreillyelegantscipy
lvn3668/Phycastats
R scripts for 16S rRNA microbial profiling: find the most significant OTUs in 16S rRNA data after data normalization, followed by ordination, clustering, and plotting with iTOL
lvn3668/picardparser
lvn3668/pileupnotationvariantcaller
Variant caller from pileup notation / samtools alignment
lvn3668/primerDesign
lvn3668/probeDesign
Takes as input an FNA file, a PTT file, the desired probe length, the cross-reactivity allowed, and the overhang
lvn3668/RShinyEntrezViewer
Application to view Entrez data (distribution of Hs / Mm genes per chromosome) using RShiny and MongoDB
lvn3668/samtoolsparsers
Parses samtools output and extracts flagstat results, such as the number of QC-passed and QC-failed reads that are properly aligned.
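A small Python sketch of flagstat parsing, assuming the usual text layout in which each line reads `<QC-passed count> + <QC-failed count> <description>`. The function name and the returned dictionary shape are illustrative, not the repository's interface.

```python
import re

# "<passed> + <failed> <description>"
LINE_RE = re.compile(r"^(\d+) \+ (\d+) (.+)$")
# Trailing "(x% : y%)" or "(QC-passed reads + QC-failed reads)" annotations.
TRAILER_RE = re.compile(r"\s*\((?:[^()]* : [^()]*|QC-passed[^()]*)\)$")


def parse_flagstat(text):
    """Map each flagstat category to its QC-passed and QC-failed counts."""
    stats = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip blank or unrecognised lines
        passed, failed = int(match.group(1)), int(match.group(2))
        category = TRAILER_RE.sub("", match.group(3))
        stats[category] = {"passed": passed, "failed": failed}
    return stats
```

With this shape, `parse_flagstat(text)["properly paired"]["passed"]` gives the number of QC-passed reads that are properly paired. Annotations such as `(mapQ>=5)` are kept in the category name so that otherwise identical categories stay distinct.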
lvn3668/StatisticalAnalysisOfNetworkData
lvn3668/tensorflowtutorial
lvn3668/UMBICARB
1. Partial scripts to process sequence clusters from 16000 microbial genomes to find orthologous protein clusters, using the most representative sequence per cluster.
2. Find fold distribution across protein hits from SCOP and ASTRAL.
3. Find the most significant structural hits and perform structure alignment.
4. Eliminate LGT in sequence clusters and realign the phylogenetic tree for each pruned set of sequence clusters (pruned on the basis of number of sequences, the most representative sequence not being an LGT, and age in the reference phylogenetic tree).
5. Correlate gaps in the sequence alignment with gaps in the sequence representation of the structure alignment to test the hypothesis that indels cause fold evolution.
lvn3668/variantAnnotation
Variant annotation of a VCF file using ExAC and VEP
lvn3668/KmerCounter
K-mer counter written in Go (v1.16.5). To install Go on Windows, follow the instructions at https://golang.org/doc/install. This is a Go implementation of an N-mer counter for DNA sequences that validates its input. It reads the name of a FASTA file and the k-mer length for which counts are desired, then writes to a file the counts of all overlapping k-mers of size 1 through the specified length. It checks whether the FASTA file is empty and whether a k-mer length is specified.
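The repository itself is written in Go. The following is only a rough Python sketch of the counting logic described above, with hypothetical function names. It works on FASTA text already read into a string and leaves reading the input file and writing the output file to the caller.

```python
from collections import Counter


def parse_fasta(text):
    """Return one upper-cased sequence string per FASTA record."""
    records, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if current:
                records.append("".join(current))
            current = []
        elif line:
            current.append(line.upper())
    if current:
        records.append("".join(current))
    return records


def count_kmers(fasta_text, max_k):
    """Count all overlapping k-mers for k = 1..max_k.

    Returns {k: Counter}. K-mers never span two records."""
    if not isinstance(max_k, int) or max_k < 1:
        raise ValueError("k-mer length must be a positive integer")
    records = parse_fasta(fasta_text)
    if not records:
        raise ValueError("FASTA input contains no sequence")
    counts = {k: Counter() for k in range(1, max_k + 1)}
    for seq in records:
        for k in range(1, max_k + 1):
            # Slide a window of width k one base at a time.
            for i in range(len(seq) - k + 1):
                counts[k][seq[i:i + k]] += 1
    return counts
```

Counting per record, and not over one concatenated string, is a design choice of this sketch: it avoids reporting k-mers that straddle the boundary between two sequences.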