Java utilities for Next Generation Sequencing
Pierre Lindenbaum PhD
http://plindenbaum.blogspot.com
Important: June 2014: I've moved the whole code from picard to htsjdk. See [[Htsjdk]].
## Tools
Tool | Description |
---|---|
SplitBam | Split a BAM by chromosome group. Creates EMPTY BAMs if no read was found for a given group. |
SamJS | Filtering a SAM/BAM with javascript (rhino). |
VCFFilterJS | Filtering a VCF with javascript (rhino) |
SortVCFOnRef | Sort a VCF using the order of the chromosomes in a REFerence index. |
Illuminadir | Create a structured (**JSON** or **XML**) representation of a directory containing some Illumina FASTQs. |
BamStats04 | Coverage statistics for a BED file. It uses the Cigar string instead of the start/end to compute the coverage |
BamStats01 | Statistics about the reads in a BAM. |
VCFBed | Annotate a VCF with the content of a BED file indexed with tabix. |
VCFPolyX | Number of repeated REF bases around POS. |
VCFBigWig | Annotate a VCF with the data of a bigwig file. |
VCFTabixml | Annotate a value from a vcf+xml file. The 4th column of the BED indexed with TABIX is an XML string. |
GroupByGene | Group VCF data by gene/transcript. |
VCFPredictions | Basic variant prediction using UCSC knownGenes. |
FindCorruptedFiles | Reads filename from stdin and prints corrupted NGS files (VCF/BAM/FASTQ). |
VCF2XML | Transforms a VCF to XML. |
VCFAnnoBam | Annotate a VCF with the coverage statistics of a BAM file + BED file of capture. It uses the Cigar string instead of the start/end to get the coverage. |
VCFTrio | Check for mendelian incompatibilities in a VCF. |
SamGrep | Search reads in a BAM |
VCFFixIndels | Fix samtools INDELS for @SolenaLS |
NgsFilesSummary | Scan folders and generate a summary of the files (SAMPLE/BAM, SAMPLE/VCF, etc.). |
NoZeroVariationVCF | creates a VCF containing one fake variation if the input is empty. |
HowManyBamDict | for @abinouze : quickly find the number of distinct BAM Dictionaries from a set of BAM files. |
ExtendBed | Extends a BED file by 'X' bases. |
CmpBams | Compare two or more BAMs. |
IlluminaFastqStats | Statistics on Illumina Fastqs |
Bam2Raster | Save a BAM alignment as a PNG image. |
VcfRebase | Finds restriction sites overlapping variants in a VCF file |
FastqRevComp | Reverse complement a FASTQ file for mate-pair alignment |
PicardMetricsToXML | Convert a Picard metrics file to XML. |
Bam2Wig | Bam to Wiggle converter |
TViewWeb | CGI/Web based version of samtools tview |
VcfRegistryWeb | CGI/Web tool printing all the variants at a given position for a collection of VCFs |
BlastMapAnnots | Maps uniprot/genbank annotations on a blast result. See http://www.biostars.org/p/76056 |
VcfViewGui | Simple java-Swing-based VCF viewer. |
BamViewGui | Simple java-Swing-based BAM viewer. |
Biostar81455 | Defining precisely the genomic context based on a position http://www.biostars.org/p/81455/ |
MapUniProtFeatures | map Uniprot features on reference genome. |
Biostar86363 | Set genotype of a specific sample/genotype combination to unknown in a multisample VCF file. |
FixVCF | Fix a VCF HEADER when I forgot to declare a FILTER or an INFO field in the HEADER |
Biostar78400 | Add the read group info to the sam file on a per lane basis |
Biostar78285 | Extract regions of genome that have 0 coverage See http://www.biostars.org/p/78285/ |
Biostar77288 | Low resolution sequence alignment visualization http://www.biostars.org/p/77288/ |
Biostar77828 | Divide the human genome among X cores, taking into account gaps See http://www.biostars.org/p/77828/ |
Biostar76892 | Fix strand of two paired reads close but on the same strand http://www.biostars.org/p/76892/ |
VCFCompareGT | VCF : compare genotypes of two or more callers for the same samples. |
SAM4WebLogo | Creates an Input file for BAM + WebLogo. |
SAM2Tsv | Tabular view of each base of the reads vs the reference. |
Biostar84786 | Table transposition |
VCF2SQL | Generate the SQL code to insert a VCF into a database |
Bam4DeseqIntervals | creates a table for DESEQ with the number of reads within a sliding window for multiple BAMS |
VCFStripAnnotations | Removes one or more field from the INFO column from a VCF. |
VCFGeneOntology | Finds the GO terms for VCF annotated with SNPEFF or VEP |
VCFFilterGO | Set the VCF FILTERs on VCF files annotated with SNPEFF or VEP, testing whether or not a gene belongs to the descendants of a GO term. |
Biostar86480 | Genomic restriction finder See http://www.biostars.org/p/86480/ |
BamToFastq | Shrink your FASTQ.bz2 files by 40+% using this one weird tip by ordering them by alignment to reference |
PadEmptyFastq | Pad empty fastq sequence/qual with N/# |
SamFixCigar | Replace 'M'(match) in SAM cigar by 'X' or '=' |
FixVcfFormat | Fix PL format in VCF. Problem is described in http://gatkforums.broadinstitute.org/discussion/3453 |
VcfToRdf | Convert a VCF to RDF. |
VcfShuffle | Shuffle a VCF. |
DownSampleVcf | Down sample a VCF. |
VcfHead | Print the first variants of a VCF. |
VcfTail | Print the last variants of a VCF |
VcfCutSamples | Select/Exclude some samples from a VCF |
VcfStats | Generate some statistics from a VCF |
VcfSampleRename | Rename Samples in a VCF. |
VcffilterSequenceOntology | Filter a VCF on Sequence Ontology (SO). |
Biostar59647 | position of mismatches per read from a sam/bam file (XML) See http://www.biostars.org/p/59647/ |
VcfRenameChromosomes | Rename chromosomes in a VCF (eg. convert hg19/ucsc to grch37/ensembl) |
BamRenameChromosomes | Rename chromosomes in a BAM (eg. convert hg19/ucsc to grch37/ensembl) |
BedRenameChromosomes | Rename chromosomes in a BED (eg. convert hg19/ucsc to grch37/ensembl) |
BlastnToSnp | Map variations from a BLASTN-XML file. |
Blast2Sam | Convert a BLASTN-XML input to SAM |
VcfMapUniprot | Map uniprot features on VCF annotated with VEP or SNPEff. |
VcfCompare | Compare two VCF files. |
VcfBiomart | Annotate a VCF with the data from Biomart. |
VcfLiftOver | LiftOver a VCF file. |
BedLiftOver | LiftOver a BED file. |
VcfConcat | Concatenate VCF files. |
MergeSplittedBlast | Merge Blast hits from a split database |
FindMyVirus | Virus+host cell : split BAM into categories. |
Biostar90204 | Linux `split` equivalent for a BAM file. |
VcfJaspar | Finds JASPAR profiles in VCF |
GenomicJaspar | Finds JASPAR profiles in Fasta |
VcfTreePack | Create a TreeMap from one or more VCF |
BamTreePack | Create a TreeMap from one or more Bam. |
FastqRecordTreePack | Create a TreeMap from one or more Fastq files. |
WorldMapGenome | Map bed file to Genome + geographic data. |
AddLinearIndexToBed | Use a Sequence dictionary to create a linear index for a BED file. Can be used as a X-Axis for a chart. |
VCFComm | Compare multiple VCF files, output a new VCF file. |
VcfIn | Prints variants that are contained/not contained in another VCF |
Biostar92368 | Binary interactions depth See also http://www.biostars.org/p/92368 |
VCFStopCodon | TODO |
FastqGrep | Finds reads in fastq files |
VcfCadd | Annotate a VCF with Combined Annotation Dependent Depletion (CADD) data. |
SortVCFOnInfo | sort a VCF using a field in the INFO column |
SamChangeReference | TODO |
SamExtractClip | TODO |
GCAndDepth | Extracts GC% and depth for multiple bam using a sliding window. |
Biostar94573 | Getting a VCF file from a CLUSTALW or FASTA alignment |
CompareBamAndBuild | Compare two BAM files mapped on two different builds. Requires a liftover chain file. |
KnownGenesToBed | Convert UCSC KnownGene to BED. |
Biostar95652 | Drawing a schematic genomic context tree. See also http://www.biostars.org/p/95652/ |
SamToPsl | Convert SAM/BAM to PSL or BED12 . |
BWAMemNOp | merge the SA:Z:* attributes of a read mapped with bwa-mem and prints a read containing a cigar string with 'N' (Skipped region from the REF). |
FastqEntropy | Compute the Entropy of a Fastq file (distribution of the length(gzipped(sequence))) |
NgsFilesScanner | Build a persistent database of NGS files. Dump as XML. |
SigFrame | GUI displaying CGH data |
Biostar103303 | Calculate Percent Spliced In (PSI) |
VCFComparePredictions | Compare the variant predictions of VCFs |
BackLocate | Map a position in a protein back to the genomic coordinates. |
FindAVariation | Search for variations in a set of VCF files. |
AlleleFrequencyCalculator | VCF: Allele Frequency Calculator |
BuildWikipediaOntology | Build a simple RDFS/XML ontology from Wikipedia Categories. |
AlmostSortedVcf | Sort an 'almost' sorted VCF using an in-memory buffer. |
Biostar105754 | bigwig: peak distance from specific genomic BED region |
VcfRegulomeDB | Annotate a VCF with the RegulomeDB data (http://regulome.stanford.edu/) |
Biostar106668 | unmark duplicates (deprecated) |
BatchIGVPictures | GUI: Batch pictures with IGV |
PubmedDump | Dump pubmed data as XML. |
BamIndexReadNames | Build a dictionary of read names to be searched with BamQueryReadNames. |
BamQueryReadNames | Query a Bam file indexed with BamIndexReadNames. |
FastqShuffle | Shuffle Fastq files. |
FastqSplitInterleaved | Split interleaved Fastq files |
PubmedFilterJS | Filters pubmed XML using javascript. |
ReferenceToVCF | Creates a VCF containing all the possible substitutions in a Reference Genome. |
VcfEnsemblReg | Annotate a VCF with the UCSC genome hub tracks for Ensembl Regulation. |
FastqJS | Filters a FASTQ file using javascript. |
Bam2SVG | Convert a BAM to SVG |
LiftOverToSVG | Convert UCSC LiftOver chain files to animated SVG |
VCFMerge | Combines VCF files. |
FixVcfMissingGenotypes | Use BAM to fill missing genotypes in merged VCFs |
NcbiTaxonomyToXml | Dump NCBI taxonomy tree as a hierarchical XML document |
BamCmpCoverage | Creates the figure of a comparative view of the depths sample vs sample |
FindAllCoveragesAtPosition | Find depth at specific position in a list of BAM files |
VcfMultiToOne | Convert VCF with multiple samples to a VCF with one SAMPLE |
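
To give an idea of the kind of operation behind a tool such as FastqRevComp, here is a minimal sketch of reverse-complementing one FASTQ record: the sequence is reversed and complemented, and the quality string is reversed so each score stays attached to its base. This is not the actual jvarkit code; the class name `RevCompSketch` and its methods are hypothetical, and it assumes plain A/C/G/T/N symbols (anything else becomes `N`).

```java
// Hypothetical illustration only: not the FastqRevComp implementation.
public class RevCompSketch {

    /** Complement of one nucleotide; case is preserved, unknown symbols map to 'N'. */
    public static char complement(char base) {
        switch (base) {
            case 'A': return 'T';
            case 'T': return 'A';
            case 'C': return 'G';
            case 'G': return 'C';
            case 'a': return 't';
            case 't': return 'a';
            case 'c': return 'g';
            case 'g': return 'c';
            default:  return 'N';
        }
    }

    /** Walks the sequence from its last base to its first, complementing each base. */
    public static String reverseComplement(String seq) {
        StringBuilder sb = new StringBuilder(seq.length());
        for (int i = seq.length() - 1; i >= 0; i--) {
            sb.append(complement(seq.charAt(i)));
        }
        return sb.toString();
    }

    /** Qualities are only reversed, so that each score follows its base. */
    public static String reverseQualities(String qual) {
        return new StringBuilder(qual).reverse().toString();
    }

    public static void main(String[] args) {
        String seq = "ACGTN";
        String qual = "IIH#!";
        System.out.println(reverseComplement(seq));
        System.out.println(reverseQualities(qual));
    }
}
```

A real tool would also have to read and write the four-line FASTQ records (possibly gzipped) and keep the read names; the sketch only covers the per-record transformation.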