Java utilities for Bioinformatics

JVARKIT

Java utilities for Next Generation Sequencing

Pierre Lindenbaum PhD

http://plindenbaum.blogspot.com

@yokofakun

Announce

Important (June 2014): I've moved the whole code from picard to htsjdk. See [[Htsjdk]].

Download and install

See Download and Install.

## Tools

| Tool | Description |
|------|-------------|
| SplitBam | Split a BAM by chromosome group. Creates EMPTY bams if no reads were found for a given group. |
| SamJS | Filtering a SAM/BAM with javascript (rhino). |
| VCFFilterJS | Filtering a VCF with javascript (rhino). |
| SortVCFOnRef | Sort a VCF using the order of the chromosomes in a REFerence index. |
| Illuminadir | Create a structured (**JSON** or **XML**) representation of a directory containing some Illumina FASTQs. |
| BamStats04 | Coverage statistics for a BED file. It uses the Cigar string instead of the start/end to compute the coverage. |
| BamStats01 | Statistics about the reads in a BAM. |
| VCFBed | Annotate a VCF with the content of a BED file indexed with tabix. |
| VCFPolyX | Number of repeated REF bases around POS. |
| VCFBigWig | Annotate a VCF with the data of a bigwig file. |
| VCFTabixml | Annotate a value from a vcf+xml file. The 4th column of the BED indexed with TABIX is an XML string. |
| GroupByGene | Group VCF data by gene/transcript. |
| VCFPredictions | Basic variant prediction using UCSC knownGenes. |
| FindCorruptedFiles | Reads filenames from stdin and prints corrupted NGS files (VCF/BAM/FASTQ). |
| VCF2XML | Transforms a VCF to XML. |
| VCFAnnoBam | Annotate a VCF with the coverage statistics of a BAM file + BED file of capture. It uses the Cigar string instead of the start/end to get the coverage. |
| VCFTrio | Check for mendelian incompatibilities in a VCF. |
| SamGrep | Search reads in a BAM. |
| VCFFixIndels | Fix samtools INDELS for @SolenaLS. |
| NgsFilesSummary | Scan folders and generate a summary of the files (SAMPLE/BAM SAMPLE/VCF etc.). |
| NoZeroVariationVCF | Creates a VCF containing one fake variation if the input is empty. |
| HowManyBamDict | For @abinouze: quickly find the number of distinct BAM dictionaries from a set of BAM files. |
| ExtendBed | Extends a BED file by 'X' bases. |
| CmpBams | Compare two or more BAMs. |
| IlluminaFastqStats | Statistics on Illumina FASTQs. |
| Bam2Raster | Save a BAM alignment as a PNG image. |
| VcfRebase | Finds restriction sites overlapping variants in a VCF file. |
| FastqRevComp | Reverse complement a FASTQ file for mate-pair alignment. |
| PicardMetricsToXML | Convert a Picard metrics file to XML. |
| Bam2Wig | BAM to Wiggle converter. |
| TViewWeb | CGI/Web based version of samtools tview. |
| VcfRegistryWeb | CGI/Web tool printing all the variants at a given position for a collection of VCFs. |
| BlastMapAnnots | Maps uniprot/genbank annotations on a blast result. See http://www.biostars.org/p/76056 |
| VcfViewGui | Simple java-Swing-based VCF viewer. |
| BamViewGui | Simple java-Swing-based BAM viewer. |
| Biostar81455 | Defining precisely the genomic context based on a position. See http://www.biostars.org/p/81455/ |
| MapUniProtFeatures | Map UniProt features on the reference genome. |
| Biostar86363 | Set genotype of specific sample/genotype comb to unknown in multisample vcf file. |
| FixVCF | Fix a VCF HEADER when I forgot to declare a FILTER or an INFO field in the HEADER. |
| Biostar78400 | Add the read group info to the sam file on a per lane basis. |
| Biostar78285 | Extract regions of genome that have 0 coverage. See http://www.biostars.org/p/78285/ |
| Biostar77288 | Low resolution sequence alignment visualization. See http://www.biostars.org/p/77288/ |
| Biostar77828 | Divide the human genome among X cores, taking into account gaps. See http://www.biostars.org/p/77828/ |
| Biostar76892 | Fix strand of two paired reads close but on the same strand. See http://www.biostars.org/p/76892/ |
| VCFCompareGT | VCF: compare genotypes of two or more callers for the same samples. |
| SAM4WebLogo | Creates an input file for BAM + WebLogo. |
| SAM2Tsv | Tabular view of each base of the reads vs the reference. |
| Biostar84786 | Table transposition. |
| VCF2SQL | Generate the SQL code to insert a VCF into a database. |
| Bam4DeseqIntervals | Creates a table for DESEQ with the number of reads within a sliding window for multiple BAMs. |
| VCFStripAnnotations | Removes one or more fields from the INFO column of a VCF. |
| VCFGeneOntology | Finds the GO terms for a VCF annotated with SNPEFF or VEP. |
| VCFFilterGO | Set the VCF FILTERs on VCF files annotated with SNPEFF or VEP, testing whether a gene belongs or not to the descendants of a GO term. |
| Biostar86480 | Genomic restriction finder. See http://www.biostars.org/p/86480/ |
| BamToFastq | Shrink your FASTQ.bz2 files by 40+% using this one weird tip by ordering them by alignment to reference. |
| PadEmptyFastq | Pad empty fastq sequence/qual with N/#. |
| SamFixCigar | Replace 'M' (match) in SAM cigar by 'X' or '='. |
| FixVcfFormat | Fix PL format in VCF. Problem is described in http://gatkforums.broadinstitute.org/discussion/3453 |
| VcfToRdf | Convert a VCF to RDF. |
| VcfShuffle | Shuffle a VCF. |
| DownSampleVcf | Down sample a VCF. |
| VcfHead | Print the first variants of a VCF. |
| VcfTail | Print the last variants of a VCF. |
| VcfCutSamples | Select/Exclude some samples from a VCF. |
| VcfStats | Generate some statistics from a VCF. |
| VcfSampleRename | Rename samples in a VCF. |
| VcffilterSequenceOntology | Filter a VCF on Sequence Ontology (SO). |
| Biostar59647 | Position of mismatches per read from a sam/bam file (XML). See http://www.biostars.org/p/59647/ |
| VcfRenameChromosomes | Rename chromosomes in a VCF (eg. convert hg19/ucsc to grch37/ensembl). |
| BamRenameChromosomes | Rename chromosomes in a BAM (eg. convert hg19/ucsc to grch37/ensembl). |
| BedRenameChromosomes | Rename chromosomes in a BED (eg. convert hg19/ucsc to grch37/ensembl). |
| BlastnToSnp | Map variations from a BLASTN-XML file. |
| Blast2Sam | Convert a BLASTN-XML input to SAM. |
| VcfMapUniprot | Map uniprot features on a VCF annotated with VEP or SNPEff. |
| VcfCompare | Compare two VCF files. |
| VcfBiomart | Annotate a VCF with the data from Biomart. |
| VcfLiftOver | LiftOver a VCF file. |
| BedLiftOver | LiftOver a BED file. |
| VcfConcat | Concatenate VCF files. |
| MergeSplittedBlast | Merge Blast hits from a split database. |
| FindMyVirus | Virus+host cell: split BAM into categories. |
| Biostar90204 | Linux split equivalent for BAM files. |
| VcfJaspar | Finds JASPAR profiles in VCF. |
| GenomicJaspar | Finds JASPAR profiles in Fasta. |
| VcfTreePack | Create a TreeMap from one or more VCF. |
| BamTreePack | Create a TreeMap from one or more BAM. |
| FastqRecordTreePack | Create a TreeMap from one or more FASTQ files. |
| WorldMapGenome | Map bed file to Genome + geographic data. |
| AddLinearIndexToBed | Use a Sequence dictionary to create a linear index for a BED file. Can be used as an X-axis for a chart. |
| VCFComm | Compare multiple VCF files, output a new VCF file. |
| VcfIn | Prints variants that are contained/not contained in another VCF. |
| Biostar92368 | Binary interactions depth. See also http://www.biostars.org/p/92368 |
| VCFStopCodon | TODO |
| FastqGrep | Finds reads in fastq files. |
| VcfCadd | Annotate a VCF with Combined Annotation Dependent Depletion (CADD) data. |
| SortVCFOnInfo | Sort a VCF using a field in the INFO column. |
| SamChangeReference | TODO |
| SamExtractClip | TODO |
| GCAndDepth | Extracts GC% and depth for multiple BAMs using a sliding window. |
| Biostar94573 | Getting a VCF file from a CLUSTALW or FASTA alignment. |
| CompareBamAndBuild | Compare two BAM files mapped on two different builds. Requires a liftover chain file. |
| KnownGenesToBed | Convert UCSC KnownGene to BED. |
| Biostar95652 | Drawing a schematic genomic context tree. See also http://www.biostars.org/p/95652/ |
| SamToPsl | Convert SAM/BAM to PSL or BED12. |
| BWAMemNOp | Merge the SA:Z:* attributes of a read mapped with bwa-mem and print a read containing a cigar string with 'N' (skipped region from the REF). |
| FastqEntropy | Compute the entropy of a FASTQ file (distribution of the length(gzipped(sequence))). |
| NgsFilesScanner | Build a persistent database of NGS files. Dump as XML. |
| SigFrame | GUI displaying CGH data. |
| Biostar103303 | Calculate Percent Spliced In (PSI). |
| VCFComparePredictions | Compare the variant predictions of VCFs. |
| BackLocate | Map a position in a protein back to the genomic coordinates. |
| FindAVariation | Search for variations in a set of VCF files. |
| AlleleFrequencyCalculator | VCF: Allele Frequency Calculator. |
| BuildWikipediaOntology | Build a simple RDFS/XML ontology from Wikipedia Categories. |
| AlmostSortedVcf | Sort an 'almost' sorted VCF using an in-memory buffer. |
| Biostar105754 | bigwig: peak distance from specific genomic BED region. |
| VcfRegulomeDB | Annotate a VCF with the RegulomeDB data (http://regulome.stanford.edu/). |
| Biostar106668 | Unmark duplicates (deprecated). |
| BatchIGVPictures | GUI: Batch pictures with IGV. |
| PubmedDump | Dump pubmed data as XML. |
| BamIndexReadNames | Build a dictionary of read names to be searched with BamQueryReadNames. |
| BamQueryReadNames | Query a BAM file indexed with BamIndexReadNames. |
| FastqShuffle | Shuffle FASTQ files. |
| FastqSplitInterleaved | Split interleaved FASTQ files. |
| PubmedFilterJS | Filters pubmed XML using javascript. |
| ReferenceToVCF | Creates a VCF containing all the possible substitutions in a Reference Genome. |
| VcfEnsemblReg | Annotate a VCF with the UCSC genome hub tracks for Ensembl Regulation. |
| FastqJS | Filters a FASTQ file using javascript. |
| Bam2SVG | Convert a BAM to SVG. |
| LiftOverToSVG | Convert UCSC LiftOver chain files to animated SVG. |
| VCFMerge | Combines VCF files. |
| FixVcfMissingGenotypes | Use BAM to fill missing genotypes in merged VCFs. |
| NcbiTaxonomyToXml | Dump NCBI taxonomy tree as a hierarchical XML document. |
| BamCmpCoverage | Creates the figure of a comparative view of the depths sample vs sample. |
| FindAllCoveragesAtPosition | Find depth at specific position in a list of BAM files. |
| VcfMultiToOne | Convert VCF with multiple samples to a VCF with one SAMPLE. |
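
The BamStats04 and VCFAnnoBam entries mention computing coverage from the Cigar string rather than from the read's start/end. The sketch below illustrates the general idea in plain Java. It is not the jvarkit implementation: the class `CigarCoverageSketch`, the method `addRead`, and the choice to count only aligned bases (`M`, `=`, `X`) while skipping deletions (`D`) and spliced gaps (`N`) are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of jvarkit: per-base depth derived from CIGAR strings.
public class CigarCoverageSketch {
    // One CIGAR element: a length followed by an operator defined in the SAM specification.
    private static final Pattern CIGAR_OP = Pattern.compile("(\\d+)([MIDNSHP=X])");

    /**
     * Adds the depth contributed by one read to depth[].
     * depth[i] holds the depth at 1-based reference position regionStart + i.
     * readStart is the 1-based leftmost aligned position of the read (SAM POS).
     */
    public static void addRead(int[] depth, int regionStart, int readStart, String cigar) {
        int refPos = readStart;
        Matcher m = CIGAR_OP.matcher(cigar);
        while (m.find()) {
            int len = Integer.parseInt(m.group(1));
            char op = m.group(2).charAt(0);
            switch (op) {
                case 'M': case '=': case 'X':
                    // Aligned bases: consume the reference and count towards depth.
                    for (int i = 0; i < len; i++) {
                        int idx = refPos + i - regionStart;
                        if (idx >= 0 && idx < depth.length) depth[idx]++;
                    }
                    refPos += len;
                    break;
                case 'D': case 'N':
                    // Deletion or skipped region: consume the reference, add no depth (assumption).
                    refPos += len;
                    break;
                default:
                    // I, S, H, P do not consume the reference.
                    break;
            }
        }
    }

    public static void main(String[] args) {
        int[] depth = new int[20];               // reference positions 100..119
        addRead(depth, 100, 100, "5M10N5M");     // spliced read: the 10N gap adds no depth
        addRead(depth, 100, 103, "2S4M1D3M");    // soft clip and deletion add no depth
        System.out.println(Arrays.toString(depth));
    }
}
```

With a plain start/end computation, the first read would be counted over all 20 positions; walking the Cigar counts only the two aligned 5-base blocks.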