
This repository contains Python and R scripts for processing genomic coordinate files (BED, GFF).

Reverse complement of the bed_to_gff script. This script takes a GFF file and converts it into a BED-formatted file (BED12 or BED6). The molecular type and feature type to extract are selected through the arguments.

usage: [-h] --gff_file GFF_FILE [--bed12] [--no_bed6]
                     [--mol_type MOL_TYPE] [--feature_type FEATURE_TYPE]
                     [--id_as_features] [--path PATH] [--name NAME]
                     [--verbose] [--discard] [--skip_exon_number]

example run :
./ -f test_files/tiny_dmel_sample_r5-57.genes.gff -b12 -v

Creates BED files from a given GFF file with specific filters. In BED12,
groups all the elements of a selected molecular type according to their
feature type.
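
To illustrate the BED12 grouping described above, here is a minimal sketch, not the script's actual code. The function name `exons_to_bed12` and the tuple layout of its input are assumptions made for this example. It relies only on the format conventions: GFF coordinates are 1-based and inclusive, BED coordinates are 0-based and half-open, and BED12 block starts are relative to the start of the grouped element.

```python
from collections import defaultdict


def exons_to_bed12(exons):
    """Group elements by parent ID and build one BED12 line per group.

    `exons` is an iterable of (chrom, start, end, strand, parent) tuples
    (hypothetical layout) using GFF coordinates: 1-based, inclusive.
    """
    groups = defaultdict(list)
    for chrom, start, end, strand, parent in exons:
        # Convert to BED coordinates: 0-based start, exclusive end.
        groups[(chrom, strand, parent)].append((start - 1, end))

    lines = []
    for (chrom, strand, parent), blocks in groups.items():
        blocks.sort()
        chrom_start = blocks[0][0]
        chrom_end = max(block_end for _, block_end in blocks)
        block_sizes = ",".join(str(e - s) for s, e in blocks)
        # Block starts are offsets from chrom_start.
        block_starts = ",".join(str(s - chrom_start) for s, _ in blocks)
        fields = [chrom, chrom_start, chrom_end, parent, 0, strand,
                  chrom_start, chrom_end, 0, len(blocks),
                  block_sizes, block_starts]
        lines.append("\t".join(str(f) for f in fields))
    return lines


# Two made-up exons sharing the same Parent become one BED12 record.
example = [("chr1", 101, 200, "+", "tx1"), ("chr1", 301, 400, "+", "tx1")]
print(exons_to_bed12(example)[0])
```

The sketch does not perform the strand consistency or overlap checks that the script reports through its warnings.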

optional arguments:
  --gff_file GFF_FILE, -f GFF_FILE
                        Name of the GFF file to be converted
  --bed12, -b12         Creates the corresponding BED12 file
  --no_bed6, -nb6       Prevents creation of the corresponding BED6 file
  --mol_type MOL_TYPE, -mt MOL_TYPE
                        The molecular type (column 3 of the GFF file) selected
                        for the BED files, default is exon
  --feature_type FEATURE_TYPE, -ft FEATURE_TYPE
                        The feature type (column 9 of the GFF file) selected
                        for the BED files, default is Parent
  --id_as_features, -id
                        Will set the ID of each element as a string containing
                        all its features
  --path PATH, -p PATH  The location where BED files will be created, default
                        is current working directory
  --name NAME, -n NAME  The name of the BED files, default is the GFF file
  --verbose, -v         Will output to stdout the command arguments and the
                        name of each element raising a consistency warning
  --discard, -d         Will discard the element raising a warning in strand
                        consistency and overlapping check
  --skip_exon_number, -s
                        If set, the program will skip adding _# for exon
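
The effect of the --mol_type and --feature_type filters on a single record can be sketched as follows. This is an illustration under assumptions, not the script's implementation: the function name `gff_line_to_bed6` is hypothetical, and column 9 is assumed to hold GFF3-style `key=value;key=value` attributes.

```python
def gff_line_to_bed6(line, mol_type="exon", feature_type="Parent"):
    """Return a BED6 line for one GFF record, or None if it is filtered out."""
    if line.startswith("#") or not line.strip():
        return None  # comment or blank line
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9 or cols[2] != mol_type:
        return None  # malformed record, or column 3 is not the selected type
    # Assumed GFF3-style attributes in column 9.
    attrs = dict(item.split("=", 1) for item in cols[8].split(";") if "=" in item)
    name = attrs.get(feature_type)
    if name is None:
        return None  # the selected feature is absent from column 9
    start = int(cols[3]) - 1  # GFF is 1-based inclusive, BED is 0-based half-open
    score = "0" if cols[5] == "." else cols[5]
    return "\t".join([cols[0], str(start), cols[4], name, score, cols[6]])


# Made-up record for illustration.
record = "chr1\tsrc\texon\t101\t200\t.\t+\t.\tID=e1;Parent=tx1"
print(gff_line_to_bed6(record))
```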

Reverse complement of the gff_to_bed script. This script takes a BED file and converts it into a GFF formatted file. Works on BED12 and BED6.

usage: [-h] --bed_file BED_FILE --source SOURCE --mol_type
                     MOL_TYPE [--is_bed12] [--make_gff3]

example run :
./ -f tiny_dmel_sample_r5-57.genes.bed6 -m exon -s dmel --make_gff3
./ -f tiny_dmel_sample_r5-57.genes.bed12 --is_bed12 -m exon -s dmel --make_gff3

Creates GFF file from a given BED file. Note that the features of the GFF are
created based on the ID of the BED file.

optional arguments:
  --bed_file BED_FILE, -f BED_FILE
                        Name of the BED file to be converted
  --source SOURCE, -s SOURCE
                        Name of the source
  --mol_type MOL_TYPE, -m MOL_TYPE
                        Name of the molecular type of elements from BED
  --is_bed12            Specify this argument if the BED file is BED12 formatted
                        and contains blocks
  --make_gff3           Specify if you want to make the output a proper GFF3
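
The reverse conversion for a BED6 record can be sketched in the same way. This is a minimal illustration, not the script's code: the function name `bed6_to_gff3` is hypothetical, and writing the BED name as an `ID=` attribute is an assumption based on the note above that GFF features are created from the BED ID.

```python
def bed6_to_gff3(line, source, mol_type):
    """Convert one BED6 line into a single GFF3 record (sketch)."""
    chrom, start, end, name, score, strand = line.rstrip("\n").split("\t")[:6]
    gff_start = int(start) + 1  # BED is 0-based half-open, GFF is 1-based inclusive
    attributes = f"ID={name}"   # assumed attribute layout
    # Column 8 (phase) is left as "." because BED carries no phase information.
    return "\t".join([chrom, source, mol_type, str(gff_start), end,
                      score, strand, ".", attributes])


# Made-up BED6 record for illustration.
print(bed6_to_gff3("chr1\t100\t200\te1\t0\t+", source="dmel", mol_type="exon"))
```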


This script produces a diagnostic plot to assess the level of mutual overlap between three sets of genomic coordinates. Given three BED files, it outputs a Venn diagram and an UpSet plot like the one here. Depending on the --expand argument, overlap frequencies are reported as numbers of regions (--expand FALSE) or as numbers of base pairs (--expand TRUE, the default).

Overlap Summary

Usage: summary_region_overlaps.R [options]
An R script to perform overlaps between 3 BED files and extract main features.

example run :
Rscript --vanilla summary_region_overlaps.R --bed1 test_files/sample1.bed --bed2 test_files/sample2.bed --bed3 test_files/sample3.bed --out test_files/overlap_summary.pdf --name1 layer1 --name2 layer2 --name3 layer3 --expand TRUE

  --bed1                First BED file
  --bed2                Second BED file
  --bed3                Third BED file
  --name1               Name of 1st overlap
  --name2               Name of 2nd overlap
  --name3               Name of 3rd overlap
  --out                 Path and name of the output PDF
  --expand              If set to FALSE, overlap frequencies are based on region
                        counts rather than base pair counts (useful for large
                        datasets)
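
The difference between counting overlaps in regions and in base pairs can be illustrated with a short Python sketch. It is not a port of the R script, whose internals are not shown here. The function names are hypothetical, and the example handles two interval sets on a single chromosome, whereas the script compares three BED files.

```python
def merge(intervals):
    """Merge overlapping or touching (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def overlap_bp(a, b):
    """Number of base pairs covered by both interval sets."""
    a, b = merge(a), merge(b)
    i = j = total = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            total += hi - lo
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def overlap_regions(a, b):
    """Number of regions in `a` that overlap at least one region in `b`."""
    return sum(any(s < e2 and s2 < e for s2, e2 in b) for s, e in a)


# Made-up half-open intervals for illustration.
set1 = [(0, 100), (200, 300)]
set2 = [(50, 250)]
print(overlap_regions(set1, set2), overlap_bp(set1, set2))
```

In this example both regions of the first set overlap the second set, yet only half of their base pairs are shared, which is why the two counting modes can give different pictures of the same data.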

