LOng-read Repeat Element pipeline

Long-read Repeat Element pipeline for PacBio single-cell MAS-seq data

LoRE workflow

[Figure: minimal workflow]

[Figure: maximal workflow]

[Figure: underlying MAS-Seq workflow overview]

How to install LoRE:

Clone the repository:

git clone https://github.com/siebrenf/lore.git

Create the conda environment:

conda env create -n lore -f lore/requirements.yaml
conda activate lore

Install LoRE in the conda environment:

pip install -e ./lore

How to run LoRE:

Change directory into the LoRE folder.

Activate the conda environment:

conda activate lore

Update the config.yaml.

  • Adapters, primers and barcodes for the 5' Kinnex kit can be downloaded by LoRE, or can be placed inside the results directory.
  • The results directory (as well as any other directory) can be set in the config. The default results directory is ./results.
  • The genome and gene annotation need to be obtained manually. You will need to specify their locations in the config, as well as the symbol for the mitochondria.
    • If the pigeon classify output shows a low number of reads per cell, the genome and/or gene annotation may be incomplete.
  • Additional documentation for most rules (steps) in the workflow can be found in the code.
  • Optional outputs (currently) include bigwigs (for track visualization) and a QC report. Both are recommended, but each adds some computational load.
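
Because the genome, gene annotation and mitochondrial symbol are supplied by hand, a quick pre-flight check can catch typos before a long run. The sketch below is not part of LoRE. It assumes an uncompressed genome FASTA and that the mitochondrial symbol is the name of a sequence in that FASTA (for example chrM or MT). The function names and arguments are hypothetical and do not correspond to keys in config.yaml.

```python
from pathlib import Path


def fasta_sequence_names(fasta_path):
    """Return the first word of every '>' header line in an uncompressed FASTA."""
    names = []
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    names.append(fields[0])
    return names


def check_reference(genome_fasta, annotation, mito_symbol):
    """Return a list of problems; an empty list means all checks passed.

    Assumption: mito_symbol should match a sequence name in the genome FASTA.
    """
    problems = []
    for label, path in (("genome", genome_fasta), ("annotation", annotation)):
        if not Path(path).is_file():
            problems.append(f"{label} file not found: {path}")
    if Path(genome_fasta).is_file():
        if mito_symbol not in fasta_sequence_names(genome_fasta):
            problems.append(
                f"mitochondrial symbol {mito_symbol!r} is not a sequence name "
                f"in {genome_fasta}"
            )
    return problems
```

Calling check_reference with the three values you entered in the config and printing the returned list is enough to see whether anything is missing.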

Test your config:

snakemake --snakefile lore/Snakefile --configfile config.yaml --dry-run

Run your config:

nice snakemake --use-conda --snakefile lore/Snakefile --configfile config.yaml --resources parallel_downloads=1 mem_mb=100_000 -j 60 > log.txt 2>&1
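
The two commands above can also be chained from Python, so the full run only starts if the dry run succeeds. This is a minimal sketch, not something shipped with LoRE. The helper names build_command and run_pipeline are hypothetical, and the defaults simply mirror the values shown above.

```python
import subprocess


def build_command(dry_run, snakefile="lore/Snakefile", configfile="config.yaml",
                  jobs=60, mem_mb=100_000):
    """Build the snakemake argument list for a dry run or a full run."""
    common = ["--snakefile", snakefile, "--configfile", configfile]
    if dry_run:
        return ["snakemake"] + common + ["--dry-run"]
    return ["nice", "snakemake", "--use-conda"] + common + [
        "--resources", "parallel_downloads=1", f"mem_mb={mem_mb}",
        "-j", str(jobs),
    ]


def run_pipeline(log_path="log.txt", **kwargs):
    """Dry-run first; only start the real run if the dry run exits cleanly."""
    # check=True raises subprocess.CalledProcessError on a non-zero exit code.
    subprocess.run(build_command(dry_run=True, **kwargs), check=True)
    with open(log_path, "w") as log:
        # Send stdout and stderr to the log file, like "> log.txt 2>&1".
        subprocess.run(build_command(dry_run=False, **kwargs),
                       stdout=log, stderr=subprocess.STDOUT, check=True)
```

Adjust jobs and mem_mb to the cores and memory available on your machine before calling run_pipeline().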

Further reading:

TODO:

  • implement TE/RE detection using the output of either:
    • isoseq_groupdedup (an unaligned FASTA and BAM file)
    • pbmm2_align (an aligned BAM file)
      • current settings:
        • multimapped reads are included: reads are assigned to any number of locations (nice for TEs).
        • unmapped reads are included.
    • repositories of interest for this purpose are listed below.
  • learn more about the pigeon classify filter settings (for the gene level, some filters may be more lenient).
  • integrate genomepy to get a genome & gene annotation.
    • figure out the requirements for a "good" reference genome & gene annotation.

Repositories of interest for TE/RE detection: