results fileter

Question

results fileter

whiteorchid opened this issue 4 years ago · 5 comments

Hi,

May I know how to filter the results, as some item has few overlaps(eg. < 100), is there a threshold(eg. >20 bp) for the result to be considered as a truly detected ERV.

Thanks so much for your guidance!

Answer 1 · 2022-02-12T16:21:27.000Z

Hi @whiteorchid I have a problem with running ERV, could you help me to check this issue?
#7
Best,
Jian-Guo

Answer 2 · 2022-02-15T23:11:56.000Z

Thanks for your message!

I used ERVmap by installing it in the traditional way, not by the Docker method. Maybe you can check if the bed files and the ref genome version are matched.

Best,

Answer 3 · 2022-02-15T23:22:01.000Z

Hi @whiteorchid Thanks for your message. Could you please share your install code with me?
I just used conda, but some package is not available...

Best,

Answer 4 · 2022-02-16T12:00:01.000Z

Sorry, currently I do not have access to the previous server.

I find the trim script kindly given by Professor Kong. For all other processes, I just follow the instructions on the Github page.



#!/usr/bin/perl -w

#
# a simple program for post-processing paired-end reads after btrim trimming
#
# Yong Kong
# Yale University 
#

use strict;
 
my $f1 = shift;  # summary file 1
my $f2 = shift;  # summary file 2
my $s1 = shift;  # trim output file 1
my $s2 = shift;  # trim output file 2

my $check = shift || 0;

die "usage: $0 <summary file 1> <summary file 2> <trim output file 1> <trim output file 2>" unless ($f1 && $f2 && $s1 && $s2);

my ($fh1, $fh2);
my ($sh1, $sh2);
my ($oh1, $oh2);
my ($ph1, $ph2);

open($fh1, "<$f1") || die "cannot open $f1";
open($fh2, "<$f2") || die "cannot open $f2";

open($sh1, "<$s1") || die "cannot open $s1";
open($sh2, "<$s2") || die "cannot open $s2";

my $o1 = "${s1}.pe";  # reads in both ends passed
my $o2 = "${s2}.pe";

my $p1 = "${s1}.se";  # read in trim file1 passed, but the pair in file2 failed
my $p2 = "${s2}.se";  # read in trim file2 passed, but the pair in file1 failed

open($oh1, ">$o1") || die "cannot open $o1";
open($oh2, ">$o2") || die "cannot open $o2";

open($ph1, ">$p1") || die "cannot open $p1";
open($ph2, ">$p2") || die "cannot open $p2";

while (my $l1 = <$fh1>, my $l2 = <$fh2>) {
    chomp($l1);
    my @t1 = split(/\t/, $l1);
    my $n1 = $t1[0];
    my $s1 = $t1[1];

    chomp($l2);
    my @t2 = split(/\t/, $l2);
    my $n2 = $t2[0];
    my $s2 = $t2[1];
	
    if ($check) {
	my ($n11, $n12);
	my ($n21, $n22);
	my $dummy;
	($n11, $dummy, $n12) = ($n1 =~ /(.*)(\s+|\/)(1|2)/);
	($n21, $dummy, $n22) = ($n2 =~ /(.*)(\s+|\/)(1|2)/);
	
	die ">$n11< ne >$n21<" if ($n11 ne $n21);
	die ">$n12< == 1" unless ($n12 == 1);
	die ">$n22< == 2" unless ($n22 == 2);
    }


    if ($s1 eq 'Pass' && $s2 eq 'Pass') {
	&find_and_print($sh1, $oh1, $n1);
	&find_and_print($sh2, $oh2, $n2);
    } elsif ($s1 eq 'Pass') {
	&find_and_print($sh1, $ph1, $n1);
    } elsif ($s2 eq 'Pass') {
	&find_and_print($sh2, $ph2, $n2);
    }
}

close($fh1);
close($fh2);
close($sh1);
close($sh2);
close($oh1);
close($oh2);
close($ph1);
close($ph2);


print "Done!\n";


sub find_and_print {
    my ($ih, $oh, $name) = @_;

    while (defined(my $sl = <$ih>)) {
	chomp($sl);
	my @t = split(/\t/, $sl);
	my $n = $t[0];

	if (substr($n, 1) eq $name) {
	    print $oh "$sl\n";
	    $sl = <$ih>;
	    print $oh "$sl";
	    $sl = <$ih>;
	    print $oh "$sl";
	    $sl = <$ih>;
	    print $oh "$sl";
	    last;
	}
    }

}

Answer 5 · 2022-02-16T12:57:02.000Z

Hey @whiteorchid thanks for your reply.

This is my code.

dir=~/PRJNA82747 
index=~/t12a/reference/hg38_ek12/STAR_index_star_2.7.6a
erv_bed=/home/zhou/soft/ERVmap
mkdir -p ERV.map
cd ${dir}/ERV.map
mkdir -p results
cd ${dir}/ERV.map/results

    singularity exec \
    --bind ${dir}/raw:/data:ro \
    --bind ${index}:/genome:ro \
    --bind ${erv_bed}:/resources:ro \
    --bind ${dir}/ERV.map/results/${i}:/results \
    ~/singularity/ervmap.1.2.1.sif /scripts/ERVmapping.sh \
    --read1 ${dir}/raw/${i}_1.fastq.gz --read2 ${dir}/raw/${i}_2.fastq.gz \
    --output ${i} -m ALL \
    --cpus 12 --limit-ram 151290751290 -d

I just used GRCh38.primary_assembly.genome.fa and gencode.v34.annotation.gtf for STAR_index_star_2.7.6a.
Best,
Jian-Guo