Error in `[[<-`(`*tmp*`, name, value = new("SimpleIntegerList", elementType = "integer",
mmpust opened this issue · 4 comments
mmpust commented
Hi,
I am running into the following error message:
# download file
wget ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/019/968/075/GCF_019968075.1_ASM1996807v1/GCF_019968075.1_ASM1996807v1_genomic.gff.gz
# unzip and replace seqnames
gunzip GCF_019968075.1_ASM1996807v1_genomic.gff.gz
sed -i 's/NZ_CP065381.1/chr1/g' GCF_019968075.1_ASM1996807v1_genomic.gff
# Run in R
txdb <- makeTxDbFromGFF("GCF_019968075.1_ASM1996807v1_genomic.gff")
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
Warning message:
In .extract_transcripts_from_GRanges(tx_IDX, gr, mcols0$type, mcols0$ID, :
the transcript names ("tx_name" column in the TxDb object) imported from the "Name" attribute are not unique
tx <- transcripts(txdb, columns=c("gene_id", "tx_id", "tx_name"))
tx
GRanges object with 2695 ranges and 3 metadata columns:
seqnames ranges strand | gene_id tx_id tx_name
<Rle> <IRanges> <Rle> | <CharacterList> <integer> <character>
[1] chr1 1-1323 + | I5Q00_RS00005 1 dnaA
[2] chr1 1724-2830 + | I5Q00_RS00010 2 dnaN
[3] chr1 2842-3057 + | I5Q00_RS00015 3 I5Q00_RS00015
[4] chr1 3061-4182 + | I5Q00_RS00020 4 I5Q00_RS00020
[5] chr1 4270-4521 + | I5Q00_RS00025 5 I5Q00_RS00025
... ... ... ... . ... ... ...
[2691] chr1 2813427-2814548 - | I5Q00_RS13455 2691 I5Q00_RS13455
[2692] chr1 2814554-2815636 - | I5Q00_RS13460 2692 I5Q00_RS13460
[2693] chr1 2815899-2816198 - | I5Q00_RS13465 2693 yidD
[2694] chr1 2816195-2816593 - | I5Q00_RS13470 2694 rnpA
[2695] chr1 2816841-2816975 - | I5Q00_RS13475 2695 rpmH
-------
seqinfo: 1 sequence from an unspecified genome; no seqlengths
promoterAnnotation <- preparePromoterAnnotation(txdb, species='Faecalibacterium')
Extract exons by transcripts...
Identify overlapping first exons for each gene...
Prepare mapping between transcripts, tss, promoters and genes...
Prepare annotated intron ranges...
Annotating reduced exon ranges...
Error in `[[<-`(`*tmp*`, name, value = new("SimpleIntegerList", elementType = "integer", :
2667 elements in value to replace 2695 elements
In addition: Warning messages:
1: In .set_group_names(grl, use.names, txdb, by) :
some group names are NAs or duplicated
2: In .set_group_names(ans, use.names, x, "tx") :
some group names are NAs or duplicated
3: There was 1 warning in `mutate()`.
ℹ In argument: `IntronEndRank = max(.data$INTRONRANK) - .data$INTRONRANK + 1`.
Caused by warning in `max()`:
! no non-missing arguments to max; returning -Inf
4: There was 1 warning in `mutate()`.
ℹ In argument: `MinIntronRank = min(.data$INTRONRANK)`.
Caused by warning in `min()`:
! no non-missing arguments to min; returning Inf
5: There was 1 warning in `mutate()`.
ℹ In argument: `MaxIntronRank = max(.data$INTRONRANK)`.
Caused by warning in `max()`:
! no non-missing arguments to max; returning -Inf
6: There was 1 warning in `mutate()`.
ℹ In argument: `TxWidthMax = max(.data$TxWidth)`.
Caused by warning in `max()`:
! no non-missing arguments to max; returning -Inf
7: There were 3 warnings in `mutate()`.
The first warning was:
ℹ In argument: `MinMergedIntronRank = min(.data$MinIntronRank)`.
Caused by warning in `min()`:
! no non-missing arguments to min; returning Inf
ℹ Run dplyr::last_dplyr_warnings() to see the 2 remaining warnings.
8: There was 1 warning in `filter()`.
ℹ In argument: `.data$MinIntronRank == min(.data$MinIntronRank)`.
Caused by warning in `min()`:
! no non-missing arguments to min; returning Inf
Any ideas what I can do about it?
Thanks in advance!
jonathangoeke commented
Hi @mmpust, it looks like the transcript names are not unique, which could be causing the problem. Can you make the transcript names unique, for example by adding a running index?
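Before renaming anything, it may help to see which names are actually duplicated. The warning from makeTxDbFromGFF says the transcript names come from the "Name" attribute, so a minimal sketch is to list repeated Name= values in the GFF. The helper name `list_duplicate_names` is hypothetical, and `gene` as the default feature type is only an assumption, since the thread does not say which feature types end up as transcripts in the TxDb.

```shell
# Hypothetical helper (not part of proActiv or GenomicFeatures):
# print Name= values that occur more than once among rows of one
# feature type (column 3) in a GFF3 file.
# Usage: list_duplicate_names <file.gff> [feature_type]
list_duplicate_names() {
  awk -F'\t' -v type="${2:-gene}" '
    /^#/ || $3 != type { next }        # skip comment lines and other feature types
    {
      n = split($9, attrs, ";")        # column 9 holds the key=value attributes
      for (i = 1; i <= n; i++)
        if (attrs[i] ~ /^Name=/)
          print substr(attrs[i], 6)    # strip the leading "Name="
    }' "$1" | sort | uniq -d           # keep only values seen at least twice
}
```

For the file in this issue, the call would be `list_duplicate_names GCF_019968075.1_ASM1996807v1_genomic.gff gene`. An empty result for one feature type does not rule out duplicates in another, so it may be worth repeating the call with other types present in column 3.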
houruiyan commented
I ran into the same problem when using a GTF file for the annotation. How can I solve it?
Thank you very much!
library(proActiv)
## From GTF file path
gtf.file <- '/mnt/ruiyanhou/nfs_share2/RNA_seq_organ_species/chicken/ref_files/galGal4.ensGene.gtf'
promoterAnnotation.gencode.v34.subset <- preparePromoterAnnotation(file = gtf.file,
species = 'Gallus_gallus')
The error looks like this
jleechung commented
I haven't had time to look into this yet in detail, but could you first try subsetting your gtf to standard chromosomes:
awk -F'\t' '$1 ~ /^chr([0-9]+|M)$/' galGal4.ensGene.gtf > galGal4.ensGene.filtered.gtf
then create the transcript database and build promoter annotations?
library(GenomicFeatures)
library(proActiv)
path <- 'galGal4.ensGene.filtered.gtf'
txdb <- makeTxDbFromGFF(path)
anno <- preparePromoterAnnotation(txdb = txdb, species = 'galGal')
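To check what the awk filter above keeps before rebuilding the TxDb, the records per sequence name can be tabulated. This is only a sketch: the helper name `summarize_seqnames` is hypothetical, and it reuses the same pattern, `^chr([0-9]+|M)$`, which matches only numbered chromosomes and chrM. Any other sequence name in the file (for example sex chromosomes, if they are named chrZ and chrW there) would be reported as dropped.

```shell
# Hypothetical helper: count GTF/GFF records per sequence name (column 1)
# and mark whether the pattern ^chr([0-9]+|M)$ would keep or drop them.
# Usage: summarize_seqnames <file.gtf>
summarize_seqnames() {
  awk -F'\t' '
    /^#/ { next }                      # skip comment/header lines
    { count[$1]++ }                    # tally records per sequence name
    END {
      for (name in count) {
        status = (name ~ /^chr([0-9]+|M)$/) ? "keep" : "drop"
        print status "\t" name "\t" count[name]
      }
    }' "$1" | sort                     # "drop" rows sort before "keep" rows
}
```

Running `summarize_seqnames galGal4.ensGene.gtf` on the unfiltered file shows which sequence names the filter would remove, so genes of interest on those sequences are not lost by accident.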
houruiyan commented