Issue with preparing TxDb and GRanges objects from the GFF file
nicodemus88 opened this issue · 4 comments
Hi,
I am having some trouble making a gene annotation object (TxDb or GRanges) for CHM13 from the GFF file.
When making the TxDb object, I got the following warning messages:
> txdb <- getTxDb(organism = "Homo sapiens", file = "./ref_genome/chm13v2.0_RefSeq_Liftoff_v5.1.gff3")
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
Warning messages:
1: In .extract_transcripts_from_GRanges(tx_IDX, gr, mcols0$type, mcols0$ID, :
some transcripts have no "transcript_id" attribute ==> their name ("tx_name" column
in the TxDb object) was set to NA
2: In .extract_transcripts_from_GRanges(tx_IDX, gr, mcols0$type, mcols0$ID, :
the transcript names ("tx_name" column in the TxDb object) imported from the
"transcript_id" attribute are not unique
3: In .find_exon_cds(exons, cds) :
The following transcripts have exons that contain more than one CDS (only the first
CDS was kept for each exon): NM_001134939.1, NM_001172437.2, NM_001184961.1,
NM_001301020.1, NM_001301302.1, NM_001301371.1, NM_002537.3, NM_004152.3,
NM_015068.3, NM_016178.2
Is this normal? I also noticed that there are quite a lot of NAs in the table, and I had difficulty converting it to a GRanges object.
I think the annotation would be very useful to everyone else, so would it be possible for your team to host a TxDb / GRanges object of the annotation here?
Thank you very much.
Hi, any update? I am looking for the same TxDb and GRanges files for the T2T genome.
I also ran into this. Can someone please comment? I wonder whether it is related to the known issue of the GFF3 file sometimes missing a "Parent" attribute.
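One way to check this guess is to tally, per feature type, how many records in the GFF3 lack a given attribute. Below is a minimal base-R sketch, not a definitive diagnosis. `count_missing_attr` is a hypothetical helper name I made up, and it assumes a plain tab-delimited GFF3 with nine columns, no embedded FASTA section, and no literal "#" inside the attributes column. Note that gene-level records normally have no Parent, so only missing values on transcripts, exons, or CDS would be suspicious.

```r
# Hypothetical helper (not part of any package): count, per feature type,
# how many GFF3 records lack the attribute `key` in column 9.
count_missing_attr <- function(path, key) {
  gff <- read.delim(
    path, header = FALSE, comment.char = "#", quote = "",
    stringsAsFactors = FALSE,
    col.names = c("seqid", "source", "type", "start", "end",
                  "score", "strand", "phase", "attributes")
  )
  # The attribute is present if "key=" starts the column or follows a ";".
  has_key <- grepl(paste0("(^|;)", key, "="), gff$attributes)
  # Fix the factor levels so both columns appear even when one count is zero.
  table(type = gff$type,
        missing = factor(!has_key, levels = c(FALSE, TRUE)))
}

# Path copied from the original post; the check is skipped if it is absent.
gff_path <- "./ref_genome/chm13v2.0_RefSeq_Liftoff_v5.1.gff3"
if (file.exists(gff_path)) {
  print(count_missing_attr(gff_path, "Parent"))
  print(count_missing_attr(gff_path, "transcript_id"))
}
```

If the "TRUE" column is non-zero for transcript-level types with `key = "transcript_id"`, that would be consistent with the first warning above. Non-zero counts for exon or CDS rows with `key = "Parent"` would point to the missing-parent problem instead.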
Hello, could you provide a few examples of how the TxDb and GRanges files are used? I'd like to compare them with those built from hg38 annotations to see how they differ.