marbl/CHM13

Issue with preparing TxDB and GRanges object from GFF file

nicodemus88 opened this issue · 4 comments

Hi,
I am having trouble building a gene annotation object (TxDb or GRanges) for CHM13 from the GFF file.
When making the TxDb object, I got the following warning messages:

> txdb <- getTxDb(organism = "Homo sapiens", file = "./ref_genome/chm13v2.0_RefSeq_Liftoff_v5.1.gff3")
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
Warning messages:
1: In .extract_transcripts_from_GRanges(tx_IDX, gr, mcols0$type, mcols0$ID,  :
  some transcripts have no "transcript_id" attribute ==> their name ("tx_name" column
  in the TxDb object) was set to NA
2: In .extract_transcripts_from_GRanges(tx_IDX, gr, mcols0$type, mcols0$ID,  :
  the transcript names ("tx_name" column in the TxDb object) imported from the
  "transcript_id" attribute are not unique
3: In .find_exon_cds(exons, cds) :
  The following transcripts have exons that contain more than one CDS (only the first
  CDS was kept for each exon): NM_001134939.1, NM_001172437.2, NM_001184961.1,
  NM_001301020.1, NM_001301302.1, NM_001301371.1, NM_002537.3, NM_004152.3,
  NM_015068.3, NM_016178.2

Is this normal? I also noticed that the resulting table contains quite a lot of NAs, and I had difficulty converting it to a GRanges object.

I think the annotation would be very useful to everyone else, so would it be possible for your team to host a TxDB / GRanges object of the annotation here?

Thank you very much.

Hi, any update? I am looking for the same TxDB and GRanges files for the T2T genome.

I also ran into this. Can someone please comment? I wonder whether it is related to the known issue with the GFF3 file sometimes missing a "parent" entry.
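
One way to check this is to count which feature types in the GFF3 lack a given attribute, such as Parent or transcript_id. The sketch below uses base R only. `count_missing_attr` is a hypothetical helper name, not a function from any package, and the five records are made-up toy data, not lines from the CHM13 annotation.

```r
# Minimal sketch, base R only. count_missing_attr() is a hypothetical helper.
# It tabulates, per feature type, the GFF3 records lacking attribute `key`.
count_missing_attr <- function(gff_lines, key) {
  # Drop comment/directive lines and empty lines
  gff_lines <- gff_lines[!grepl("^#", gff_lines) & nzchar(gff_lines)]
  fields <- strsplit(gff_lines, "\t", fixed = TRUE)
  type  <- vapply(fields, function(f) f[3], character(1))  # column 3: feature type
  attrs <- vapply(fields, function(f) f[9], character(1))  # column 9: attributes
  # The key is present if "key=" starts the column or follows a ";"
  has_key <- grepl(paste0("(^|;)", key, "="), attrs)
  table(type[!has_key])
}

# Toy records (invented for illustration only)
example <- c(
  "##gff-version 3",
  "chr1\tLiftoff\tgene\t1\t100\t.\t+\t.\tID=gene1",
  "chr1\tLiftoff\ttranscript\t1\t100\t.\t+\t.\tID=tx1;Parent=gene1",
  "chr1\tLiftoff\texon\t1\t50\t.\t+\t.\tID=ex1;Parent=tx1;transcript_id=tx1",
  "chr1\tLiftoff\texon\t60\t100\t.\t+\t.\tID=ex2"
)

print(count_missing_attr(example, "Parent"))
print(count_missing_attr(example, "transcript_id"))

# On the real file this would be, for example:
# count_missing_attr(readLines("./ref_genome/chm13v2.0_RefSeq_Liftoff_v5.1.gff3"),
#                    "transcript_id")
```

Genes are expected to show up as lacking Parent. Transcript or exon rows in that table would point to the missing-parent problem, and transcript rows lacking transcript_id would match the first warning in the original post.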

Hello, could you provide a few examples of how the TxDB and GRanges files are used? I'd like to compare those from hg38 annotations to see how they differ.
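
For anyone landing here, a minimal sketch of how these objects are typically built and queried in Bioconductor. It assumes the rtracklayer, txdbmaker, GenomicFeatures, and AnnotationDbi packages are installed (in older Bioconductor releases `makeTxDbFromGFF` lived in GenomicFeatures instead of txdbmaker) and that the GFF3 sits at the path from the original post. The output file name is a placeholder. This is not the `getTxDb` call used above, and it has not been checked against the CHM13 annotation.

```r
# Sketch only: typical TxDb / GRanges usage, assuming the Bioconductor
# packages below are installed and the GFF3 path from the original post exists.
gff_path <- "./ref_genome/chm13v2.0_RefSeq_Liftoff_v5.1.gff3"
pkgs <- c("rtracklayer", "txdbmaker", "GenomicFeatures", "AnnotationDbi")
have_pkgs <- all(vapply(pkgs, requireNamespace, logical(1), quietly = TRUE))

txdb <- NULL
if (have_pkgs && file.exists(gff_path)) {
  # All features as one GRanges; attributes become metadata columns
  gr <- rtracklayer::import(gff_path)

  # Transcript database built from the same file
  txdb <- txdbmaker::makeTxDbFromGFF(gff_path, format = "gff3",
                                     organism = "Homo sapiens")

  # Common extractions; each returns a GRanges or GRangesList
  tx     <- GenomicFeatures::transcripts(txdb)
  ex_lst <- GenomicFeatures::exonsBy(txdb, by = "tx")
  gn     <- GenomicFeatures::genes(txdb)

  # Save the TxDb so it can be reloaded later with AnnotationDbi::loadDb()
  AnnotationDbi::saveDb(txdb, file = "chm13_txdb.sqlite")  # placeholder name
}
```

The same calls on an hg38 annotation give objects with the same structure, so they can be compared directly, for example by the number of transcripts per gene.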