Bioconductor/CSAMA

Annotation with ensembldb lab

jorainer opened this issue · 8 comments

I'd need the topTable from the RNAseq workflow to start with the annotation part (@mikelove, @csoneson could you provide that at some point?)

Also, @mikelove and @csoneson, I'd appreciate your input on this lab, especially since I'm no longer active in transcriptomics. Feel free to edit and modify the annotation-with-ensembldb.Rmd.

Additional things I thought we could include:

  • Search for transcripts that encode proteins with a certain protein domain, such as the steroid receptor ligand binding domain (which the glucocorticoid receptor also has).
  • If we have e.g. an alignment of a sequence read within a transcript, we could use ensembldb to convert those coordinates to genome- or protein-relative coordinates.
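
A minimal sketch of how these two ideas might look with ensembldb. It assumes the EnsDb.Hsapiens.v86 annotation package (not mentioned in this thread) and the Pfam ID PF00104 for the nuclear hormone receptor ligand binding domain; the read position used for the coordinate mapping is made up for illustration.

```r
library(ensembldb)
library(EnsDb.Hsapiens.v86)  # assumed annotation package; any EnsDb with protein data should do
edb <- EnsDb.Hsapiens.v86

# Protein annotations are only available if the EnsDb was built with them.
stopifnot(hasProteinData(edb))

# Transcripts encoding a protein with the Pfam domain PF00104
# (assumed ID for the nuclear hormone receptor ligand binding domain).
txs <- transcripts(edb,
                   filter = ProtDomIdFilter("PF00104"),
                   columns = c("tx_id", "gene_name", "protein_id"))
head(unique(txs$gene_name))

# Hypothetical read aligned to the first 20 nt of one of these transcripts.
# The range is transcript-relative and its name is the transcript ID.
rd <- IRanges(start = 1, width = 20, names = txs$tx_id[1])

# Map the transcript-relative range to genomic coordinates.
gnm <- transcriptToGenome(rd, edb)
gnm
```

`transcriptToProtein()` takes the same kind of named `IRanges` input and maps it to protein-relative coordinates, provided the range lies within the coding region of the transcript.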

hi @jotsetung here's a top table for tximport => DESeq2 w/ LFC threshold of 1 and 5% FDR cutoff:

https://github.com/mikelove/airway2/blob/master/inst/extdata/res_lfc-1_FDR-5.csv

The outline you’ve got looks great. The skills of working with annotation are so important that if they just get the basics they will be really well off.

I agree. Thanks for providing the csv @mikelove , I'll work a little on that to see what we can do.

@mikelove , can you please check the current Rmd? Any changes, suggestions highly welcome!

We could also add a section on generating a transcript to gene mapping table with ensembldb - I'd need your input there as I've never used tximport myself...

I'll take a look today.

Building the table is pretty simple with GenomicFeatures, so you can skip that if you like.

In the gene expression lab we just point them to the tximport vignette.

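library(GenomicFeatures)  # in recent Bioconductor releases makeTxDbFromGFF() is in txdbmaker
txdb <- makeTxDbFromGFF( ... )
k <- keys(txdb, keytype="TXNAME")
# tximport expects the transcript ID in the first column and the gene ID in the second
tx2gene <- select(txdb, keys=k, columns="GENEID", keytype="TXNAME")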
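
For comparison, a rough sketch of how the same kind of table might be built with ensembldb instead. It assumes the EnsDb.Hsapiens.v86 package (not something used in this thread) and that the transcript IDs in the quantification files match the Ensembl IDs in the database, which may not hold if the files carry version suffixes.

```r
library(ensembldb)
library(EnsDb.Hsapiens.v86)  # assumed annotation package
edb <- EnsDb.Hsapiens.v86

# One row per transcript, with its gene ID, returned as a data.frame.
tx2gene <- transcripts(edb,
                       columns = c("tx_id", "gene_id"),
                       return.type = "data.frame")

# Make the column order explicit: transcript ID first, gene ID second.
tx2gene <- tx2gene[, c("tx_id", "gene_id")]
head(tx2gene)
```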

Looks good @jotsetung !

OK, so no changes on your part? I think I'll kick out the "Generating a transcript to gene mapping for tximport" section unless you want to add that @mikelove

Should be OK by now. I intentionally kept it short so that people can either finish the RNA-seq lab or start earlier with the next labs.