No transcript_info table with pipeline_genesets
Closed this issue · 4 comments
ORIGINALLY POSTED @ CGATOxford/CGATPipelines:
pipeline_annotations used to make a table called transcript_info (https://github.com/CGATOxford/CGATPipelines/blob/94b975256fa5142f15e82511f4b62a6c19ae38b9/obsolete/pipeline_annotations.py#L1049-L1054), which is used in multiple pipelines, for example in a tracker for rnaseqdiffexpression (https://github.com/CGATOxford/CGATPipelines/blob/94b975256fa5142f15e82511f4b62a6c19ae38b9/CGATPipelines/pipeline_docs/pipeline_rnaseqdiffexpression/trackers/Genelists.py#L5-L30).
The config.ini for pipeline_genesets still contains an option for this table (https://github.com/CGATOxford/CGATPipelines/blob/8ebe37408aa512ec910b767e07ade4d4b1733177/CGATPipelines/pipeline_genesets/pipeline.ini#L318), but the pipeline doesn't use this option and doesn't generate the table. This breaks at least one of my pipelines, which integrates with pipeline_annotations/genesets. Is there any reason pipeline_genesets doesn't generate this table?
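For anyone hitting the same breakage, a downstream pipeline can fail early with a clear message instead of erroring deep inside a tracker. The following is only a minimal sketch using Python's standard sqlite3 module; it assumes the annotations database is a plain SQLite file, and the helper names table_exists and require_table are hypothetical, not part of CGATPipelines.

```python
import sqlite3


def table_exists(db_path, table_name):
    """Return True if table_name is a table in the SQLite file at db_path."""
    conn = sqlite3.connect(db_path)
    try:
        # sqlite_master lists every table, index and view in the database.
        row = conn.execute(
            "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
            (table_name,),
        ).fetchone()
    finally:
        conn.close()
    return row is not None


def require_table(db_path, table_name):
    """Raise a descriptive error if the expected table is missing."""
    if not table_exists(db_path, table_name):
        raise RuntimeError(
            "table '%s' not found in %s" % (table_name, db_path))
```

Calling require_table(path_to_annotations_db, "transcript_info") at the start of a pipeline would then report the missing table directly; the database path is whatever your own configuration points at.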
REPLY FROM @Acribbs:
When developing the genesets pipeline, the intention was initially to support only the bare minimum needed for the CGAT-flow pipelines to work. The table was removed because it wasn't used in those pipelines and CGATreports isn't supported going forward. If you think you need it, feel free to add it back in. I am using the new cgat-developers code now, though.
Just checked and it's also being used in PipelineGO.py, so I guess this should go back in, assuming PipelineGO.py is being retained?
cgat-flow/CGATPipelines/PipelineGO.py, line 101 in 34361ed
[EDIT] Note that two functions in the PipelineGeneset.py module use transcript_info but don't appear to be used in any pipelines currently. Whether we keep these functions depends on whether they are in use elsewhere, I guess?
cgat-flow/CGATPipelines/PipelineGeneset.py, line 695 in 34361ed
cgat-flow/CGATPipelines/PipelineGeneset.py, line 1263 in 34361ed
There was a discussion as to whether pipeline GO should be kept or not. I think it was to be removed because Katy's pipeline enrichment covered all of it. However, I haven't managed to get round to removing it yet.
OK. In that case, it appears the transcript_info table should not be created by pipeline_genesets.py if the idea is to provide only the minimum set of inputs required for the cgatflow pipelines.
Seeing as the loadEnsemblTranscriptInformation function is still in PipelineGeneset, I'll just use this function to create the table in my pipeline.
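The signature of loadEnsemblTranscriptInformation isn't shown in this thread, so the following is not that function. It is a rough, standard-library stand-in showing the general idea of building a transcript_info-style table from GTF transcript records. The column set (transcript_id, gene_id, gene_name) and the helper names are assumptions for illustration; the real table may carry different or additional columns.

```python
import re
import sqlite3

# Hypothetical columns; the real transcript_info schema is not shown here.
COLUMNS = ("transcript_id", "gene_id", "gene_name")

# Matches GTF attributes of the form: key "value"
ATTRIBUTE = re.compile(r'(\S+) "([^"]*)"')


def iter_transcripts(gtf_lines):
    """Yield one tuple of COLUMNS per 'transcript' feature in the GTF lines."""
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue  # only transcript-level records are of interest
        attrs = dict(ATTRIBUTE.findall(fields[8]))
        if "transcript_id" in attrs:
            # Missing attributes (e.g. gene_name) become NULL in the table.
            yield tuple(attrs.get(column) for column in COLUMNS)


def load_transcript_info(gtf_lines, conn):
    """(Re)create transcript_info on an open sqlite3 connection."""
    conn.execute("DROP TABLE IF EXISTS transcript_info")
    conn.execute(
        "CREATE TABLE transcript_info ("
        "transcript_id TEXT PRIMARY KEY, gene_id TEXT, gene_name TEXT)")
    conn.executemany(
        "INSERT OR REPLACE INTO transcript_info VALUES (?, ?, ?)",
        iter_transcripts(gtf_lines))
    conn.commit()
```

This assumes the GTF contains explicit transcript features with quoted attribute values; a file that only has exon records would need the transcripts to be derived from those instead.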