CGATOxford/CGATPipelines

GCProfile


Hi @Acribbs

As discussed, the existing pipeline_annotations.py depends on GCProfile:
https://github.com/CGATOxford/CGATPipelines/blob/master/CGATPipelines/pipeline_annotations.py#L872

which is a piece of outdated software, compiled for 32-bit architectures only.

This is causing the existing annotations pipeline to fail when running the pipeline tests on a different cluster.

Question: does the new geneset pipeline depend on GCProfile as well? If so, could this be avoided?

Best regards,
Sebastian

Hi Sebastian,

The new pipeline_genesets.py does rely on it; the relevant function is here: https://github.com/CGATOxford/CGATPipelines/blob/AC-pipeline_pub/CGATPipelines/pipeline_genesets.py#L508

@AndreasHeger, are you able to look at this tomorrow and see if there is a workaround so we don't have to rely on this old software, please?

I will now check where this output file is used downstream.

I can confirm that 90% of the time when our pipeline_annotations runs fail, it's GCProfile's fault.

@IanSudbery, do you know where this part of the pipeline output is used in downstream pipelines? The idea is to remove this from our pipelines in the future, but if the output isn't used, we can most likely just remove this bit of the code.

I don't know, but I'm guessing it's used to define the isochores that are used as GAT workspaces.

Thanks. Correct, GCProfile is used for enrichment analyses with GAT. It can be removed. pipeline_intervals might contain a reference to the gcprofile.bed file.

I thought it did have a reference in the intervals pipeline, but I checked the pipeline yesterday and couldn't find any mention of it. I will have another look when I'm back in on Monday, and then if @sebastian-luna-valero can remove it from the annotations pipeline, I can remove it from the new genesets pipeline.

Thanks, All. @AndreasHeger could you please look at removing GCProfile?

Ok, will submit a PR.

Shall I do this on master or RefactorCore?

Many thanks!

My preference would be master so I can continue working on portability issues with production pipelines.

We'll need to refactor again anyway since our plan now is to move to a new core repo, and therefore we will probably remove the CGATCore one.

Ok, I have created two pull requests.

Many thanks!

I am waiting for Adam's input regarding the PR in pipelines and the PNG files, and also for the tests to finish (they now take much longer for the scripts because of the OS X builds on Travis).

Will merge next week.

Actually, the only documentation files that need to be tracked are Figure_for_documentation.png and Slide2.png. These are used in the introduction and tutorial sections of the documentation.