vanheeringen-lab/genomepy

UCSC MySQL mirror selection


Hi,

I am having issues running genomepy install hg38 via the API within a GitLab CI/CD environment on our internal GitLab instance, although it works fine locally.


I am wondering if there is a way to specify a MySQL mirror for UCSC genome downloads, for example by adding an option to use genome-euro-mysql.soe.ucsc.edu here:

host="genome-mysql.soe.ucsc.edu",

since it is the alternative MySQL server hosted in Europe:

https://genome.ucsc.edu/goldenPath/help/mysql.html

Alternatively, the code could fall back to the European server if the US server fails. Let me know what you think.
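The fallback idea can be sketched as a small helper that tries each mirror in order and returns the first connection that succeeds. This is a hypothetical sketch, not genomepy code: connect_with_fallback and UCSC_MYSQL_HOSTS are illustrative names, and the US-then-EU order is an assumption. The connect argument is any callable that takes a hostname and returns a connection or raises.

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")

# Hypothetical host list: US server first, European mirror as the fallback.
UCSC_MYSQL_HOSTS = (
    "genome-mysql.soe.ucsc.edu",
    "genome-euro-mysql.soe.ucsc.edu",
)


def connect_with_fallback(
    connect: Callable[[str], T], hosts: Iterable[str] = UCSC_MYSQL_HOSTS
) -> T:
    """Try each host in order and return the first successful connection.

    Hypothetical helper (not part of genomepy). `connect` takes a hostname
    and either returns a connection object or raises an exception.
    """
    errors = []
    for host in hosts:
        try:
            return connect(host)
        except Exception as exc:  # record why this mirror failed, then try the next one
            errors.append(f"{host}: {exc}")
    raise ConnectionError("all UCSC MySQL mirrors failed:\n" + "\n".join(errors))
```

With PyMySQL installed, one could pass something like lambda host: pymysql.connect(host=host, user="genome"), since "genome" is the public read-only user described on the UCSC MySQL help page. Injecting the connect function keeps the retry logic independent of the MySQL driver.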

Hey @gokceneraslan,

this could be a nice feature, but a rather situational one. The difference between your local machine and the CI server is most likely a firewall, and changing the mirror probably won't help in that case :(
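One way to test the firewall hypothesis is to check, from inside the CI job, whether a plain TCP connection to the UCSC MySQL port can be opened at all. This is a minimal sketch under stated assumptions: can_reach is a hypothetical helper, and it assumes the public servers listen on the standard MySQL port 3306. If it returns False on the CI runner but True locally, the problem is network egress rather than the mirror.

```python
import socket


def can_reach(host: str, port: int = 3306, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical diagnostic helper; port 3306 is assumed to be the port
    used by the public UCSC MySQL servers.
    """
    try:
        # create_connection resolves the host and opens a TCP socket;
        # the context manager closes it again immediately.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timeout, DNS failure, ...
        return False
```

For example, calling can_reach("genome-mysql.soe.ucsc.edu") and can_reach("genome-euro-mysql.soe.ucsc.edu") in a CI step shows whether either mirror is reachable before genomepy is involved at all.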

It could still be nice to switch to the EU mirror when the US mirror is down. This would be fairly easy for installing data, but harder for downloading the UCSC assembly metadata. I'll think about this a bit...

Thank you!

Another related question: is it normal to see the same error even when we call genomepy.install_genome(annotation=False)? We only want the genome sequence, not the annotations.

I've made it into a config option in #242. Not super elegant, but it works :)

Try it with:

pip install git+https://github.com/vanheeringen-lab/genomepy.git@ucsc
genomepy config generate

and then edit the ucsc_mirror option in the config file.
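Conceptually, a config option like ucsc_mirror just maps a short setting to one of the two MySQL hosts. The sketch below illustrates that idea only: mysql_host and UCSC_MIRRORS are hypothetical names, and the "us"/"eu" values are an assumption. The values genomepy actually accepts should be taken from the file written by genomepy config generate.

```python
# Hypothetical mapping from a ucsc_mirror value to the UCSC MySQL host.
# The accepted values in genomepy itself may differ.
UCSC_MIRRORS = {
    "us": "genome-mysql.soe.ucsc.edu",
    "eu": "genome-euro-mysql.soe.ucsc.edu",
}


def mysql_host(config: dict) -> str:
    """Pick the MySQL host from a parsed config dict, defaulting to the US mirror."""
    mirror = str(config.get("ucsc_mirror", "us")).strip().lower()
    try:
        return UCSC_MIRRORS[mirror]
    except KeyError:
        raise ValueError(
            f"unknown ucsc_mirror {mirror!r}; expected one of {sorted(UCSC_MIRRORS)}"
        ) from None
```

Rejecting unknown values with a clear error is safer than silently falling back, because a typo in the config would otherwise keep sending requests to the mirror the user was trying to avoid.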

As for your other question: the MySQL server is always queried to populate the UCSC metadata, even with annotation=False. So if you still get the error with this PR, it is very likely a firewall issue.

genomepy uses MySQL for UCSC starting from version 10, so you could also try an older version.

This feature is now available in genomepy 0.16!