brsynth/rptools

Call to UniProt in PartsGenie is broken

pablocarb opened this issue · 15 comments

The current version of the PartsGenie server, which is used as part of the Genetic Design workflow in Galaxy SynBioCAD, tries to retrieve protein sequences from UniProt using an obsolete query URL that now returns an error:

  • Current query in PartsGenie (wrong):
https://www.uniprot.org/uniprot/?query=P77434&format=tab&columns=id
  • The above query should be replaced by the new one:
https://rest.uniprot.org/uniprotkb/search?query=P77434&format=tsv&fields=id,sequence

See https://www.uniprot.org/help/api_queries
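
To illustrate the new query format, here is a minimal sketch that builds the URL above and parses the TSV response. The function names (`build_search_url`, `parse_tsv`, `fetch`) are my own and are not part of PartsGenie; the parser is deliberately generic and makes no assumption about the column headers UniProt returns.

```python
import csv
import io
import urllib.parse
import urllib.request

# Endpoint taken from the replacement URL above.
UNIPROT_SEARCH = "https://rest.uniprot.org/uniprotkb/search"


def build_search_url(query, fields=("id", "sequence")):
    """Return the search URL for the new REST endpoint with TSV output."""
    params = {"query": query, "format": "tsv", "fields": ",".join(fields)}
    # safe="," keeps the comma-separated field list unescaped, as in the URL above.
    return UNIPROT_SEARCH + "?" + urllib.parse.urlencode(params, safe=",")


def parse_tsv(text):
    """Split a TSV response into (header, rows); each row is a list of cells."""
    lines = list(csv.reader(io.StringIO(text), delimiter="\t"))
    if not lines:
        return [], []
    return lines[0], lines[1:]


def fetch(query, timeout=30):
    """Run the query against UniProt (requires network access)."""
    with urllib.request.urlopen(build_search_url(query), timeout=timeout) as resp:
        return parse_tsv(resp.read().decode("utf-8"))
```

Calling `fetch("P77434")` should then return the header row plus one row per matching entry, assuming the endpoint is reachable.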

More precisely, the issue is found here:

https://github.com/neilswainston/PartsGenie/blob/863f4c95832cf6d2c43647c7ff648d021b28ee6e/parts_genie/parts.py#L179C1-L186C78

In principle, one way to fix the problem without touching PartsGenie would be to add the amino acid sequence (from UniProt or another source) to the SBOL file before submitting it to PartsGenie.

Actually, when the request to PartsGenie is passed as an SBOL file, there is no way to avoid the call to UniProt for the CDS:

https://github.com/neilswainston/PartsGenie/blob/863f4c95832cf6d2c43647c7ff648d021b28ee6e/parts_genie/sbol_utils.py#L102

Therefore, I think the best option is to modify this part of the code to update the UniProt call and, more interestingly for stability and flexibility, to add the possibility of passing the amino acid sequences directly. This would be useful, for instance, for mutant sequences that are not present in UniProt.
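
As a rough sketch of that idea (not the actual PartsGenie code): prefer a sequence supplied with the request and only fall back to UniProt when none is given. The dictionary keys `aa_seq` and `uniprot_id` and the function name are hypothetical, chosen only for illustration.

```python
# Standard one-letter amino acid codes plus '*' for a stop; an illustrative check only.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY*")


def resolve_aa_sequence(cds, fetch_from_uniprot):
    """Return the amino acid sequence for a CDS description.

    cds: mapping with optional keys 'aa_seq' and 'uniprot_id' (hypothetical names).
    fetch_from_uniprot: callable taking a UniProt id and returning a sequence string.
    """
    seq = (cds.get("aa_seq") or "").strip().upper()
    if seq:
        unknown = set(seq) - VALID_RESIDUES
        if unknown:
            raise ValueError("Unexpected residues: " + "".join(sorted(unknown)))
        # A user-supplied sequence (e.g. a mutant) never triggers a UniProt call.
        return seq
    uniprot_id = cds.get("uniprot_id")
    if not uniprot_id:
        raise ValueError("CDS has neither an amino acid sequence nor a UniProt id")
    return fetch_from_uniprot(uniprot_id)
```

Injecting the fetch function keeps the lookup easy to replace in tests, and means a UniProt outage only affects requests that do not carry their own sequence.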

Hi Pablo,

thank you for writing this issue.

When I click on the code links you posted, I don't see any reference to a UniProt request. Perhaps the best approach would be for you to fork the original repo, make these changes in a separate branch and, once they are tested, open a PR on the original repo (I should have rights to merge the PR). What do you think?

Joan

Hi Joan,

OK, thank you. I will have a look at it and let you know once it is ready to merge. In principle, the fix should be straightforward, although I first need to make sure that I can get a working version installed on a machine so that I can run the tests.

Pablo

Hi Pablo,

I saw you made a PR on Neil's repo (neilswainston/PartsGenie#4). But when I tried to apply the modifications from the first commit, "Initial fix solution", in sbol_utils.py (neilswainston/PartsGenie@924f992), I got the following error on Galaxy-SynBioCAD:

File "/opt/conda/envs/parts_genie/lib/python3.7/site-packages/sbml2sbol/sbol/lib/Linux/libsbol.py", line 15217, in read
    return _libsbol.Document_read(self, filename)
LookupError: The http://sbols.org/v2#identity property has not been set

Does your local version work fine or does it have the same error?

Hi Joan,

My local version is working, but I have tested it through the command line, not inside Galaxy-SynBioCAD:

  1. Start a local instance of the updated PartsGenie server from my forked branch https://github.com/pablocarb/PartsGenie/tree/pablo
python main.py
  2. Run your query from the client https://github.com/neilswainston/PartsGenieClient
python -m partsgenie_client http://127.0.0.1:5000 data/sbol_rbs.xml 37762 out

Your issue seems to be related to the sbml2sbol library. I am using sbml2sbol version 0.1.13 from conda-forge. Maybe you could share an example of the SBOL input file sent by the PartsGenie client inside Galaxy-SynBioCAD.

Hi Pablo,

OK, thanks for the feedback. What versions of rdflib and importlib-metadata do you have in your installation?

Hi Joan,

These are the versions:

importlib-metadata        4.11.4
rdflib                    7.0.0
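
For comparing environments, a small helper along these lines can print the same information from inside the Python interpreter that runs the server. This is only a sketch: the function name is my own, and packages installed by conda without Python metadata may show up as not installed.

```python
try:
    from importlib import metadata  # standard library from Python 3.8
except ImportError:
    import importlib_metadata as metadata  # backport needed on Python 3.7


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    packages = ["importlib-metadata", "rdflib", "sbml2sbol"]
    for name, version in installed_versions(packages).items():
        print(f"{name:<26}{version or 'not installed'}")
```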

OK, I was cloning the wrong branch of your repo. So, from your repo, I built the Docker image and launched the server inside the container (as always on my side), and I encountered the same error:

ERROR in main: Exception: The http://sbols.org/v2#identity property has not been set

sbml2sbol has the version below:

sbml2sbol 0.1.13 pyhd8ed1ab_0 conda-forge

I am not sure how you build the Docker image. When I tried to build it, I got an error while installing the Vienna package.

docker build -f Dockerfile-non-google -t partsgenie .

Please be aware that my branch was forked from https://github.com/neilswainston/PartsGenie and has been tested only through the command line, as described before. It is possible that the Docker files in my branch are not the correct ones.

OK, but for the fork-and-run solution, you had to install packages to make it run. How did you do that?

About your last comment, why use Dockerfile-non-google, given that there are 3 Docker files in the repo?

That is right. I installed the conda packages manually by looking at the dependencies and generated the file environment.yaml, which contains the conda packages necessary to run the server. I did this because I wanted to be sure that conda-forge packages were used only as a last resort; otherwise they can create dependency issues. I think it would be better to use this same environment in the Docker file.

Regarding your question about the Dockerfile-non-google, I was just following the command in build.sh, because I do not have enough background information to do it in a different way.

OK, but I still have the same error with sbml2sbol 0.1.13:

ERROR in main: Exception: The http://sbols.org/v2#identity property has not been set

I don't know what that means, and I could not find a similar issue on the Internet. Do you have any thoughts?

Hi Joan,
I don't think that I have access to the file. Could you please send it to me by e-mail?
Thank you

Pablo, after you convert the file into SBOL, it works like a charm. It was the wrong input file from the beginning...
Thank you!
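
For anyone who hits the same "identity property has not been set" error, a quick pre-check of the input file before submitting it might save some time. The sketch below is only a heuristic, not an SBOL validator: it assumes an SBOL2 file is RDF/XML with an `rdf:RDF` root and at least one top-level element in the `http://sbols.org/v2#` namespace (the namespace that appears in the error message). The function name is my own.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SBOL2_NS = "http://sbols.org/v2#"


def looks_like_sbol2(xml_text):
    """Heuristic check: rdf:RDF root with at least one SBOL2 top-level element."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        # Not well-formed XML at all.
        return False
    if root.tag != "{%s}RDF" % RDF_NS:
        # Some other XML dialect, so not an SBOL2 document.
        return False
    return any(child.tag.startswith("{%s}" % SBOL2_NS) for child in root)
```

Usage could be as simple as `looks_like_sbol2(open("data/sbol_rbs.xml").read())` before calling the client; a `False` result suggests the file needs converting to SBOL first.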