scijava/scyjava

Issue with too many endpoints

Closed this issue · 8 comments

There's an issue when adding too many endpoints due to the way they are concatenated and stored as a filename. Here's a working example:

from scyjava import config, start_jvm
from importlib.metadata import version  

config.add_endpoints(
    "io.github.egonw.bacting:managers-semweb:0.0.20",
    "io.github.egonw.bacting:managers-inchi:0.0.20",
    "io.github.egonw.bacting:managers-pubchem:0.0.20",
    "io.github.egonw.bacting:managers-xml:0.0.20",
    "io.github.egonw.bacting:managers-rdf:0.0.20",
    "io.github.egonw.bacting:managers-bioinfo:0.0.20",
    "io.github.egonw.bacting:managers-oscar:0.0.20",
    "io.github.egonw.bacting:managers-cheminfo:0.0.20",
    "io.github.egonw.bacting:managers-ui:0.0.20",
    "io.github.egonw.bacting:managers-excel:0.0.20",
    "io.github.egonw.bacting:managers-opsin:0.0.20",
    "io.github.egonw.bacting:managers-cdk:0.0.20",
    "io.github.egonw.bacting:managers-biojava:0.0.20",
    "io.github.egonw.bacting:managers-bridgedb:0.0.20",
    "io.github.egonw.bacting:bacting-core:0.0.20",
)

if __name__ == '__main__':
    print("jgo", version("jgo"))
    print("scyjava", version("scyjava"))
    start_jvm()

It produces the following traceback (I'm on macOS; maybe this isn't such an issue on Windows or Linux, but I bet they also have file name length limits):

Traceback (most recent call last):
  File "/Users/cthoyt/dev/pybacting/src/pybacting/test.py", line 26, in <module>
    start_jvm()
  File "/Users/cthoyt/dev/scyjava/scyjava/__init__.py", line 48, in start_jvm
    _, workspace = jgo.resolve_dependencies(
  File "/Users/cthoyt/.virtualenvs/cheminf/lib/python3.8/site-packages/jgo/jgo.py", line 428, in resolve_dependencies
    os.makedirs(workspace, exist_ok=True)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/os.py", line 223, in makedirs
    mkdir(name, mode)
OSError: [Errno 63] File name too long: '/Users/cthoyt/.jgo/io.github.egonw.bacting/managers-semweb/0.0.20+io.github.egonw.bacting-bacting-core-0.0.20+io.github.egonw.bacting-managers-bioinfo-0.0.20+io.github.egonw.bacting-managers-biojava-0.0.20+io.github.egonw.bacting-managers-bridgedb-0.0.20+io.github.egonw.bacting-managers-cdk-0.0.20+io.github.egonw.bacting-managers-cheminfo-0.0.20+io.github.egonw.bacting-managers-excel-0.0.20+io.github.egonw.bacting-managers-inchi-0.0.20+io.github.egonw.bacting-managers-opsin-0.0.20+io.github.egonw.bacting-managers-oscar-0.0.20+io.github.egonw.bacting-managers-pubchem-0.0.20+io.github.egonw.bacting-managers-rdf-0.0.20+io.github.egonw.bacting-managers-ui-0.0.20+io.github.egonw.bacting-managers-xml-0.0.20'

Process finished with exit code 1

The offending line of code is:

'+'.join(endpoints),

And later, the downstream code in jgo that splits it back apart, which relies on the file name being "+"-delimited:

https://github.com/scijava/jgo/blob/c6681fc12674615a5c62bf2872b271d2e0c7e40c/jgo/jgo.py#L366-L373
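To make the size of the problem concrete, here is a rough sketch that rebuilds the last path component seen in the traceback and compares it with a 255-byte limit. The naming rule (version of the first endpoint, then the remaining endpoints sorted, with ":" replaced by "-") is inferred from the traceback only and is not jgo's actual code; `legacy_workspace_component` and `NAME_MAX` are hypothetical names, and 255 bytes is an assumption about the typical per-component limit on macOS and Linux filesystems.

```python
# Approximation of the old '+'-joined naming, inferred from the traceback.
# This is NOT jgo's real implementation; names here are hypothetical.
NAME_MAX = 255  # assumed per-component limit in bytes on common filesystems

MODULES = [
    "managers-semweb", "managers-inchi", "managers-pubchem", "managers-xml",
    "managers-rdf", "managers-bioinfo", "managers-oscar", "managers-cheminfo",
    "managers-ui", "managers-excel", "managers-opsin", "managers-cdk",
    "managers-biojava", "managers-bridgedb", "bacting-core",
]
endpoints = [f"io.github.egonw.bacting:{m}:0.0.20" for m in MODULES]


def legacy_workspace_component(endpoints):
    """Return the final directory name the '+'-joined scheme appears to create."""
    primary, *rest = endpoints
    version = primary.split(":")[-1]
    # Remaining endpoints are sorted and made path-safe by swapping ':' for '-'.
    others = sorted(e.replace(":", "-") for e in rest)
    return "+".join([version, *others])


component = legacy_workspace_component(endpoints)
too_long = len(component.encode("utf-8")) > NAME_MAX
print(len(component.encode("utf-8")), "bytes, limit", NAME_MAX, "too long:", too_long)
```

The name grows linearly with the number of endpoints, so any fixed limit is eventually exceeded.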

I think an alternate solution would be to hash all of the endpoints, name a file based on this, then actually write the endpoints themselves to the file for reading later
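A minimal sketch of that idea, assuming SHA-256 as the hash and a hypothetical `endpoints.txt` file inside the workspace; the function names are made up for illustration and this is not necessarily how jgo ends up implementing it:

```python
import hashlib
import tempfile
from pathlib import Path


def hashed_workspace(cache_dir, endpoints):
    """Create a fixed-length workspace directory named after a hash of the endpoints."""
    # Sort so the same set of endpoints maps to the same directory regardless of order.
    canonical = "\n".join(sorted(endpoints))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    workspace = Path(cache_dir) / digest
    workspace.mkdir(parents=True, exist_ok=True)
    # Store the endpoints themselves, since they can no longer be parsed from the name.
    (workspace / "endpoints.txt").write_text(canonical + "\n", encoding="utf-8")
    return workspace


def read_endpoints(workspace):
    """Recover the endpoint list written by hashed_workspace."""
    return (Path(workspace) / "endpoints.txt").read_text(encoding="utf-8").split()


example = [
    "io.github.egonw.bacting:managers-cdk:0.0.20",
    "io.github.egonw.bacting:bacting-core:0.0.20",
]
with tempfile.TemporaryDirectory() as tmp:
    ws = hashed_workspace(tmp, example)
    name_length = len(ws.name)
    recovered = read_endpoints(ws)
    same_dir = hashed_workspace(tmp, list(reversed(example))) == ws
```

The directory name has the same length no matter how many endpoints are added, at the cost of no longer being human-readable.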

I think an alternate solution would be to hash all of the endpoints, name a file based on this

Agreed. This work is actually done by @kephale in scijava/jgo#62. I'll try to merge and release it this week!

@ctrueden thanks for making me aware! It looks like there's a lot of reformatting in that PR which was holding up the review, so I sent a PR suggesting to use black to automatically format the code and make this go faster (scijava/jgo#63)

@ctrueden I'm still getting some issues running the code (I copied it down here, now showing the versions)

from scyjava import config, start_jvm
from importlib.metadata import version  

config.add_endpoints(
    "io.github.egonw.bacting:managers-semweb:0.0.20",
    "io.github.egonw.bacting:managers-inchi:0.0.20",
    "io.github.egonw.bacting:managers-pubchem:0.0.20",
    "io.github.egonw.bacting:managers-xml:0.0.20",
    "io.github.egonw.bacting:managers-rdf:0.0.20",
    "io.github.egonw.bacting:managers-bioinfo:0.0.20",
    "io.github.egonw.bacting:managers-oscar:0.0.20",
    "io.github.egonw.bacting:managers-cheminfo:0.0.20",
    "io.github.egonw.bacting:managers-ui:0.0.20",
    "io.github.egonw.bacting:managers-excel:0.0.20",
    "io.github.egonw.bacting:managers-opsin:0.0.20",
    "io.github.egonw.bacting:managers-cdk:0.0.20",
    "io.github.egonw.bacting:managers-biojava:0.0.20",
    "io.github.egonw.bacting:managers-bridgedb:0.0.20",
    "io.github.egonw.bacting:bacting-core:0.0.20",
)

if __name__ == '__main__':
    print("jgo", version("jgo"))
    print("scyjava", version("scyjava"))
    start_jvm()

And get the following output:

jgo 1.0.2
scyjava 1.1.1.dev0
Error in `/usr/local/bin/mvn -B -f /Users/cthoyt/.jgo/io.github.egonw.bacting/managers-semweb/0.0.20/e9f8f0c2002517f2adfce2c464bfbfbd4f8c3d4b10f951afe2e4df68c62befd1/pom.xml dependency:resolve': 1

So it looks like Maven isn't super happy about something in here, but I'm not really sure what's going on (or why we don't get a stack trace).

@cthoyt You can get more details by manually running the command listed. Also best to add the -U flag, to ensure Maven rechecks for artifacts from the remote(s). So your command will be:

/usr/local/bin/mvn -U -B -f /Users/cthoyt/.jgo/io.github.egonw.bacting/managers-semweb/0.0.20/e9f8f0c2002517f2adfce2c464bfbfbd4f8c3d4b10f951afe2e4df68c62befd1/pom.xml dependency:resolve

I tried it on my system, and the error I get is:

[ERROR] Failed to execute goal on project managers-semweb-BOOTSTRAPPER: Could not resolve dependencies for project io.github.egonw.bacting-BOOTSTRAPPER:managers-semweb-BOOTSTRAPPER:jar:0: The following artifacts could not be resolved: io.github.egonw.bacting:managers-semweb:jar:0.0.20, io.github.egonw.bacting:managers-bioinfo:jar:0.0.20, io.github.egonw.bacting:managers-cheminfo:jar:0.0.20: Could not find artifact io.github.egonw.bacting:managers-semweb:jar:0.0.20 in 1 (https://maven.scijava.org/content/repositories/releases) -> [Help 1]

The key issue here is this part:

Could not find artifact io.github.egonw.bacting:managers-semweb:jar:0.0.20

And the key part of that is jar: because managers-semweb is a POM only, with no JAR artifact.

Removing managers-semweb from your endpoints list, I then see:

Could not find artifact io.github.egonw.bacting:managers-bioinfo:jar:0.0.20

Which is another POM-only artifact.

Same for managers-cheminfo. After removing managers-bioinfo and managers-cheminfo as well, it works! 🙌

P.S. The jgo program is supposed to emit the mvn execution failure message to the console, but does not. It's a bug.
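Until that is fixed, one workaround is to re-run the Maven command from Python and print whatever it emits. This is only a sketch: `mvn_resolve_command` and `run_and_report` are hypothetical helpers, not part of jgo or scyjava, and the flags are simply the ones from the command shown above.

```python
import subprocess


def mvn_resolve_command(pom_path, mvn="mvn"):
    """Build the same command line shown above: -U (force update), -B (batch mode), -f <pom>."""
    return [mvn, "-U", "-B", "-f", str(pom_path), "dependency:resolve"]


def run_and_report(command):
    """Run a command and, on failure, print its captured output so the error is visible."""
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode != 0:
        # Print both streams so nothing is lost, whichever one the tool wrote to.
        print(result.stdout)
        print(result.stderr)
    return result.returncode


# Hypothetical usage; replace the path with the pom.xml from the error message:
# run_and_report(mvn_resolve_command("/path/to/workspace/pom.xml"))
```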

@ctrueden thank you so much for explaining this to me in detail. I think I will have to pull in @egonw to make some updates on the underlying resources to have jars inside them :)

I'm really glad the solution is working from the jgo/scyjava side now, too!

make some updates on the underlying resources to have jars inside them

@cthoyt Looking at the GitHub repository, you can see that managers-bioinfo is a parent POM, for doing the multi-module build:

https://github.com/egonw/bacting/blob/bacting-0.0.22/managers-bioinfo/pom.xml

So it is not intended to be explicitly specified as a dependency. I expect that @egonw will not need to make any changes here, but rather your Python code should just leave off specifying the POM parent artifacts.
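One way to tell the two kinds of artifact apart is the `<packaging>` element of the POM: parent and aggregator POMs declare `pom`, and Maven defaults to `jar` when the element is absent. The sketch below checks that on two made-up POM snippets; `packaging_of` is a hypothetical helper, and the sample POMs are illustrative rather than copied from bacting.

```python
import xml.etree.ElementTree as ET

# Namespace used by Maven POM files.
POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}

# Illustrative POMs only; not the real bacting files.
PARENT_POM = """
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.example</groupId>
  <artifactId>example-parent</artifactId>
  <version>1.0</version>
  <packaging>pom</packaging>
</project>
"""

LIBRARY_POM = """
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.example</groupId>
  <artifactId>example-lib</artifactId>
  <version>1.0</version>
</project>
"""


def packaging_of(pom_xml):
    """Return the declared packaging of a POM, or 'jar' (Maven's default) if unspecified."""
    root = ET.fromstring(pom_xml.strip())
    node = root.find("m:packaging", POM_NS)
    if node is None:
        # Fall back to a POM written without the namespace declaration.
        node = root.find("packaging")
    if node is None or not node.text:
        return "jar"
    return node.text.strip()


print(packaging_of(PARENT_POM), packaging_of(LIBRARY_POM))
```

An endpoint whose POM reports `pom` here has no JAR to resolve and is a candidate to leave out of the endpoint list.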

Are you able to access all desired functionality from Python if you specify only:

config.add_endpoints(
    "io.github.egonw.bacting:managers-inchi:0.0.20",
    "io.github.egonw.bacting:managers-pubchem:0.0.20",
    "io.github.egonw.bacting:managers-xml:0.0.20",
    "io.github.egonw.bacting:managers-rdf:0.0.20",
    "io.github.egonw.bacting:managers-oscar:0.0.20",
    "io.github.egonw.bacting:managers-ui:0.0.20",
    "io.github.egonw.bacting:managers-excel:0.0.20",
    "io.github.egonw.bacting:managers-opsin:0.0.20",
    "io.github.egonw.bacting:managers-cdk:0.0.20",
    "io.github.egonw.bacting:managers-biojava:0.0.20",
    "io.github.egonw.bacting:managers-bridgedb:0.0.20",
    "io.github.egonw.bacting:bacting-core:0.0.20",
)

?

I do think that, to make things clearer, @egonw could rename the artifactId for the POM parents to something like pom-chem or chem-parent or chem-aggregator. Unfortunately, there is no one agreed-upon convention; e.g. jetty uses jetty-project.

egonw commented

@ctrueden, yes, this is how it should be indeed.

@cthoyt, that said, I am exploring how to create a bundle jar with everything. I know John did that for the cdk project, but I have never done this myself: egonw/bacting#56

@ctrueden thanks again for the explanation, I'm going to close this issue because the JGO update indeed fixed it. In cthoyt/pybacting#5 I was indeed able to make the list much longer!