Issue with too many endpoints
Closed this issue · 8 comments
There's an issue when adding too many endpoints, due to the way they are concatenated and used as a file name. Here's a minimal example that reproduces it:
from scyjava import config, start_jvm
from importlib.metadata import version
config.add_endpoints(
"io.github.egonw.bacting:managers-semweb:0.0.20",
"io.github.egonw.bacting:managers-inchi:0.0.20",
"io.github.egonw.bacting:managers-pubchem:0.0.20",
"io.github.egonw.bacting:managers-xml:0.0.20",
"io.github.egonw.bacting:managers-rdf:0.0.20",
"io.github.egonw.bacting:managers-bioinfo:0.0.20",
"io.github.egonw.bacting:managers-oscar:0.0.20",
"io.github.egonw.bacting:managers-cheminfo:0.0.20",
"io.github.egonw.bacting:managers-ui:0.0.20",
"io.github.egonw.bacting:managers-excel:0.0.20",
"io.github.egonw.bacting:managers-opsin:0.0.20",
"io.github.egonw.bacting:managers-cdk:0.0.20",
"io.github.egonw.bacting:managers-biojava:0.0.20",
"io.github.egonw.bacting:managers-bridgedb:0.0.20",
"io.github.egonw.bacting:bacting-core:0.0.20",
)
if __name__ == '__main__':
    print("jgo", version("jgo"))
    print("scyjava", version("scyjava"))
    start_jvm()
It produces the following traceback (I'm on macOS; maybe this isn't such an issue on Windows or Linux, but I bet they also have file name length limits):
Traceback (most recent call last):
File "/Users/cthoyt/dev/pybacting/src/pybacting/test.py", line 26, in <module>
start_jvm()
File "/Users/cthoyt/dev/scyjava/scyjava/__init__.py", line 48, in start_jvm
_, workspace = jgo.resolve_dependencies(
File "/Users/cthoyt/.virtualenvs/cheminf/lib/python3.8/site-packages/jgo/jgo.py", line 428, in resolve_dependencies
os.makedirs(workspace, exist_ok=True)
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/os.py", line 223, in makedirs
mkdir(name, mode)
OSError: [Errno 63] File name too long: '/Users/cthoyt/.jgo/io.github.egonw.bacting/managers-semweb/0.0.20+io.github.egonw.bacting-bacting-core-0.0.20+io.github.egonw.bacting-managers-bioinfo-0.0.20+io.github.egonw.bacting-managers-biojava-0.0.20+io.github.egonw.bacting-managers-bridgedb-0.0.20+io.github.egonw.bacting-managers-cdk-0.0.20+io.github.egonw.bacting-managers-cheminfo-0.0.20+io.github.egonw.bacting-managers-excel-0.0.20+io.github.egonw.bacting-managers-inchi-0.0.20+io.github.egonw.bacting-managers-opsin-0.0.20+io.github.egonw.bacting-managers-oscar-0.0.20+io.github.egonw.bacting-managers-pubchem-0.0.20+io.github.egonw.bacting-managers-rdf-0.0.20+io.github.egonw.bacting-managers-ui-0.0.20+io.github.egonw.bacting-managers-xml-0.0.20'
Process finished with exit code 1
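For reference, the length of the offending directory name can be compared against the file system limit. This is only a rough sketch: the way the name is built below is inferred from the path in the traceback, not copied from jgo, and `workspace_component` and `name_max` are hypothetical helpers. The fallback of 255 is an assumption based on the usual per-component limit on macOS and Linux file systems.

```python
import os

# The same 15 endpoints as in the example above.
NAMES = ("semweb inchi pubchem xml rdf bioinfo oscar cheminfo "
         "ui excel opsin cdk biojava bridgedb").split()
ENDPOINTS = [f"io.github.egonw.bacting:managers-{n}:0.0.20" for n in NAMES]
ENDPOINTS.append("io.github.egonw.bacting:bacting-core:0.0.20")


def workspace_component(endpoints):
    """Approximate the final directory name seen in the traceback.

    Assumption: the first endpoint contributes only its version, the
    remaining endpoints are sorted, ':' becomes '-', and the parts are
    joined with '+'. This mirrors the observed path, not jgo's source.
    """
    first, rest = endpoints[0], sorted(endpoints[1:])
    version = first.split(":")[-1]
    return "+".join([version] + [e.replace(":", "-") for e in rest])


def name_max(path="."):
    """Return the per-component name limit, falling back to 255."""
    try:
        return os.pathconf(path, "PC_NAME_MAX")
    except (AttributeError, OSError, ValueError):
        # os.pathconf is unavailable on Windows; 255 is an assumed default.
        return 255


component = workspace_component(ENDPOINTS)
print(len(component), "characters; limit is", name_max())
```

With 15 endpoints the single path component is several hundred characters long, so it exceeds a 255-character limit regardless of how short the parent directories are.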
The offending line of code is:
Line 46 in f033a10
And later, the code in jgo that parses the file name, which relies on it being "+"-delimited:
https://github.com/scijava/jgo/blob/c6681fc12674615a5c62bf2872b271d2e0c7e40c/jgo/jgo.py#L366-L373
I think an alternate solution would be to hash all of the endpoints, name a file based on this, then actually write the endpoints themselves to the file for reading later
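A minimal sketch of that idea, assuming SHA-256 as the hash and a plain text file for the endpoint list. The names `workspace_for`, `read_endpoints`, and `endpoints.txt` are hypothetical and are not part of jgo or scyjava.

```python
import hashlib
import tempfile
from pathlib import Path


def workspace_for(endpoints, cache_dir):
    """Create a short, deterministic workspace directory for the endpoints.

    The directory name is a fixed-length digest, so it no longer grows
    with the number of endpoints. The endpoints themselves are written
    to a file inside the directory so they can be read back later.
    """
    ordered = sorted(endpoints)  # the same set always gives the same digest
    digest = hashlib.sha256("+".join(ordered).encode("utf-8")).hexdigest()
    workspace = Path(cache_dir) / digest
    workspace.mkdir(parents=True, exist_ok=True)
    (workspace / "endpoints.txt").write_text("\n".join(ordered) + "\n", encoding="utf-8")
    return workspace


def read_endpoints(workspace):
    """Read back the endpoints stored by workspace_for."""
    return (Path(workspace) / "endpoints.txt").read_text(encoding="utf-8").split()


with tempfile.TemporaryDirectory() as cache:
    ws = workspace_for(
        ["io.github.egonw.bacting:managers-cdk:0.0.20",
         "io.github.egonw.bacting:bacting-core:0.0.20"],
        cache,
    )
    print(len(ws.name), read_endpoints(ws))
```

Sorting before hashing means the order in which endpoints are added does not change the workspace, and the directory name stays 64 characters however many endpoints there are.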
> I think an alternate solution would be to hash all of the endpoints, name a file based on this
Agreed. This work is actually done by @kephale in scijava/jgo#62. I'll try to merge and release it this week!
@ctrueden thanks for making me aware! It looks like there's a lot of reformatting in that PR which was holding up the review, so I sent a PR suggesting to use black to automatically format the code and make this go faster (scijava/jgo#63)
@ctrueden I'm still getting some issues running the code (I copied it down here, now showing the versions)
from scyjava import config, start_jvm
from importlib.metadata import version
config.add_endpoints(
"io.github.egonw.bacting:managers-semweb:0.0.20",
"io.github.egonw.bacting:managers-inchi:0.0.20",
"io.github.egonw.bacting:managers-pubchem:0.0.20",
"io.github.egonw.bacting:managers-xml:0.0.20",
"io.github.egonw.bacting:managers-rdf:0.0.20",
"io.github.egonw.bacting:managers-bioinfo:0.0.20",
"io.github.egonw.bacting:managers-oscar:0.0.20",
"io.github.egonw.bacting:managers-cheminfo:0.0.20",
"io.github.egonw.bacting:managers-ui:0.0.20",
"io.github.egonw.bacting:managers-excel:0.0.20",
"io.github.egonw.bacting:managers-opsin:0.0.20",
"io.github.egonw.bacting:managers-cdk:0.0.20",
"io.github.egonw.bacting:managers-biojava:0.0.20",
"io.github.egonw.bacting:managers-bridgedb:0.0.20",
"io.github.egonw.bacting:bacting-core:0.0.20",
)
if __name__ == '__main__':
    print("jgo", version("jgo"))
    print("scyjava", version("scyjava"))
    start_jvm()
And get the following output:
jgo 1.0.2
scyjava 1.1.1.dev0
Error in `/usr/local/bin/mvn -B -f /Users/cthoyt/.jgo/io.github.egonw.bacting/managers-semweb/0.0.20/e9f8f0c2002517f2adfce2c464bfbfbd4f8c3d4b10f951afe2e4df68c62befd1/pom.xml dependency:resolve': 1
So it looks like Maven isn't happy about something here, but I'm not sure what is going wrong (or why we don't get a stack trace).
@cthoyt You can get more details by manually running the command listed. Also best to add the -U flag, to ensure Maven rechecks for artifacts from the remote(s). So your command will be:
/usr/local/bin/mvn -U -B -f /Users/cthoyt/.jgo/io.github.egonw.bacting/managers-semweb/0.0.20/e9f8f0c2002517f2adfce2c464bfbfbd4f8c3d4b10f951afe2e4df68c62befd1/pom.xml dependency:resolve
I tried it on my system, and the error I get is:
[ERROR] Failed to execute goal on project managers-semweb-BOOTSTRAPPER: Could not resolve dependencies for project io.github.egonw.bacting-BOOTSTRAPPER:managers-semweb-BOOTSTRAPPER:jar:0: The following artifacts could not be resolved: io.github.egonw.bacting:managers-semweb:jar:0.0.20, io.github.egonw.bacting:managers-bioinfo:jar:0.0.20, io.github.egonw.bacting:managers-cheminfo:jar:0.0.20: Could not find artifact io.github.egonw.bacting:managers-semweb:jar:0.0.20 in 1 (https://maven.scijava.org/content/repositories/releases) -> [Help 1]
The key issue here is this part:
Could not find artifact io.github.egonw.bacting:managers-semweb:jar:0.0.20
And the key part of that is jar, because managers-semweb is a POM only, with no JAR artifact.
Removing managers-semweb from your endpoints list, I then see:
Could not find artifact io.github.egonw.bacting:managers-bioinfo:jar:0.0.20
Which is another POM-only artifact.
Same for managers-cheminfo. After removing managers-bioinfo and managers-cheminfo as well, it works!
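One way to spot POM-only artifacts before passing them as endpoints is to look at the packaging element of each POM: aggregator and parent POMs declare pom, while Maven defaults to jar when the element is absent. The sketch below assumes you already have the POM text in hand; `packaging_of` is a hypothetical helper, and the two XML fragments are illustrative, not the actual bacting files.

```python
import xml.etree.ElementTree as ET

POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}


def packaging_of(pom_xml):
    """Return the <packaging> value of a POM; Maven defaults to 'jar'."""
    root = ET.fromstring(pom_xml)
    element = root.find("m:packaging", POM_NS)
    if element is None:
        element = root.find("packaging")  # POM written without the namespace
    if element is None or not element.text:
        return "jar"
    return element.text.strip()


# Illustrative POM fragments, not the actual bacting POMs.
AGGREGATOR = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <artifactId>example-parent</artifactId>
  <packaging>pom</packaging>
</project>"""
LIBRARY = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <artifactId>example-library</artifactId>
</project>"""

for name, pom in [("example-parent", AGGREGATOR), ("example-library", LIBRARY)]:
    print(name, "->", packaging_of(pom))
```

Anything that reports pom has no JAR to resolve and can be left out of the endpoints list.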
P.S. The jgo program is supposed to emit the mvn execution failure message to the console, but does not. It's a bug.
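For illustration, surfacing a subprocess failure message generally looks something like the sketch below. This is not jgo's actual code: `run_and_report` is a hypothetical helper, and a small Python one-liner stands in for the failing mvn command so the example runs without Maven installed.

```python
import subprocess
import sys


def run_and_report(command):
    """Run a command; on failure, print its captured output before raising."""
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so nothing is lost
        text=True,
    )
    if result.returncode != 0:
        # Show the tool's own message instead of only the exit code.
        print(result.stdout, file=sys.stderr)
        raise subprocess.CalledProcessError(
            result.returncode, command, output=result.stdout
        )
    return result.stdout


# Stand-in for a failing `mvn ... dependency:resolve` call.
failing = [
    sys.executable,
    "-c",
    "import sys; print('[ERROR] could not resolve'); sys.exit(1)",
]
try:
    run_and_report(failing)
except subprocess.CalledProcessError as error:
    print("exit code:", error.returncode)
```

The captured output is also attached to the exception, so callers can log or re-raise it with the original message intact.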
make some updates on the underlying resources to have jars inside them
@cthoyt Looking at the GitHub repository, you can see that managers-bioinfo is a parent POM, for doing the multi-module build:
https://github.com/egonw/bacting/blob/bacting-0.0.22/managers-bioinfo/pom.xml
So it is not intended to be explicitly specified as a dependency. I expect that @egonw will not need to make any changes here, but rather your Python code should just leave off specifying the POM parent artifacts.
Are you able to access all desired functionality from Python if you specify only:
config.add_endpoints(
"io.github.egonw.bacting:managers-inchi:0.0.20",
"io.github.egonw.bacting:managers-pubchem:0.0.20",
"io.github.egonw.bacting:managers-xml:0.0.20",
"io.github.egonw.bacting:managers-rdf:0.0.20",
"io.github.egonw.bacting:managers-oscar:0.0.20",
"io.github.egonw.bacting:managers-ui:0.0.20",
"io.github.egonw.bacting:managers-excel:0.0.20",
"io.github.egonw.bacting:managers-opsin:0.0.20",
"io.github.egonw.bacting:managers-cdk:0.0.20",
"io.github.egonw.bacting:managers-biojava:0.0.20",
"io.github.egonw.bacting:managers-bridgedb:0.0.20",
"io.github.egonw.bacting:bacting-core:0.0.20",
)
?
I do think that, to make things clearer, @egonw could rename the artifactId for the POM parents to something like pom-chem or chem-parent or chem-aggregator. Unfortunately, there is no one agreed-upon convention; e.g. jetty uses jetty-project.
@ctrueden, yes, this is how it should be indeed.
@cthoyt, that said, I am exploring how to create a bundle jar with everything. I know John did that for the cdk project, but I have never done this myself: egonw/bacting#56
@ctrueden thanks again for the explanation, I'm going to close this issue because the JGO update indeed fixed it. In cthoyt/pybacting#5 I was indeed able to make the list much longer!