TopQuadrant/shacl

New release (to address vulnerable dependencies)?

costas80 opened this issue

I'm posting this as a follow-up to the PR I made earlier today, which has been merged to master (thanks!). That PR bumped Jena to 4.2.0 to address CVE-2021-39239.
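For anyone auditing their own builds for this CVE, here is a minimal sketch of a version check. It assumes, as this thread implies, that Jena 4.2.0 is the first release that carries the fix. It only handles plain numeric versions such as `3.17.0` or `4.2.0`. The class name `JenaCveCheck` is hypothetical. The version string itself could come from the resolved dependency list, for example the output of `mvn dependency:tree`.

```java
/**
 * Hypothetical helper: does a given Jena version include the CVE-2021-39239 fix?
 * Assumption: the fix first shipped in Jena 4.2.0, so any release >= 4.2.0 qualifies.
 * Only plain numeric versions ("4.2.0", "3.17.0") are supported; qualifiers such as
 * "-SNAPSHOT" make Integer.parseInt throw a NumberFormatException.
 */
public final class JenaCveCheck {

    // First Jena release assumed to contain the fix.
    private static final int[] FIXED_IN = {4, 2, 0};

    private JenaCveCheck() {}

    public static boolean includesFix(String version) {
        String[] parts = version.split("\\.");
        for (int i = 0; i < FIXED_IN.length; i++) {
            // Missing components count as 0, so "4.2" compares like "4.2.0".
            int v = i < parts.length ? Integer.parseInt(parts[i]) : 0;
            if (v != FIXED_IN[i]) {
                // Compare numerically, so "4.10.0" is correctly newer than "4.2.0".
                return v > FIXED_IN[i];
            }
        }
        return true; // equal to 4.2.0
    }
}
```

For example, `includesFix("4.1.0")` returns `false` and `includesFix("4.10.0")` returns `true`. The comparison is numeric per component, not a string comparison, which would wrongly rank `4.10.0` below `4.2.0`.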

Do you have a view on when a new TopBraid release would be available that would include this fix?

This is becoming a surprisingly tricky topic. As background, this API was meant as a reference implementation of SHACL. We also use a copy of this library in our own product, TopBraid EDG. Maintaining the public API only makes sense for us when we can basically use the same source code; otherwise, we have no realistic way of testing the API in the real world. I have no idea what the open source users are doing.

Over the years, however, the two copies have drifted severely out of sync. In the product we have moved the SHACL-JS support into separate packages that are no longer under org.topbraid.shacl. This is probably a good thing, because upgrading to Java 11 was causing issues between Nashorn and GraalVM. While I don't know if anyone relies on SHACL-JS support, it is still something that will be lost moving forward.

We have also introduced quite a number of built-in constraints (in the tosh.ttl file) that can be used for schema validation, e.g. checking whether shape declarations are actually correct. These may or may not be desirable for the open source users; I just don't know.

So I have prepared a shacl-1.4.0 branch that has the code that we also use in the product right now, but I am not quite sure yet when and whether to publish that version.

To be honest, the best outcome would be if someone from the open source community helped with, or took ownership of, this project. I'd be more than happy to contribute functionality updates, yet I find it increasingly difficult to handle all the questions around builds and Maven while I am not really using the open source library myself in its current form. I'd be happy to give the necessary permissions to someone else.

Any volunteers, and how would I find them?

Hi Holger,
It is a very good topic for discussion, and I have a related question. I know the Jena SHACL package was updated in the last 1-2 years. Is there some estimate of the "gap" between the Jena SHACL package and the TopQuadrant SHACL API? In other words, what key functionality does the TopQuadrant SHACL API bring on top of Jena? If this is known, maybe there is a way to integrate it into the Jena SHACL package and have the bigger community take care of maintenance and further development.

Hi @HolgerKnublauch, regarding the SHACL-JS support you mention, I see that there are no dependents (at least on GitHub). For the Java SHACL API (the current repo), however, there are well over a hundred dependents, so it's pretty safe to assume that the library is very much in use.

On the other hand, the point by @gridDigIt is very valid. Given how we use TopBraid, it seems we could drop in Jena SHACL and, in theory, get the same results (at least API-wise; I haven't tested this). It would indeed be interesting to see what the gap is from your perspective. With that as a starting point the options would become clearer (e.g. have TopBraid depend on Jena SHACL and greatly reduce its footprint, or simply contribute the missing features to Jena SHACL via PRs). For example, why do you not use Jena SHACL in TopBraid EDG?

Edit: A quick Google search already turns up an answer to this: https://groups.google.com/g/topbraid-users/c/DaYd4ol-wF0 (I should have checked before posting my comment.)

The main gaps are in the SHACL-AF (Advanced Features): https://w3c.github.io/shacl/shacl-af/

Andy would know best, but I assume Jena SHACL doesn't support SHACL inferencing rules, node expressions or user-defined SPARQL functions.

At this stage the TopBraid SHACL API has played its part in the evolution of SHACL, and it doesn't need to remain in its current role. If someone wants to add the missing features to Jena, that is fine with me. However, if the TopBraid SHACL API withers away, the SHACL-AF features will have little support moving forward. We see huge value in the inferencing rules, sh:values in particular.

The Jena SHACL API was created only very recently and "our" implementation has various optimizations that are relevant for our product, so there is no way for us to switch, but that shouldn't bother anyone.

Hi @HolgerKnublauch, I think it will take some time to find people to take over the open source project, or to see whether, what and how things get merged into Jena SHACL. Regarding your point on when and whether to publish a new release, I think a high-risk vulnerability is a good reason to do so. The problem here is that the fix requires moving from Jena 3 to 4, which is very risky if you don't know exactly how TopBraid would be affected.

Do you think that a 1.4.0 release published now would potentially be unstable? In parallel, the search for more people to help with maintenance and further development could and should of course proceed.

Which features here do you need that are not covered by jena-shacl?

We (the Interoperability Test Bed team at the European Commission) provide a generic SHACL validator that can be used with any shapes to validate RDF. So if a user relies on SHACL-AF features, these would not be supported with Jena SHACL, and we can't know whether any of our users are in that situation. Some users (projects) we know about because they run on our cloud infrastructure, but our validator is also on Docker Hub and is quite heavily used.

Hey @HolgerKnublauch, it would probably be best to split what we've been discussing here into two issues. This one can focus on making a new release, and the other can be a discussion of whether and how the current repo can continue to be maintained. That way the second one will also get more visibility among the repo's users.

I agree with Costas. It makes sense to split.
I see that we need to check with Andy whether Jena SHACL covers SHACL-AF (Advanced Features), i.e. SHACL inferencing rules, node expressions and user-defined SPARQL functions, and whether there is room to add those to Jena SHACL. Alternatively, we could do some sort of integration so that the additional features build on Jena SHACL as well, in case not everything is added to Jena.

afs commented

It would still be good to have a release here that incorporated the fix for CVE-2021-39239.

Jena SHACL does not cover all of AF. It does have SPARQL targets.
The ITB is validation only? Quite a lot of AF is not validation.

If you want to discuss more, it's probably better to use users@jena or dev@jena if this project is going to go quiet.

To answer the ITB point by @afs

The ITB is validation only? Quite a lot of AF is not validation.

Indeed, the ITB is focused only on validation. But to be honest, I cannot be certain that no user out there has used SHACL-AF for some purpose. If we switched to Jena SHACL, I would obviously be wary of issues popping up for users' shapes that suddenly stopped giving the expected results.
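If a switch were ever considered, one rough way to gauge that risk is to scan the shapes users submit for SHACL-AF vocabulary before changing engines. The sketch below is a purely textual heuristic over Turtle, not an RDF parse. It assumes shapes use the `sh:` prefix or full `http://www.w3.org/ns/shacl#` IRIs. The term list (rules, `sh:values` node expressions, SPARQL functions) is an illustrative, non-exhaustive selection based on the gaps named earlier in this thread. SPARQL targets are left out because, per the comment above, Jena SHACL supports them. `ShaclAfScanner` is a hypothetical name.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

/**
 * Hypothetical heuristic: report SHACL-AF terms that appear in Turtle shapes text.
 * Textual matching only; it can miss usages (e.g. other prefixes bound to the SHACL
 * namespace) and cannot tell whether a matched term is actually used in a shape.
 */
public final class ShaclAfScanner {

    private static final String SH_NS = "http://www.w3.org/ns/shacl#";

    // Illustrative subset of SHACL-AF local names; not exhaustive.
    private static final List<String> AF_TERMS = List.of(
            "rule", "TripleRule", "SPARQLRule", "values", "SPARQLFunction");

    private ShaclAfScanner() {}

    /** Returns the SHACL-AF local names found, in the order of AF_TERMS. */
    public static Set<String> findAfTerms(String turtle) {
        Set<String> found = new LinkedHashSet<>();
        for (String term : AF_TERMS) {
            // Prefixed form "sh:term" not glued to other name characters
            // (so "dash:values" and "sh:rules" do not match), or the full IRI form.
            Pattern p = Pattern.compile(
                    "(?<![\\w:])sh:" + Pattern.quote(term) + "(?![\\w-])"
                    + "|<" + Pattern.quote(SH_NS + term) + ">");
            if (p.matcher(turtle).find()) {
                found.add(term);
            }
        }
        return found;
    }
}
```

For example, `findAfTerms("ex:S sh:rule [ a sh:TripleRule ] .")` returns `[rule, TripleRule]`. A non-empty result only means the shapes deserve a closer look; a more reliable check would parse the shapes graph with an RDF library and inspect the predicates and types directly.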

After a very busy week I today finally had some time to do a new release.

I went with the long-term option instead of doing a patched release. So SHACL-JS is gone from 1.4.0 and it uses the same code that we also use in the product. This means that moving forward I can more easily keep things alive.

It would be good to get confirmation that the binaries I have uploaded actually work.

Thanks for the good news, @HolgerKnublauch! I would be happy to test the latest 1.4.0 version with different sets of shapes and report back. However, I don't see the latest release on Maven Central. Are you waiting for such a verification before publishing there? (I can build and test 1.4.0 from source if that's the case.)

I ran a series of tests with various sets of shapes and encountered no problems. In my view, @HolgerKnublauch, the new binaries are fine. I also just noticed that 1.4.0 is now available on Maven Central.

From my point of view this issue can be considered as closed. I would suggest that if the discussion is to continue on the coverage/overlap/merge between TopBraid SHACL and Jena SHACL this should be followed up in a separate issue (your call @HolgerKnublauch ).

Once again, @HolgerKnublauch, many thanks for the good follow-up!