logic-ng/LogicNG

Lazy evaluation for PseudoBooleanParser in FormulaFactory

axkr opened this issue ยท 14 comments

axkr commented

Because FormulaFactory isn't thread-safe can we have same form of lazy evaluation in FormulaFactory?

For example set the PseudoBooleanParser to null by default:

And if the parser is really used create it:

			if (parser == null) {
				parser = new PseudoBooleanParser(this);
			}
			// now use the parser

Maybe the same for CNFEncoder, PBEncoder ?

Would you like to explain the use case where you would need this? I don't think that it would have a relevant impact on performance or storage, unless you're creating thousands of FormulaFactories.

axkr commented

I think with this change we can completely bypass the initialization of antlr.
See axkr/symja_android_library#65

So the problem would be that you're using a different version of AntLR as the one used in LogicNG?
If Java would have a build-in construct for late initialization, I agree that these fields might be candidates for it. But since Java doesn't have it and the suggested solution doesn't really match with the design choices in LogicNG, I don't really see that this justifies changes to the code. This is basically an "outside" problem with dependencies.
On the other hand, if @czengler agrees with the solution, I could live with it. ;)

Also keep in mind that the problem would only be deferred to the time when someone actually uses the parser. I he never uses any parser -- ok. But otherwise the problem might occur at a later point where it is even worse to debug.

axkr commented

Symja itself doesn't use antlr, so I would avoid the initialization of antlr for every Java thread (for example in a web application) there I call boolean functions supported by LogicNG.
The user issue 65 on Android should only illustrate the initialization path to org.antlr.v4.runtime.atn.ATNDeserializer.deserialize(ATNDeserializer.java:129)

I stumbled on the similar issue. LogicNG depends on ANTLR and it may be problematic for some environments because of e.g. version conflicts or other factors. On the other hand, parsing formulas from strings is not what everyone needs. So some clients are forced to pay for the thing they do not use.

I'd like to continue this discussion. I see some directions where a potential solution might go:

  • Use lazy initialization inside the FormulaFactory, as initially suggested in the issue. This will avoid the runtime ANTLR classpath conflict for the clients that do not use the parser functionality. However, it makes the initialization moment be not strictly defined, which is not very nice and may defer the error for the clients that do use parser.
  • Factor out the parsing functionality out of FormulaFactory into a separate class/entity. This way the goal of "paying for parsing only if you use it" will be solved and the moment of the parser initialization will be well defined and explicit. However this requires an API change, which is expensive for clients.

Can we have a look at this issue again and discuss potential solutions?

Hi @Jeffset,

thank you very much for your input. We have discussed this issue internally quite a bit. For LogicNG 3 (currently active in development) we want to go with solution 2 and provide the parser in its own Maven artifact. This way the LogicNG core library has no external dependencies anymore. If someone needs the parser, he must include the logicng-parser artifact.

However, as you already mentioned, for LogicNG 2, this would be a breaking change, which we would not like to include in a minor release. My collegue @SHildebrandt had the idea to go the other way round for LogicNG 2:

  • extracting everything except the parser in a logicng-core artifact
  • the logicng artifact then has a dependency to the logicng-core artifact and just adds the parser

This has the advantage that for users currently including the logicng artifact, nothing changes - they still get the core library and the parser under the same artifact. For users like you, which do not want the parser, they have to change the depency to the logicng-core artifact.

The remaining problem is, that a lot of tests in the core library use the parser. But I think we can get around with that by including the parser library as a test-only dependency.

What are your thougts on that?

Of course this would also mean that we have to remove the parse() method on the formula factory. But that would be a breaking change, I can live with, since it's just a shortcut for creating the parser your own and using it. In all of our projects, we barely use this function, since mostly we have customer-specific parsers anyway...

axkr commented

+1 for using the LogicNG 3 solution.

IMO, if you do a change for LogicNG 2, this solution should also work in LogicNG 3 without refactoring again.

Thanks for the reply, @czengler! It's great to hear that in the major (3.x) version this issue will be solved.

I would appreciate still if we make the necessary changes to the current 2.x version. I think the approach with the logicng/logicng-core artifacts works great, thanks for sharing that. However the need to remove the parse() method technically does break the compatibility and that's something we should at least try to avoid, even if we assume the damage will be minimal.

So I suggest the following addition to the approach. In order to stay fully compatible we can do the following:

  • In the logicng-core artifact, we keep a renamed FormulaFactory class (something like FormulaFactoryCore) with all the logic except the parse() method.
  • In the logicng artifact with all the parsing code we extend the FormulaFactoryCore class (like FormulaFactory extends FormulaFactoryCode) and put the parse() method there.

This way the clients of logicng-core can use FormulaFactoryCore class without parsing, and the clients of logicng artifact will still have their full compatibility. And we are going to have zero diff in tests code, as we can plug the logicng artifact straight into the logicng-core's test scope.

I can probably do this all myself and open a PR, because I have already made a fork (as I need a solution in the very near future). In there I move the parser-related code into the test scope and then pull the method parse() into the test-only ForumulaFactoryWithTesting extends FormulaFactory to minimize the diff in unit tests. So, if your team will be ready to review a contribution, I can put a bit more work into this refactoring to keep the parser code in the separate artifact and would gladly avoid republishing the modified library. How does that sound?

Hi @Jeffset,

Thanks for your input!
We had a very long discussion last week and tried out different solutions, trying to find a reasonable trade-off between preventing/minimizing breaking changes and ugly workarounds or hacks.

So this is what we decided:

  • The LogicNG repository (and thus the artifact org.logicng:logicng) will not contain the parsers anymore and can thus drop the ANTLR dependency. We still need the parser in the tests, but since we did not find a proper alternative we will just copy the parser grammars and classes to the tests. Furthermore, this artifact will no longer be released for Java 11 (org.logicng:logicng-j11), since there is no need for it anymore.
  • The parser moves to a new repository (currently still private) and gets its own artifacts org.logicng:logicng-parser-j8 and org.logicng:logicng-parser-j11.
  • The method FormulaFactory.parse() will be removed without explicit replacement. Instantiating you own parser instead of using FormulaFactory.parse() should not be a big deal in almost all use cases. We're aware of the idea to extend the FormulaFactory in the parser library, but we think that users can do that themselves if they really want such a method (also, considering languages like Kotlin, extension methods would be a far better choice).
  • The FormulaParser will be moved from package io.readers to io.parsers.readers since the packages of the two libraries must not overlap (for Java 8 this would be fishy, for Java 11 it is not possible because of the module system) and we didn't want to move the DimacsReader to the parser artifact.

So people who are not using the parsers should not recognize any difference (i.e. breaking change) with the new version. They can just stick with the artifact they're already using.
People using the parsers will need to switch to one of the new artifacts. This is a breaking change, but a quite obvious one (causing direct compile-time issues and no runtime issues) and easy to fix.

This will all come with LogicNG 2.5.0 which also brings some other features (see Changelog) and we will release it in the course of this week.

Feel free to share your opinion on that. However, as I said, this is the result of a really long and thorough discussion. Of course some things could be decided differently, but there are also many subtle problems (which do not make sense to list here) with other solutions.

Ok, thanks for letting me know! I appreciate the fact that you are going to release 2.5.0 with the issue fixed, that's what is really important. In my previous comment I was merely suggesting a workaround (a hack if you will) to keep source/binary compatibility when doing a minor version bump, as per semver guidelines. However if you've decided that the simpler and cleaner solution is acceptable here - that's totally fine too. Looking forward to the 2.5.0!

Hi @Jeffset , all LogicNG 2.5.0 artifacts are released as described above. Hope this works for you. Let us know if you encounter further problems!

They should:

However, sometimes it takes a few hours till the artifacts are propagated in every search engine etc. E.g. https://search.maven.org/ does not list it at the moment - but they are there ;)

axkr commented

Ok, artifact id changes from logicng-j11 to logicng in my use case.