[Bug]: Improve clean-room ability to run the Data TCK

Question

[Bug]: Improve clean-room ability to run the Data TCK

edburns opened this issue 6 months ago · 4 comments

Specification Version

1.0.0-RC1

Bug report

The current state of the Data TCK could use some improvement regarding the ability of a party creating a clean-room implementation of the spec to run the TCK against their impl. The current state of the data TCK requires a fair amount of a priori knowledge to even run the TCK.

No info about how persistence provider is supposed to be run.
No info about having CDI ready.
Missing info or persistence.xml descriptor.
Unclear if annotation provided needs to be run.

Additional information

No response

Answer 1 · 2024-05-14T16:06:36.000Z

The Jakarta Data specification is written in terms of Jakarta Data providers. Usage of a Jakarta Persistence provider is an implementation detail of a Jakarta Data provider. Some will use a persistence provider, some will not. For example, jnosql using Jakarta NoSQL rather than Jakarta Persistence, or a Jakarta Data provider that uses JDBC directly rather than Jakarta Persistence. Other Jakarta Data providers will be heavily tied to Jakarta Persistence and might require a particular Jakarta Persistence provider or might have the ability to run generally on any. It is even possible that other Jakarta Data providers might be created to interface directly with other types of data stores. That is all up to the Jakarta Data provider implementation. The TCK does not make requirements on this aspect or provide instructions that are specific to a particular implementation choice.

The point about lacking information on CDI will need to be looked into.

Answer 2 · 2024-05-14T21:19:01.000Z

Here's my experience with "unboxing" of the TCK itself:

I took the latest binary (the one with sha 02dda1...) and unzipped it
found the readme in the root and opened the guide found in the docs folder
going through the UG:
- sect 3.1 says there is SE mode - great
- sect 4.1 says EE Platform is required if running in EE Mode - I'm not interested in this, so I don't need the EE server
- sect 5.2
  - artifacts contains The TCK test classes and source - if this is supposed to mean that there is a -tck-sources.jar in the TCK distribution then such jar is missing and either the guide or the zip is wrong
  - just noticed a typo - doc folder vs actual docs folder
  - a script.. let's run it:

➜  artifacts ./artifact-install.sh 
zsh: permission denied: ./artifact-install.sh
➜  artifacts chmod u+x artifact-install.sh 
➜  artifacts ./artifact-install.sh        
A valid version is required to execute this script

what? Why cannot the script by default install files it has next to it? Never mind, let's pass 1.0.0 to the script and move on

sect 5.3.2 - SE Mode - this is what I'm looking for! let's jump to sect 7 and more specifically 7.1
sect 7.1 - the runner described here seems to be available already, so let's try it:

➜  starter mvn clean test -f se-pom.xml
...
[ERROR] 'groupId' with value '[YOUR_GROUP_ID]' does not match a valid id pattern. @ line 23, column 11
...

...interactive experience :-) let's set gId to com.acme and try again

➜  starter mvn clean test -f se-pom.xml
...
[ERROR] Tests run: 85, Failures: 73, Errors: 11, Skipped: 0

cool! There is at least one test which passes and that is the signature test. But wait a sec - the se-pom did run tests from some jakarta.data:jakarta.data-tck:1.0.0-SNAPSHOT and not from the tck jar installed by the artifact-install.sh (apparently not everything from #729 was addressed) and the version of the API was not 1.0.0 but 1.0.0-SNAPSHOT

IDK but I expect that if I have some installer and some example then the example is bound to the artifact versions installed by provided installer and the only -SNAPSHOT I can find is in the runner's version.

...so let's fix the version of dependencies in the pom and try again - same results, but why is there new [path-to-extract-jvm-classes] folder under the starter? It belongs under the target folder, doesn't it? There are also few log files, well, in this case some people prefer to have them in the top level folder, so why not..

Now, when I look at the test failures themselves - all of them are caused by NPE, looking into sources reveals that NPE happens because injection does not happen (ie here) - why the injection is even expected to work in SE mode? What does SE and/or standalone mode within the context of the Data TCK mean? The Data Spec seems to be silent about requiring CDI in non EE products (...but I can be wrong of course)

few other comments:

consider updating maven plugin versions in poms of sample runners (surefire 2.22.2 is a bit outdated today)
consider updating API dependencies to EE 11 versions (ie CDI 4.1.0, 6.1.0,...)
consider satisfying the MUST contain Instructions describing how to run the compatible implementation(s) that are being used to validate the TCK requirement of the TCK Process by inclusion instead of by reference to the OpenLiberty fat tck (ie use jnosql or anything showing that it is possible to pass all defined tests)

Answer 3 · 2024-05-14T21:36:58.000Z

Lots of good points above. Thanks for the analysis. They will all need to be looked into.

A few answers that I can provide initially:

The Data Spec seems to be silent about requiring CDI in non EE products (...but I can be wrong of course)

Some vendors with standalone products wanted to be able to use their own custom dependency injection rather than CDI, so I think it's intentional that the TCK for standalone does not require CDI.

consider satisfying the MUST contain Instructions describing how to run the compatible implementation(s) that are being used to validate the TCK requirement of the TCK Process by inclusion instead of by reference

If we have the option of choosing by inclusion or reference, my vote is to continue using by reference where each vendor that wants to supply a compatible implementation can maintain its own instructions. I'd rather not see vendor-specific things be maintained by the spec project.

Answer 4 · 2024-05-15T12:48:34.000Z

The Data Spec seems to be silent about requiring CDI in non EE products (...but I can be wrong of course)

Some vendors with standalone products wanted to be able to use their own custom dependency injection rather than CDI, so I think it's intentional that the TCK for standalone does not require CDI.

that is reasonable. Clearly the TCK requires DI to happen even in the SE mode but TCK UG is completely silent about it or more specifically about who is expected to provide the DI framework - if vendor, then 5.3.2 looks like a good place to tell him, yet for SE tests, my expectation is that these tests depend on the chosen testing framework only

If we have the option of choosing by inclusion or reference, my vote is to continue using by reference where each vendor that wants to supply a compatible implementation can maintain its own instructions. I'd rather not see vendor-specific things be maintained by the spec project.

the thing is that mentor and/or people voting in the ballot will likely try to execute tests to verify that the TCK itself is runnable. Is it expected these people will checkout openliberty repo to do so? What is the guarantee that the content the reference points to remain unchanged? Say in 3-4 years one wants to reproduce some issue, will the content on the provided URL be exactly the same as it is today?

Usage of a Jakarta Persistence provider is an implementation detail of a Jakarta Data provider. Some will use a persistence provider, some will not.

Is it expected that the Data provider willing to use Persistence provider goes through the TCK sources himself to find all @Entities in order to be able to compose the descriptor for the persistence runtime he wants to support?