CPD finds duplications in test fixtures despite source limitation
C-Otto opened this issue · 15 comments
I configured CPD for my multi-project Gradle 6.5.1 project. Some of these sub-projects make use of the java-test-fixtures plugin, which allows me to put Java files into src/testFixtures/java
. Despite limiting the source
as shown below, I get a CPD issue for code inside src/testFixtures/java
(both files which are part of the issue are inside the same directory).
With --debug
I see the tokenize
debug message for the files in source/main/java
, followed by those in source/testFixtures/java
:
2020-07-10T10:29:58.078+0100 [DEBUG] [de.aaschmid.gradle.plugins.cpd.internal.worker.CpdExecutor] Tokenize /home/cotto/IdeaProjects/xxx/yyy/src/main/java/com/aaa/Logs.java
2020-07-10T10:29:58.078+0100 [DEBUG] [de.aaschmid.gradle.plugins.cpd.internal.worker.CpdExecutor] Tokenize /home/cotto/IdeaProjects/xxx/yyy/src/testFixtures/java/com/bbb/DatabaseLockingTestJob.java
However, when I debug the source
property in my configuration, everything is fine (i.e. I only see src/main/java
files):
files(source).each { File file ->
println(file)
}
This happens with:
source = files('src/main/java')
source = sourceSets.main.java
source = sourceSets.main.allJava
plugins {
id 'de.aaschmid.cpd' version '3.1' apply false
}
configure(subprojects.findAll { it.name != 'xxx-platform' }) {
apply plugin: 'java'
apply plugin: 'de.aaschmid.cpd'
check.dependsOn(cpdCheck)
cpdCheck {
source = files('src/main/java') // do not run for test code
}
}
Hm, that sounds indeed very strange @C-Otto. Do you have a small reproducer? Is it urgent (just for my prioritization of things)?
We won't be able to use CPD once we switched to Gradle, but that won't happen for a few weeks or so. Besides, we can live without CPD, although it's nice to have. So... not that urgent. I'll have a look at a reproducible case tomorrow.
https://github.com/C-Otto/cpdtestfixtures
If CPD is configured before the integration tests are introduced, for reasons I don't understand CPD also scans the integration test sources:
C-Otto/cpdtestfixtures@fbf644b
If you enable the java-test-fixtures plugin in aaa/build.gradle
CPD starts complaining about duplications in the test fixtures code:
C-Otto/cpdtestfixtures@52db52e
This might be related to gradle/gradle#13781 and the linked issues.
Sounds like a lazy initialization stuff. Thanks for reproducer, I will have a look at it.
Very, very strange ... but finally could find the problem: It appears if and only if the cpdCheck
s source
is set before further sourceSets
are lazily configured by other applied plugins...
Here is the full story:
- Could reproduce it using your reproducer
- Wrote an acceptance test, see https://github.com/aaschmid/gradle-cpd-plugin/blob/%2350.text.fixture/src/integTest/groovy/de/aaschmid/gradle/plugins/cpd/test/CpdAcceptanceTest.groovy#L662-L705. Unfortunately, it was working :-( and playing around hadn't helped me understand it
- Added some
println
s to your reproducer for the subproject:They produced the "incorrect" result:println(files(sourceSets.main.allJava).files) println(files(sourceSets.test.allJava).files) println(files(sourceSets.testFixtures.allJava).files) println("----------------") println(cpdCheck.source.files)
[] [/home/schmida/tmp/cpdtestfixtures/aaa/src/test/java/BadTest.java] [/home/schmida/tmp/cpdtestfixtures/aaa/src/testFixtures/java/BadTestFixture.java] ---------------- [/home/schmida/tmp/cpdtestfixtures/aaa/src/testFixtures/java/BadTestFixture.java]
- I restructured the project by moving the application of
java-test-fixture
torootProject
sbuild.gradle
=> suprisingly, now it was working:[] [/home/schmida/tmp/cpdtestfixtures/aaa/src/test/java/BadTest.java] [/home/schmida/tmp/cpdtestfixtures/aaa/src/testFixtures/java/BadTestFixture.java] ---------------- []
- Another possibility to fix this is to apply and configure the cpd plugin in the
aaa
s (= sub project)build.gradle
=> the problem arises if thejava-test-fixture
plugin is not applied and configured together with thecpd
plugin - So this is indeed a problem with the lazy
source
initialization of acpdCheck
task :-( (see Code) - => The described problem appears if and only if the
cpdCheck
ssource
is set before furthersourceSets
configuring plugins are applied becauseCpd
tasks lazysource
initialization still listens to further applied plugins and theirsourceSets
.
Example: if cpdCheck
is configured on in rootProject
s build.gradle
for subprojects
and later another subprojects
closure applies java-test-fixture
or java
plugin, the cpdCheck
s source
is enhanced by their sourceSets.allJava
files. subprojects
build.gradle
files are also applied separately and later...
Good, that I have found the root cause of this issue. Bad, that I currently have only a quite ugly (gut feeling) hack to solve this. Concrete I check if the source was explicitly set and prevent further changes.
@C-Otto does that information help you already such that you can fix the issue by restructuring your build?
To be honest, I at least think about documenting this behavior and leave it as is.
Another question to you @C-Otto: Also I prefer to apply the cpd' plugin on the
rootProject` and configure inclusions / exclusions there. Would this be an alternative or would you like to have duplicates checked only project by project? SonarQube e.g. does check for duplicates also over the whole code base if I remember correctly.
It looks like your IT adds a file with duplications to src/main/java
? That doesn't look right to me.
Anyway, for me the hotfix/hack using evaluationDependsOnChildren()
also helped here (see linked Gradle issue).
I'm now able to configure CPD inside something like a subprojects
block in the root project, and configure the sources correctly as each subproject's plugins were added before this root configuration happens (the subprojects' configuration tasks run before, so the "new" sourcesets are already known when I configure CPD in the root project's configuration task).
Of course I'd appreciate a long-term (non-hacked) solution :) Thanks for the effort!
Running CPD on the whole project would be possible. This doesn't match the existing Maven configuration I'm using as a base, but I think that might be a valid step. However, the root project does not have any source, and I don't know how to configure CPD. If you have any ideas, I'd be happy to experiment further
How to run CPD as part of ./gradlew check
? How to add CPD to a subject's check
? How to define the sources? I just got this far:
WARNING: Due to the absence of 'LifecycleBasePlugin' on root project 'xxx-parent' the task ':cpdCheck' could not be added to task graph. Therefore CPD will not be executed. To prevent this, manually add a task dependency of ':cpdCheck' to a 'check' task of a subproject.
1) Directly to project ':xxx-subproject':
check.dependsOn(':cpdCheck')
2) Indirectly, e.g. via root project 'xxx-parent':
project(':xxx-subproject') {
plugins.withType(LifecycleBasePlugin) { // <- just required if 'java' plugin is applied within subproject
check.dependsOn(cpdCheck)
}
}
It looks like your IT adds a file with duplications to
src/main/java
? That doesn't look right to me.
Yes this was for my test that neithertest
nortestFixtures
are included intosource
and therefore check for duplicates.
Anyway, for me the hotfix/hack using evaluationDependsOnChildren() also helped here (see linked Gradle issue).
evaluationDependsOnChildren
seems like a fix, right ;-)
How to run CPD as part of ./gradlew check? How to add CPD to a subject's check? How to define the sources?
That is a know limitation if therootProject
has nolifecycle
plugin applied (normally implicit via other plugin). The solution is document on https://github.com/aaschmid/gradle-cpd-plugin#multi-module-project.
Depending on your needs, now java sourcesets of all projects will be automatically checked for duplicates (see Code. If you want to override this behaviour, you have a lot of options but note that the lazy evaluation of plugins applied to later configured subprojects adds further sourceSets again unless you use evaluationDependsOnChildren
:
cpdCheck {
source = subprojects*.sourceSets*.main*.java*.srcDirs
}
or
cpdCheck {
source = subprojects*.sourceSets*.main*.java*.srcDirs + subprojects*.sourceSets*.test*.java*.srcDirs
}
or
cpdCheck {
source = []
subprojects.forEach { project -> it.source(project.sourceSets.main.java) } // only if all subprojects have corresponding `sourceSet`
}
or
cpdCheck {
source = []
allprojects.forEach { project ->
project.plugins.withType(JavaBasePlugin) { plugin ->
project.sourceSets.main.java.forEach { s -> rootProject.cpdCheck.source(s) }
}
}
}
Maybe I should document these as well ;-)
Currently best idea for a proper solutions to this: I could provide an additional property with an allowlist or denylist for sourceSet
names, e.g. allowedSourceSets = [ "main", "test" ]
or deniedSourceSets = [ "testFixture"]
which should (not) automatically be added - defaults to all as currently.
Thanks for the snippets, I'll try them out tomorrow. I agree, a bit more documentation wouldn't hurt, especially related to the warning I posted earlier (I've seen it several times when I tried to get the plugin to work with subprojects).
The additional options also seem useful!
I'm unable to create a configuration that combines the scan for all subprojects, as described in your response above. I'm using this snippet for all variants described below:
evaluationDependsOnChildren()
apply plugin: 'java' // required so that the check task exists?
apply from: 'etc/cpd.gradle'
With this I get a single cpdCheck
task, but it finds duplicates in subproject/src/integrationTest
:
source = subprojects*.sourceSets*.main*.java*.srcDirs
With this I get the same (duplication in integrationTest
):
source = []
subprojects.forEach { project -> it.source(project.sourceSets.main.java) }
Finally, I get
* What went wrong:
A problem occurred evaluating root project 'xxx-parent'.
> Failed to apply plugin class 'org.gradle.api.plugins.JavaBasePlugin'.
> Could not get unknown property 'main' for SourceSet container of type org.gradle.api.internal.tasks.DefaultSourceSetContainer.
for
source = []
allprojects.forEach { project ->
project.plugins.withType(JavaBasePlugin) { plugin ->
project.sourceSets.main.java.forEach { s -> rootProject.cpdCheck.source(s) }
}
}
-
the
java
plugin is not required for the CPD plugin to work but it prints a warning if you haven't manually add it to one of the childrenscheck
task if you have noJavaBasePlugin
on the root project, see e.g. https://github.com/TNG/junit-dataprovider/blob/master/build.gradle.kts#L357 -
Hm ... I think the too many sourcesets are still caused by the lazy evalutation logic even if you are using the
evalationDependsOnChildren()
. Would you mind to adjust your reproducer above? I have not that much time at the moment but would try to fix it if it is urgent to you... -
Your exception is also odd as the
plugins.withType
(see here) should not cause errors if no plugin matches.JavaBasePlugin
(see here) should havesourceSets
already but maybe they changed thatjava
plugin creates themain
sourcesets, see here. And you applyjava
plugin ...
Sorry, I'm confused about the current state. Let's ignore the "global scan", i.e. using CPD to find duplicates between subprojects.
My understanding:
This is a bug in your code, where my explicit configuration is overwritten due to lazy initalisation. The configuration order plays a crucial role here, which is why evaluationDependsOnChildren()
works around this issue. Despite my assumptions, the java-test-fixtures
plugin does not do anything that contributes to this bug (especially this bug: gradle/gradle#13781).
Is this correct?
@C-Otto: sorry for the inactivity on this issue but my spare is quite limited at the moment and I honestly higher prioritize other things.
I deeply need to think about the lazy initialization here but still not sure if this is really a bug...
As long as CPD does not stick to the configuration, I'd like to think it's a bug. This could be a bug in the code that reads and applies the configuration, or something in the documentation. Either way, it does not work, and I have no way of making it work.
I will try to figure out a solution as soon as possible using your reproducer from above. However, you mentioned integrationTest
source sets above. Is this use case already part of the reproducer or could you kindly add it?
I believe the issue is related to the existence of several source sets, and the name/purpose of these is not relevant. The reproducer contains two commits, where the initial commit exposes the bug with integration tests.