aaschmid/gradle-cpd-plugin

CPD finds duplications in test fixtures despite source limitation

C-Otto opened this issue · 15 comments

I configured CPD for my multi-project Gradle 6.5.1 project. Some of these sub-projects make use of the java-test-fixtures plugin, which allows me to put Java files into src/testFixtures/java. Despite limiting the source as shown below, I get a CPD issue for code inside src/testFixtures/java (both files which are part of the issue are inside the same directory).

With --debug I see the tokenize debug message for the files in source/main/java, followed by those in source/testFixtures/java:

2020-07-10T10:29:58.078+0100 [DEBUG] [de.aaschmid.gradle.plugins.cpd.internal.worker.CpdExecutor] Tokenize /home/cotto/IdeaProjects/xxx/yyy/src/main/java/com/aaa/Logs.java
2020-07-10T10:29:58.078+0100 [DEBUG] [de.aaschmid.gradle.plugins.cpd.internal.worker.CpdExecutor] Tokenize /home/cotto/IdeaProjects/xxx/yyy/src/testFixtures/java/com/bbb/DatabaseLockingTestJob.java

However, when I debug the source property in my configuration, everything is fine (i.e. I only see src/main/java files):

files(source).each { File file ->
    println(file)
}

This happens with:

  • source = files('src/main/java')
  • source = sourceSets.main.java
  • source = sourceSets.main.allJava
plugins {
    id 'de.aaschmid.cpd' version '3.1' apply false
}

configure(subprojects.findAll { it.name != 'xxx-platform' }) {
    apply plugin: 'java'
    apply plugin: 'de.aaschmid.cpd'

    check.dependsOn(cpdCheck)

    cpdCheck {
        source = files('src/main/java') // do not run for test code
    }
}

Hm, that sounds indeed very strange @C-Otto. Do you have a small reproducer? Is it urgent (just for my prioritization of things)?

We won't be able to use CPD once we switched to Gradle, but that won't happen for a few weeks or so. Besides, we can live without CPD, although it's nice to have. So... not that urgent. I'll have a look at a reproducible case tomorrow.

https://github.com/C-Otto/cpdtestfixtures

If CPD is configured before the integration tests are introduced, for reasons I don't understand CPD also scans the integration test sources:
C-Otto/cpdtestfixtures@fbf644b

If you enable the java-test-fixtures plugin in aaa/build.gradle CPD starts complaining about duplications in the test fixtures code:
C-Otto/cpdtestfixtures@52db52e

This might be related to gradle/gradle#13781 and the linked issues.

Sounds like a lazy initialization stuff. Thanks for reproducer, I will have a look at it.

Very, very strange ... but finally could find the problem: It appears if and only if the cpdChecks source is set before further sourceSets are lazily configured by other applied plugins...

Here is the full story:

  1. Could reproduce it using your reproducer
  2. Wrote an acceptance test, see https://github.com/aaschmid/gradle-cpd-plugin/blob/%2350.text.fixture/src/integTest/groovy/de/aaschmid/gradle/plugins/cpd/test/CpdAcceptanceTest.groovy#L662-L705. Unfortunately, it was working :-( and playing around hadn't helped me understand it
  3. Added some printlns to your reproducer for the subproject:
    println(files(sourceSets.main.allJava).files)
    println(files(sourceSets.test.allJava).files)
    println(files(sourceSets.testFixtures.allJava).files)
    println("----------------")
    println(cpdCheck.source.files)
    
    They produced the "incorrect" result:
    []
    [/home/schmida/tmp/cpdtestfixtures/aaa/src/test/java/BadTest.java]
    [/home/schmida/tmp/cpdtestfixtures/aaa/src/testFixtures/java/BadTestFixture.java]
    ----------------
    [/home/schmida/tmp/cpdtestfixtures/aaa/src/testFixtures/java/BadTestFixture.java]
    
  4. I restructured the project by moving the application of java-test-fixture to rootProjects build.gradle => suprisingly, now it was working:
    []
    [/home/schmida/tmp/cpdtestfixtures/aaa/src/test/java/BadTest.java]
    [/home/schmida/tmp/cpdtestfixtures/aaa/src/testFixtures/java/BadTestFixture.java]
    ----------------
    []
    
  5. Another possibility to fix this is to apply and configure the cpd plugin in the aaas (= sub project) build.gradle => the problem arises if the java-test-fixture plugin is not applied and configured together with the cpd plugin
  6. So this is indeed a problem with the lazy source initialization of a cpdCheck task :-( (see Code)
  7. => The described problem appears if and only if the cpdChecks source is set before further sourceSets configuring plugins are applied because Cpd tasks lazy source initialization still listens to further applied plugins and their sourceSets.

Example: if cpdCheck is configured on in rootProjects build.gradle for subprojects and later another subprojects closure applies java-test-fixture or java plugin, the cpdChecks source is enhanced by their sourceSets.allJava files. subprojects build.gradle files are also applied separately and later...

Good, that I have found the root cause of this issue. Bad, that I currently have only a quite ugly (gut feeling) hack to solve this. Concrete I check if the source was explicitly set and prevent further changes.

@C-Otto does that information help you already such that you can fix the issue by restructuring your build?
To be honest, I at least think about documenting this behavior and leave it as is.

Another question to you @C-Otto: Also I prefer to apply the cpd' plugin on the rootProject` and configure inclusions / exclusions there. Would this be an alternative or would you like to have duplicates checked only project by project? SonarQube e.g. does check for duplicates also over the whole code base if I remember correctly.

It looks like your IT adds a file with duplications to src/main/java? That doesn't look right to me.

Anyway, for me the hotfix/hack using evaluationDependsOnChildren() also helped here (see linked Gradle issue).
I'm now able to configure CPD inside something like a subprojects block in the root project, and configure the sources correctly as each subproject's plugins were added before this root configuration happens (the subprojects' configuration tasks run before, so the "new" sourcesets are already known when I configure CPD in the root project's configuration task).
Of course I'd appreciate a long-term (non-hacked) solution :) Thanks for the effort!

Running CPD on the whole project would be possible. This doesn't match the existing Maven configuration I'm using as a base, but I think that might be a valid step. However, the root project does not have any source, and I don't know how to configure CPD. If you have any ideas, I'd be happy to experiment further

How to run CPD as part of ./gradlew check? How to add CPD to a subject's check? How to define the sources? I just got this far:

WARNING: Due to the absence of 'LifecycleBasePlugin' on root project 'xxx-parent' the task ':cpdCheck' could not be added to task graph. Therefore CPD will not be executed. To prevent this, manually add a task dependency of ':cpdCheck' to a 'check' task of a subproject.
1) Directly to project ':xxx-subproject':
    check.dependsOn(':cpdCheck')
2) Indirectly, e.g. via root project 'xxx-parent':
    project(':xxx-subproject') {
        plugins.withType(LifecycleBasePlugin) { // <- just required if 'java' plugin is applied within subproject
            check.dependsOn(cpdCheck)
        }
    }

It looks like your IT adds a file with duplications to src/main/java? That doesn't look right to me.
Yes this was for my test that neither test nor testFixtures are included into source and therefore check for duplicates.

Anyway, for me the hotfix/hack using evaluationDependsOnChildren() also helped here (see linked Gradle issue).
evaluationDependsOnChildren seems like a fix, right ;-)

How to run CPD as part of ./gradlew check? How to add CPD to a subject's check? How to define the sources?
That is a know limitation if the rootProject has no lifecycle plugin applied (normally implicit via other plugin). The solution is document on https://github.com/aaschmid/gradle-cpd-plugin#multi-module-project.

Depending on your needs, now java sourcesets of all projects will be automatically checked for duplicates (see Code. If you want to override this behaviour, you have a lot of options but note that the lazy evaluation of plugins applied to later configured subprojects adds further sourceSets again unless you use evaluationDependsOnChildren:

cpdCheck {
    source = subprojects*.sourceSets*.main*.java*.srcDirs
}

or

cpdCheck {
    source = subprojects*.sourceSets*.main*.java*.srcDirs + subprojects*.sourceSets*.test*.java*.srcDirs
}

or

cpdCheck {
    source = []
    subprojects.forEach { project -> it.source(project.sourceSets.main.java) } // only if all subprojects have corresponding `sourceSet`
}

or

cpdCheck {
    source = []
    allprojects.forEach { project ->
        project.plugins.withType(JavaBasePlugin) { plugin ->
            project.sourceSets.main.java.forEach { s -> rootProject.cpdCheck.source(s) }
        }
    }
}

Maybe I should document these as well ;-)

Currently best idea for a proper solutions to this: I could provide an additional property with an allowlist or denylist for sourceSet names, e.g. allowedSourceSets = [ "main", "test" ] or deniedSourceSets = [ "testFixture"] which should (not) automatically be added - defaults to all as currently.

Thanks for the snippets, I'll try them out tomorrow. I agree, a bit more documentation wouldn't hurt, especially related to the warning I posted earlier (I've seen it several times when I tried to get the plugin to work with subprojects).

The additional options also seem useful!

I'm unable to create a configuration that combines the scan for all subprojects, as described in your response above. I'm using this snippet for all variants described below:

evaluationDependsOnChildren()
apply plugin: 'java' // required so that the check task exists?
apply from: 'etc/cpd.gradle'

With this I get a single cpdCheck task, but it finds duplicates in subproject/src/integrationTest:

source = subprojects*.sourceSets*.main*.java*.srcDirs

With this I get the same (duplication in integrationTest):

source = []
subprojects.forEach { project -> it.source(project.sourceSets.main.java) }

Finally, I get

* What went wrong:
A problem occurred evaluating root project 'xxx-parent'.
> Failed to apply plugin class 'org.gradle.api.plugins.JavaBasePlugin'.
   > Could not get unknown property 'main' for SourceSet container of type org.gradle.api.internal.tasks.DefaultSourceSetContainer.

for

source = []
allprojects.forEach { project ->
    project.plugins.withType(JavaBasePlugin) { plugin ->
        project.sourceSets.main.java.forEach { s -> rootProject.cpdCheck.source(s) }
    }
}
  1. the java plugin is not required for the CPD plugin to work but it prints a warning if you haven't manually add it to one of the childrens check task if you have no JavaBasePlugin on the root project, see e.g. https://github.com/TNG/junit-dataprovider/blob/master/build.gradle.kts#L357

  2. Hm ... I think the too many sourcesets are still caused by the lazy evalutation logic even if you are using the evalationDependsOnChildren(). Would you mind to adjust your reproducer above? I have not that much time at the moment but would try to fix it if it is urgent to you...

  3. Your exception is also odd as the plugins.withType (see here) should not cause errors if no plugin matches. JavaBasePlugin (see here) should have sourceSets already but maybe they changed that java plugin creates the main sourcesets, see here. And you apply java plugin ...

Sorry, I'm confused about the current state. Let's ignore the "global scan", i.e. using CPD to find duplicates between subprojects.

My understanding:
This is a bug in your code, where my explicit configuration is overwritten due to lazy initalisation. The configuration order plays a crucial role here, which is why evaluationDependsOnChildren() works around this issue. Despite my assumptions, the java-test-fixtures plugin does not do anything that contributes to this bug (especially this bug: gradle/gradle#13781).

Is this correct?

@C-Otto: sorry for the inactivity on this issue but my spare is quite limited at the moment and I honestly higher prioritize other things.

I deeply need to think about the lazy initialization here but still not sure if this is really a bug...

As long as CPD does not stick to the configuration, I'd like to think it's a bug. This could be a bug in the code that reads and applies the configuration, or something in the documentation. Either way, it does not work, and I have no way of making it work.

I will try to figure out a solution as soon as possible using your reproducer from above. However, you mentioned integrationTest source sets above. Is this use case already part of the reproducer or could you kindly add it?

I believe the issue is related to the existence of several source sets, and the name/purpose of these is not relevant. The reproducer contains two commits, where the initial commit exposes the bug with integration tests.