JetBrains-Research/PyNose

Data Clumps Support?

Opened this issue · 12 comments

Hi there,

in my PhD research I am currently working on refactoring data clumps. Maybe you would consider supporting this code smell too, as it is also a design smell.

I am currently developing an IntelliJ IDEA plugin.
My problem is loading all dependencies and getting the qualified name of a class.

areyde commented

Dear Nils! Nice to see you here!

I have seen your paper on ENASE, though unfortunately, I could not extract the full text even with institutional access. Moreover, if you caught it, just a couple of days ago I sent you an email inviting you to the IDE workshop that we are organizing at ICSE — about plugins, too! So we heard of your work!

Right now our plugin is not really developed in any way. But if I understood you correctly, you are asking for some general help with the IntelliJ Platform? If that is the case, could you please elaborate on your question? We could try to help you with advice.

Hi Yaroslav,

nice to see you here too :-) I have sent you an email with the full text.
So your email achieved its goal, as I wanted to transfer our current results to an IntelliJ plugin again.
This time it is a dual setup, but I am struggling to get the super classes.

> We could try to help you with advice.
That would be awesome 👍

I think that is because my setup does not have all required libraries or paths defined.
Here is my current project: https://github.com/NilsBaumgartner1994/REDCLIFF-Java

How to reproduce the steps:

  1. Have a small Java Project to be analyzed.
  2. Set $FOLDER with the path to the project to be analyzed
  3. run ./build_image.sh && docker-compose up

The problem shows up when I want to get the super classes:

PsiClass[] supers = currentClass.getSupers();
// supers is an empty array

I think I loaded the project, waited for the refresh to finish, and waited for indexing programmatically. My next step to find out where the problem comes from is to run it on my own host machine instead of in Docker.

But I think the main problem lies in my Dockerfile:

FROM ubuntu:jammy-20230624

WORKDIR /app

# Install necessary packages and Java 17
RUN apt-get update \
  && apt-get install -y unzip wget libfreetype6 fontconfig openjdk-17-jdk \
  && rm -rf /var/lib/apt/lists/*

# Set JAVA_HOME environment variable
ENV JAVA_HOME=/usr/lib/jvm/java-17-openjdk-amd64

# Download and extract IntelliJ IDEA
RUN wget https://download.jetbrains.com/idea/ideaIC-2023.2.tar.gz \
    && tar -xzf ideaIC-2023.2.tar.gz \
    && rm ideaIC-2023.2.tar.gz \
    && mv idea-* idea

# Set the JAVA_HOME for IntelliJ IDEA
RUN echo "idea.jdk=$JAVA_HOME" >> /app/idea/bin/idea.properties

# Create Maven repository directory
RUN mkdir -p /root/.m2/repository

# Set the working directory to /app/idea for plugin operations
WORKDIR /app/idea

# Download and install the Gradle plugin for IntelliJ IDEA
RUN wget -O gradle-intellij-plugin.zip "https://plugins.jetbrains.com/plugin/download?rel=true&updateId=410281" \
    && mkdir -p plugins/gradle-intellij-plugin \
    && unzip gradle-intellij-plugin.zip -d plugins/gradle-intellij-plugin \
    && rm gradle-intellij-plugin.zip

# Copy your custom plugin and formatter
COPY build/distributions/formatter-plugin plugins/formatter-plugin
COPY formatter /usr/bin/formatter

# Set the working directory back to /data
WORKDIR /data

ENTRYPOINT ["formatter"]

onewhl commented

Hi @NilsBaumgartner1994!
I'm Zarina, Yaroslav's colleague, and I would be happy to help you with the plugin!

Could you please tell me if you tested the plugin outside of Docker? There is a Gradle task called runIde that allows you to run the plugin in a separate IntelliJ IDEA instance. As I understand it, you want to run the plugin in the terminal without launching the UI. That is possible, and it's called headless mode. You can find an example of such a use case at the following link. For that purpose, we extend the runIde task in the build.gradle.kts script and set the JVM argument -Djava.awt.headless=true. Here you can find an example.
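
To check that the JVM argument mentioned above actually took effect, a small stand-alone Java sketch like the following could be used. It only relies on the JDK (java.awt.GraphicsEnvironment); the class name HeadlessCheck is hypothetical and not part of the IntelliJ Platform or the demo project.

```java
import java.awt.GraphicsEnvironment;

/** Hypothetical helper (not part of the IntelliJ Platform): reports whether the JVM is headless. */
final class HeadlessCheck {

    /** True when AWT treats this JVM as headless, e.g. after -Djava.awt.headless=true. */
    static boolean isHeadless() {
        return GraphicsEnvironment.isHeadless();
    }

    public static void main(String[] args) {
        // The raw property is typically null when the flag was not passed on the command line.
        System.out.println("java.awt.headless = " + System.getProperty("java.awt.headless"));
        System.out.println("headless mode     = " + isHeadless());
    }
}
```

Started with the JVM argument -Djava.awt.headless=true, this should report headless mode as true. Without the argument, the result depends on the environment (for example, whether a display is available).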

The first thing I would recommend you try is to make your tool run in headless mode locally, outside of Docker. When it works fine, get back to the Docker setup.

Hi @onewhl,

Nice to meet you. I will have a look at the workshop demo. Unfortunately I didn't find it at the beginning, since it looks like almost exactly what I need to start with :-)

I will come back later, after I have tried to adapt that workshop demo.

onewhl commented

@NilsBaumgartner1994 cool! And here is a short paper about the ways you can use the IntelliJ Platform: https://arxiv.org/pdf/2110.00141.pdf.

areyde commented

Also, the original version of this plugin (in the ASE branch) worked in headless mode as well.

@onewhl okay, I can run the plugin and also extract the classes and the qualified names correctly. Since I am currently not that familiar with Kotlin, I am thinking about switching to Java.

But this workshop demo works great. It will help me to get a step further into automatic data clumps elimination :-)

@onewhl have you thought about a Dockerfile?

I copied the demo workshop into my project: https://github.com/NilsBaumgartner1994/REDCLIFF-Java

And then added a Dockerfile and a docker-compose.yaml file

Dockerfile

FROM ubuntu:20.04

WORKDIR /app

# Install necessary packages and Java 17
RUN apt-get update \
  && apt-get install -y unzip wget libfreetype6 fontconfig openjdk-17-jdk \
  && rm -rf /var/lib/apt/lists/*

# Set JAVA_HOME environment variable
ENV JAVA_HOME=/usr/lib/jvm/java-17-openjdk-amd64

# Download and extract IntelliJ IDEA
RUN wget https://download.jetbrains.com/idea/ideaIC-2023.2.tar.gz \
    && tar -xzf ideaIC-2023.2.tar.gz \
    && rm ideaIC-2023.2.tar.gz \
    && mv idea-* idea

# Set the JAVA_HOME for IntelliJ IDEA
RUN echo "idea.jdk=$JAVA_HOME" >> /app/idea/bin/idea.properties

# Create Maven repository directory
RUN mkdir -p /root/.m2/repository

# Set the working directory to /app/idea for plugin operations
WORKDIR /app/idea

# Download and install the Gradle plugin for IntelliJ IDEA
RUN wget -O gradle-intellij-plugin.zip "https://plugins.jetbrains.com/plugin/download?rel=true&updateId=410281" \
    && mkdir -p plugins/gradle-intellij-plugin \
    && unzip gradle-intellij-plugin.zip -d plugins/gradle-intellij-plugin \
    && rm gradle-intellij-plugin.zip

# Set the working directory back to /app
WORKDIR /app

docker-compose.yaml:

version: '3'

services:
  idea-parser:
    mem_limit: 12g
    stdin_open: true
    tty: true
    image: nilsbaumgartner1994/idea-parser
    environment:
      - GRADLE_USER_HOME=/gradle_cache
    volumes:
      - ./:/app
      - ${FOLDER:-/Users/nbaumgartner/Desktop/TestProject}:/data
      - ${OUTPUT:-./Result}:/output
      - ./gradle_cache:/gradle_cache  # This maps the gradle_cache directory on your host to /gradle_cache in the container
    command:
      - "./runDemoCLI.sh"
      - "/data"
      - "/output"

and my build_image.sh

#!/bin/bash -e

rm -rf build/distributions

./gradlew :demo-plugin:buildPlugin

# Make sure to pull latest image before building new ones to reuse cache
# docker pull nilsbaumgartner1994/idea-parser
docker build . -t nilsbaumgartner1994/idea-parser --progress=plain --cache-from nilsbaumgartner1994/idea-parser

Unfortunately, the script now hangs at the Gradle step:

idea-parser_1  | > Configure project :demo-core
idea-parser_1  | Evaluating project ':demo-core' using build file '/app/demo-core/build.gradle'.
idea-parser_1  | [gradle-intellij-plugin :demo-core demo-core] Configuring tests tasks
idea-parser_1  |
idea-parser_1  | > Configure project :demo-plugin
idea-parser_1  | Evaluating project ':demo-plugin' using build file '/app/demo-plugin/build.gradle.kts'.
idea-parser_1  | Caching disabled for Kotlin DSL script compilation (Project/TopLevel/stage1) because:
idea-parser_1  |   Build cache is disabled
idea-parser_1  | Skipping Kotlin DSL script compilation (Project/TopLevel/stage1) as it is up-to-date.
idea-parser_1  | Caching disabled for Kotlin DSL accessors for project ':demo-plugin' because:
idea-parser_1  |   Build cache is disabled
idea-parser_1  | Skipping Kotlin DSL accessors for project ':demo-plugin' as it is up-to-date.
idea-parser_1  | Caching disabled for Kotlin DSL script compilation (Project/TopLevel/stage2) because:
idea-parser_1  |   Build cache is disabled
idea-parser_1  | Skipping Kotlin DSL script compilation (Project/TopLevel/stage2) as it is up-to-date.
idea-parser_1  | [gradle-intellij-plugin :demo-plugin demo-plugin] Configuring tests tasks
idea-parser_1  | All projects evaluated.
idea-parser_1  | Task name matched 'runDemoPluginCLI'
idea-parser_1  | Selected primary task 'runDemoPluginCLI' from project :
idea-parser_1  | [gradle-intellij-plugin] Resolving Gradle IntelliJ Plugin version with: jar:file:/gradle_cache/caches/jars-9/c8f552191b71d4147d9cfabeed095d15/gradle-intellij-plugin-1.13.2.jar!/META-INF/MANIFEST.MF
idea-parser_1  | [gradle-intellij-plugin] Resolving Gradle IntelliJ Plugin version with: jar:file:/gradle_cache/caches/jars-9/c8f552191b71d4147d9cfabeed095d15/gradle-intellij-plugin-1.13.2.jar!/META-INF/MANIFEST.MF
idea-parser_1  | [gradle-intellij-plugin] Resolving Gradle IntelliJ Plugin version with: jar:file:/gradle_cache/caches/jars-9/c8f552191b71d4147d9cfabeed095d15/gradle-intellij-plugin-1.13.2.jar!/META-INF/MANIFEST.MF
idea-parser_1  | [gradle-intellij-plugin] Resolving Gradle IntelliJ Plugin version with: jar:file:/gradle_cache/caches/jars-9/c8f552191b71d4147d9cfabeed095d15/gradle-intellij-plugin-1.13.2.jar!/META-INF/MANIFEST.MF
idea-parser_1  | [gradle-intellij-plugin] Resolving Gradle IntelliJ Plugin version with: jar:file:/gradle_cache/caches/jars-9/c8f552191b71d4147d9cfabeed095d15/gradle-intellij-plugin-1.13.2.jar!/META-INF/MANIFEST.MF
idea-parser_1  | [gradle-intellij-plugin] Resolving Gradle IntelliJ Plugin version with: jar:file:/gradle_cache/caches/jars-9/c8f552191b71d4147d9cfabeed095d15/gradle-intellij-plugin-1.13.2.jar!/META-INF/MANIFEST.MF
idea-parser_1  | [gradle-intellij-plugin] Resolving Gradle IntelliJ Plugin version with: jar:file:/gradle_cache/caches/jars-9/c8f552191b71d4147d9cfabeed095d15/gradle-intellij-plugin-1.13.2.jar!/META-INF/MANIFEST.MF
idea-parser_1  | [gradle-intellij-plugin] Resolving Gradle IntelliJ Plugin version with: jar:file:/gradle_cache/caches/jars-9/c8f552191b71d4147d9cfabeed095d15/gradle-intellij-plugin-1.13.2.jar!/META-INF/MANIFEST.MF
idea-parser_1  | [gradle-intellij-plugin] Resolving Gradle IntelliJ Plugin version with: jar:file:/gradle_cache/caches/jars-9/c8f552191b71d4147d9cfabeed095d15/gradle-intellij-plugin-1.13.2.jar!/META-INF/MANIFEST.MF
idea-parser_1  | [gradle-intellij-plugin] Resolving Gradle IntelliJ Plugin version with: jar:file:/gradle_cache/caches/jars-9/c8f552191b71d4147d9cfabeed095d15/gradle-intellij-plugin-1.13.2.jar!/META-INF/MANIFEST.MF
idea-parser_1  | [gradle-intellij-plugin] Resolving Gradle IntelliJ Plugin version with: jar:file:/gradle_cache/caches/jars-9/c8f552191b71d4147d9cfabeed095d15/gradle-intellij-plugin-1.13.2.jar!/META-INF/MANIFEST.MF
idea-parser_1  | [gradle-intellij-plugin] Resolving Gradle IntelliJ Plugin version with: jar:file:/gradle_cache/caches/jars-9/c8f552191b71d4147d9cfabeed095d15/gradle-intellij-plugin-1.13.2.jar!/META-INF/MANIFEST.MF
idea-parser_1  | [gradle-intellij-plugin :demo-cli demo-cli] Using IDE from remote repository

onewhl commented

@NilsBaumgartner1994 Could you please tell me why you replaced your project with the refactoring-workshop-demo project? I mean, all the code you had previously just disappeared. refactoring-workshop-demo is just a repository containing several examples that demonstrate different use cases of the IntelliJ Platform. You don't need to have all this code in your project. You can take a look at the examples, learn about using the IntelliJ Platform from them, and implement your own plugin in your repository.

Actually, I've never tried to launch IntelliJ IDEA in Docker, but I've just asked my colleagues about it, I will get back to this issue on Monday. What is the starting point of your plugin? What does it take as input? What is expected behavior? I don't see any errors in the Docker logs. Please, test your plugin locally first and when it works fine, then try to run it in Docker. It's much easier to understand what's wrong when you run the tool locally rather than in Docker.

TL;DR:
The plugin just needs to get AST information about classes and their hierarchy (qualified names, attributes, methods).
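
As a rough illustration of why the hierarchy matters, here is a minimal stand-alone Java sketch that stores extracted "class to direct super types" information and checks whether two classes share a common ancestor. It does not use the IntelliJ Platform; the class HierarchySketch, its methods, and the decision to ignore java.lang.Object are assumptions made for this example, not the actual REDCLIFF code.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of extracted hierarchy information (illustration only). */
final class HierarchySketch {

    // Qualified class name -> qualified names of its direct super classes and interfaces.
    private final Map<String, List<String>> directSupers = new HashMap<>();

    /** Registers a class together with its direct super types. */
    void addClass(String qualifiedName, String... supers) {
        directSupers.put(qualifiedName, Arrays.asList(supers));
    }

    /** Collects all direct and transitive super types of the given class. */
    Set<String> allSupers(String qualifiedName) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> todo = new ArrayDeque<>(
                directSupers.getOrDefault(qualifiedName, Collections.emptyList()));
        while (!todo.isEmpty()) {
            String current = todo.poll();
            if (seen.add(current)) {
                todo.addAll(directSupers.getOrDefault(current, Collections.emptyList()));
            }
        }
        return seen;
    }

    /** True if both classes have a common type in their hierarchy, ignoring java.lang.Object. */
    boolean shareHierarchy(String a, String b) {
        Set<String> typesOfA = allSupers(a);
        typesOfA.add(a);
        Set<String> typesOfB = allSupers(b);
        typesOfB.add(b);
        typesOfA.retainAll(typesOfB);
        typesOfA.remove("java.lang.Object"); // every class extends Object, so it carries no signal
        return !typesOfA.isEmpty();
    }
}
```

If the super types cannot be resolved (as with the empty getSupers() result described earlier), every class would look unrelated in such a model, which is why correct dependency resolution is a precondition.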

Long:
Yes, so the main problem with my previous code was that it did not resolve dependencies correctly. This was critical because my plugin is supposed to identify data clumps.

I will start with the refactoring-workshop-demo, as it can resolve all the dependencies. I just think that fixing the problems in my previous project would take more time than transferring my code to the refactoring-workshop-demo, from which I will remove the unused examples.

You can still see my previous project here at: https://github.com/NilsBaumgartner1994/REDCLIFF-Java/tree/f3bb910bfff86aec1ac07490dbca3f958ee98898
Maybe we can figure out why it is not working.

To make transferring my previous code easier, I will now look into how to use Java instead of Kotlin.

> Actually, I've never tried to launch IntelliJ IDEA in Docker, but I've just asked my colleagues about it, I will get back to this issue on Monday.

I have now a running and functional Dockerfile to run the CLI Plugin: https://github.com/NilsBaumgartner1994/REDCLIFF-Java/tree/2a33eeeeccddf397df8aeb762473959a62b77e21

> Please, test your plugin locally first and when it works fine, then try to run it in Docker. It's much easier to understand what's wrong when you run the tool locally rather than in Docker.

It seemed that the process was hanging, but it just took a long time on the first run to load all dependencies (~30 minutes).
Re-running the container with caching now takes only 50 seconds.

> What is the starting point of your plugin?

My goal is to automatically refactor data clumps. In our previous work we already built a semi-automatic refactoring plugin, but there is a big problem when facing data clumps.

Problem: There are a lot of data clumps.

The big picture: Meet REDCLIFF (Refactoring Data Clumps Innovative Flexible Framework)
In our approach we will use an IntelliJ Plugin to:

  • Automatically load Projects (Gradle, Maven, Ant)
  • Extract all needed information into an AST, which includes hierarchy, so we know which classes have a common hierarchy
  • This AST will then be fed into our Stand-Alone Tool, which analyses Data Clumps. This tool may also be used to detect Data Clumps in UML Class Diagrams.
  • After identifying the Data Clumps we will need to prioritize them. For this case we might use AI tools like ChatGPT + some statistics. For ChatGPT or similar we might provide information like last change of files, names, project description ...
    • Not all Data Clumps seem to be bad, some might help the code quality
  • After finding one of the most important Data Clumps we will then use a 2nd IntelliJ plugin. This plugin will receive the information about which Data Clumps we want to refactor
  • After the successful refactoring we will push a Merge-Request to the original git repository
  • We will automatically keep track if the Merge-Request is accepted or rejected to learn from that
  • This way we try to achieve an autonomous data clump refactoring
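
To make the analysis step in the list above more concrete, here is a minimal Java sketch of how parameter data clumps could be detected from already extracted method signatures. This is an illustration only, not the REDCLIFF tool: the names DataClumpSketch and MethodInfo are hypothetical, parameters are compared as plain "type name" strings, and the threshold of three shared parameters is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: finds parameter data clumps between method signatures. */
final class DataClumpSketch {

    /** Minimal stand-in for a method extracted from the AST (illustrative only). */
    static final class MethodInfo {
        final String qualifiedName;
        final Set<String> parameters; // each entry is "type name", e.g. "String city"

        MethodInfo(String qualifiedName, String... parameters) {
            this.qualifiedName = qualifiedName;
            this.parameters = new LinkedHashSet<>(Arrays.asList(parameters));
        }
    }

    /** Assumed threshold: a clump needs at least this many shared parameters. */
    static final int MIN_SHARED = 3;

    /** Returns the parameters both methods have in common (same type and name). */
    static Set<String> sharedParameters(MethodInfo a, MethodInfo b) {
        Set<String> shared = new LinkedHashSet<>(a.parameters);
        shared.retainAll(b.parameters);
        return shared;
    }

    /** Reports every pair of methods whose shared parameters reach the threshold. */
    static List<String> findClumps(List<MethodInfo> methods) {
        List<String> report = new ArrayList<>();
        for (int i = 0; i < methods.size(); i++) {
            for (int j = i + 1; j < methods.size(); j++) {
                Set<String> shared = sharedParameters(methods.get(i), methods.get(j));
                if (shared.size() >= MIN_SHARED) {
                    report.add(methods.get(i).qualifiedName + " <-> "
                            + methods.get(j).qualifiedName + " : " + shared);
                }
            }
        }
        return report;
    }

    public static void main(String[] args) {
        List<MethodInfo> methods = Arrays.asList(
                new MethodInfo("shop.Order.ship", "String street", "String city", "String zip"),
                new MethodInfo("shop.Invoice.send", "String street", "String city", "String zip", "boolean express"),
                new MethodInfo("shop.Cart.add", "String item", "int quantity"));
        findClumps(methods).forEach(System.out::println);
    }
}
```

The pairwise comparison is quadratic in the number of methods, which is fine for a sketch but would likely need indexing for large projects. A real detector would also have to take the class hierarchy into account, for example to avoid reporting a method and its override as a clump.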

@onewhl How can I switch from Kotlin to Java? Is that easily possible?

onewhl commented

@NilsBaumgartner1994 do you want to make your plugin analyze Java code? Or do you want to migrate this CLI example from Kotlin to Java? The example from refactoring-workshop-demo is written in Kotlin and analyzes Java code. If you want to migrate the code from Kotlin to Java, of course, it's possible, there are many IntelliJ IDEA plugins written in Java. :)
Basically, to write a plugin working in the headless mode, you need three files:

  1. Build script where you define all dependencies and gradle task for launching the plugin in the headless mode.
  2. plugin.xml file in the src/main/resources/META-INF folder. Here you should define dependencies on specific IntelliJ Platform modules. For example, to work with Java code you need to add these:
    <depends>com.intellij.java</depends>
    <depends>com.intellij.modules.platform</depends>
    <depends>com.intellij.modules.lang</depends>
    <depends>com.intellij.modules.java</depends>
     Also, you need to define the application starter class in the extensions block.
  3. An application starter class that analyzes the code and extracts the things you need.

The documentation is there to help, and I'm here too if you need any guidance✨