This is an Eclipse plug-in that runs JDeodorant in batch mode to identify refactoring opportunities and apply them.
Running the headless mode within Eclipse
You can run this application from within Eclipse. Please follow these steps:
1. Download (or clone) jdeodorant-commandline and the JDeodorant plug-in, and import them as existing projects into your Eclipse workspace.
2. Right-click on the JDeodorant-Commandline project and select Run As > Run Configurations...
3. Click on Eclipse Application and then on the New launch configuration button. Give the newly created launch configuration a name.
4. In the Main tab:
   - In Workspace Data, set the Location to point to the workspace containing the projects that you want to analyze in headless mode. The projects to be analyzed will be opened in this workspace. There are two ways to open the Java projects in the workspace:
     - The workspace directory is created by Eclipse. In this case, create it by clicking File > Switch Workspace and specifying a new workspace directory, and then create a new project in it (or import an existing one). You can import multiple projects that you want to analyze. When you are done, switch back to the original workspace, where the JDeodorant and jdeodorant-commandline plug-ins are imported.
     - The tool tries to import an existing Eclipse project automatically. In this case, the tool creates the workspace at the given path and imports the project into it. Use the `-pd` switch to specify the path to the `.project` file of the project (see the table below).

     Note that, in either case, Eclipse project files must exist for the Java project that you want to analyze.
   - In Program to Run, select Run an application, and from the drop-down list select `ca.concordia.jdeodorant.eclipse.commandline.application`.
5. In the Arguments tab, specify the Program arguments (refer to the following table).
6. Next, specify the VM arguments as `-Xms128m -Xmx4096m -XX:PermSize=128m` (you can increase the Xmx value if more memory is available).
7. In the Plug-ins tab, first select plug-ins selected below only in the Launch with: drop-down list. Then select `ca.concordia.jdeodorant.eclipse.commandline (1.0.0.qualifier)` and click the Add Required Plug-ins button.
8. Apply the changes to save the new launch configuration. Click Run to test whether the headless plug-in works properly. If you get `BundleException`s, go back to the Plug-ins tab (step 7) and select Launch with: all workspace and enabled target plug-ins. Apply the changes and run the headless plug-in again.
Running as a standalone command-line application
We provide the means to generate an Eclipse product that can be run from the OS command line as a standalone executable, without opening Eclipse. This is particularly useful if, for instance, you need to integrate JDeodorant into an existing development workflow (e.g., continuous integration).
The Eclipse product is an executable file along with the necessary plug-in dependencies. The entire package can be generated by Eclipse, one for each platform. We have tested the product on Windows and Mac.
To generate the executable for your target platform, follow these steps:
1. Download (or clone) jdeodorant-commandline and the JDeodorant plug-in, and import them as existing projects into your Eclipse workspace. (You need Eclipse only to generate the Eclipse product, which then runs from the OS command line.)
2. In the commandline project, double-click on `ProductConfiguration.product`. The Product Configuration Editor should open. (If it does not, your Eclipse installation might be missing necessary plug-ins. We tested on Eclipse IDE for Java EE Developers.)
3. If you need to configure the generated product, use the Configuration and Launching tabs, which allow changing the parameters of the generated product for different platforms. For instance, you might want to change the `eclipse.ini` file that the target product will use, or provide additional VM arguments.
4. From the Overview tab, under the Exporting section, choose Eclipse Product Export Wizard.
5. In the wizard, `/JDeodorant-Commandline/ProductConfiguration.product` should be selected as the Configuration. Specify a directory under the Destination section, and click Finish.
6. A folder containing the final Eclipse product will be created. Look for the file `\eclipse\eclipse.exe` or `\MacOS\eclipse`, which is the executable for the product.
7. Open a command line, switch to the folder containing the executable file (found in the previous step), and run the product's executable. Provide the necessary arguments, as described in the following table. For instance, you can run (on Windows):
eclipse.exe -pd "TestProject/.project" -x "clones.xls" -m PARSE_AND_ANALYZE ...
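For a continuous-integration setup, an invocation like the one above could be wrapped in a small script. The following is a minimal Python sketch and not part of the tool: the helper names `build_command` and `run_product` and all file names are hypothetical, and only the `-pd`, `-x` and `-m` options documented in this README are used.

```python
import subprocess


def build_command(executable, project_file, excel_file, mode, extra_args=()):
    """Return the argument list for the exported product (hypothetical helper)."""
    return [executable, "-pd", project_file, "-x", excel_file, "-m", mode, *extra_args]


def run_product(command):
    """Run the product; command[0] should be the full path to the executable.

    check=True makes subprocess raise CalledProcessError on a non-zero exit code.
    """
    return subprocess.run(command, check=True)


# Illustrative values only; adjust them to your exported product and project.
command = build_command("eclipse.exe", "TestProject/.project", "clones.xls", "ANALYZE_EXISTING")
print(command)
```

Passing the arguments as a list avoids shell quoting issues with paths that contain spaces.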
Command-line arguments
These arguments can be passed in step 5 (headless mode within Eclipse) or step 7 (standalone mode).
Long option | Short option | Arguments | Description |
---|---|---|---|
--help | -? | | Displays arguments and their explanations |
--mode | -m | analyze_existing, parse_and_analyze, parse | Mode of operation. See below for more information |
--project | -p | {project name} | Name of the project which currently exists in the Eclipse workspace |
--project-description | -pd | {.project file} | Alternative to `-p`; Path to the `.project` file of the eclipse project to be imported to the workspace |
--excelfile | -x | {path/to/the/xls/file} | Path to the input (output, in the PARSE mode) .xls file |
--tool | -t | clone_tool_ccfinder, clone_tool_clonedr, clone_tool_conqat, clone_tool_deckard, clone_tool_nicad | Specifies the clone detection tool |
--tooloutputfile | -i | {path/to/the/input/file} | Path to the main output file of the clone detection tool |
--extra-args | -xargs | {arg1, arg2, ...} | Comma-separated list of extra arguments required by specific clone detection tools. See below for more information. |
--row-start-from | -r | {row} | Specifies the row number (starting from 2; row 1 is the header) from which the tool must start the analysis. |
--append-results | -a | | Specifies whether new results are appended to the existing outputs (Excel file, CSV files) or overwrite them. |
--skip-groups | -s | {group_id1, group_id2, ...} | A comma-separated list of clone group IDs to be skipped in the analysis. |
--test-packages | -testpkgs | {package1, package2, ...} | A comma-separated list of the fully-qualified names of the packages containing test code. |
--test-source-folders | -testsrcs | {folder1,folder2,...} | A comma-separated list of the source folder names containing test code. This is similar to the previous argument. |
--run-tests | -rt | | Run tests after applying each refactoring. |
--log-to-file | -l | | Create a log file from console output. |
--group-ids | -g | {id1, id2, id3, ...} | A comma-separated list of clone group IDs to be analyzed. Other clone groups in the file will be skipped |
--debugging-enabled | -de | | Prevents the Eclipse command-line tool from cancelling jobs queued in the Eclipse JobManager (such as the workbench job), so that debugging is possible in Eclipse |
--mail-server-ip | -msrvr | {Mail server address} *127.0.0.1* | Email server for sending emails after the analysis has finished |
--mail-server-port | -mport | {Mail server port} *25* | Email server port, see previous option |
--mail-server-security-type | -msectype | NONE, SSL, STARTLS | Security type for mail server |
--mail-server-authenticated | -mauth | | Whether the SMTP server requires authentication |
--mail-server-user-name | -muser | {Mail server user name} | SMTP user name |
--mail-server-password | -mpass | {Mail server password} | SMTP password |
--email-addresses | -em | {email1, email2, ...} | A comma-separated list of email addresses to which the analysis notifications should be sent |
Note: The bold-faced options are mandatory. Italic arguments are default values.
Mode of Operation
The headless application works in three different modes, explained in the following table.
To run the tool in one of these modes, pass the appropriate value to the `--mode` (or `-m`) argument.
Value for `--mode` argument | Description |
---|---|
PARSE | In this mode, the output file of a clone detection tool is parsed into an Excel file. You must give the path to the Excel file using the `--excelfile` (or `-x`) argument. You must also provide the name of the clone detection tool (using the `--tool` argument), the path to the input file (the output of the clone detection tool, using the `-i` argument), and, for some specific clone detection tools, extra arguments (using `--extra-args`). See below for more info. |
ANALYZE_EXISTING | In this mode, the tool analyzes an existing Excel file. Again, the path to the Excel file must be given using the `--excelfile` (or `-x`) argument. The results of the analysis are written to the same folder as the input Excel file. |
PARSE_AND_ANALYZE | This mode first parses the output of the clone detection tool, and then analyzes the parsed Excel file. All the arguments required in the PARSE mode must also be provided in this mode. |
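The per-mode requirements in the table above can be checked before launching the tool. The sketch below is only an illustration derived from that table: the mapping `REQUIRED_BY_MODE` and the function `missing_options` are hypothetical names, options are identified by their long names without the leading dashes, and tool-specific `--extra-args` as well as the project options (`-p` / `-pd`) are deliberately left out.

```python
# Options that the mode table lists as required, keyed by mode (assumption: derived from the table only).
REQUIRED_BY_MODE = {
    "PARSE": {"excelfile", "tool", "tooloutputfile"},
    "ANALYZE_EXISTING": {"excelfile"},
    "PARSE_AND_ANALYZE": {"excelfile", "tool", "tooloutputfile"},
}


def missing_options(mode, given_options):
    """Return the sorted list of required long options that were not supplied."""
    required = REQUIRED_BY_MODE[mode.upper()]
    return sorted(required - set(given_options))


print(missing_options("parse", {"excelfile"}))
print(missing_options("ANALYZE_EXISTING", {"excelfile", "project"}))
```

An unknown mode raises `KeyError`, which a calling script can turn into a usage message.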
The input (and output) Excel files
The input Excel file must be in Excel 97-2003 (.xls) format; the tool cannot handle .xlsx files. The first row of the Excel file is used as the header row. For the analysis, some columns of the input Excel file must already contain information, while the cells of the other columns are filled in during the analysis.
In the Excel file, each row is for one clone. Each clone is a code fragment which is detected to be duplicated in another part of the system. Several clones in the consecutive rows belong to one clone group. Hence, each possible pair of clones inside a clone group are code fragments that are duplicated. The row corresponding to the first clone of every clone group contains some information about the clone group, including values for Clone Group Size, Clone Group Info and Connected columns.
Column | Description |
---|---|
Clone Group ID | An integer assigned to every clone group. All the clones inside one clone group have the same value in this cell, namely the ID of the clone group to which they belong. |
Source Folder | The source folder of the class file to which this clone belongs. |
Package | Fully qualified path to the package of the class file to which this clone belongs. |
Class | Name of the class file to which this clone belongs. |
Method | Name of the method in which this clone exists. Please note that, currently there is no support for the clones outside of the boundaries of methods. |
Method Signature | Signature of the method in which this clone exists, in the Bytecode format. |
Start Line, End Line, Start Offset, End Offset | Starting and ending lines and offsets of the clone fragment. |
#PDG Nodes | Number of PDG nodes in the method in which this clone exists. This column will be filled after analysis on this clone is done. |
#Statements | Number of statements in the clone fragment that is reported to be a clone. This column will be filled after analysis on this clone is done. |
Line coverage | Percentage of the number of lines of code fragment covered by unit tests. |
Clone Group Size | Number of the clones in the clone group. This value only comes in the first row of the clone group. |
Clone Group Info | Type of the clone group. It might be Repeated when the entire clone group is repeated, or Subclone when the clones in this clone group are sub-clones or super-clones of clones in another clone group. In these two cases, our tool will skip the clone group for analysis. |
Connected | If the value of the previous cell is Subclone, this cell contains the clone group ID of the clone group of which this clone group is a sub-clone (or super-clone). |
Clone Pair Location | Location of the clones in the clone group. Clones could be in the same method, in the same class, or in different classes. |
#Refactorable Pairs | Number of refactorable pairs in the clone group, which is calculated after the analysis. |
Details | Each pair of clones in every clone group is analyzed by the tool. When the analysis is finished, this column and the following columns in the same row contain hyperlinks to the HTML reports of the analysis of the clone pairs formed by the clone in this row and the other clones in the same clone group. The name of each hyperlink has the format {clone group ID}-{first clone number}-{second clone number}. A green cell background means that the corresponding clone pair is refactorable; a red background means that it is not refactorable. A white background means that the clone pair was not analyzed. This happens when:
|
A sample empty Excel file is provided here.
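Because every possible pair of clones inside a clone group is analyzed, a group of n clones yields n(n-1)/2 pairs. The sketch below enumerates the hyperlink names described for the Details column. The function name `pair_labels` is hypothetical, and numbering the clones from 1 is an assumption, not something this README states.

```python
from itertools import combinations


def pair_labels(group_id, group_size, first_index=1):
    """List the {clone group ID}-{first clone number}-{second clone number} labels of a group."""
    clone_numbers = range(first_index, first_index + group_size)
    return [f"{group_id}-{a}-{b}" for a, b in combinations(clone_numbers, 2)]


labels = pair_labels(7, 3)
print(labels)  # ['7-1-2', '7-1-3', '7-2-3']
```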
Using the output of clone detection tools
The output of a clone detection tool must be first converted to the desired Excel file. For convenience, we have provided parsers for the popular clone detection tools, as an internal feature in the command-line tool.
When the tool is executed in the `PARSE` or `PARSE_AND_ANALYZE` mode, the user has to provide the output file of the clone detection tool using the `--tooloutputfile` (`-i`) argument.
The name of the clone detector must also be specified, using the `--tool` (`-t`) argument.
For example, the following arguments can be used to parse and analyze an output from CCFinder for project Apache Ant:
-p apache-ant-1.7.0
-x "apache-ant-1.7.0-ccfinder.xls"
-m PARSE_AND_ANALYZE
-t CLONE_TOOL_CCFINDER
-i "ccfinder.ccfxd"
-xargs "C:\Results\CCFinder\apache-ant-1.7.0\src\.ccfxprepdir",""
-testsrcs "src/tests/junit"
At the moment, the tool supports five different clone detection tools, as shown in the table below.
The value of the `--extra-args` (`-xargs`) argument depends on the tool and provides the information needed for parsing the input file.
For instance, the example above passes two additional strings through this argument, separated by a comma.
Clone Detection Tool | `--tool` (`-t`) | `--extra-args` (`-xargs`) |
---|---|---|
CCFinder | CLONE_TOOL_CCFINDER | |
Deckard | CLONE_TOOL_DECKARD | Not needed |
ConQAT | CLONE_TOOL_CONQAT | Not needed |
CloneDR | CLONE_TOOL_CLONEDR | Path to the folder where the analyzed project was initially located (This is important because these tools save absolute paths to the analyzed Java files) |
Nicad | CLONE_TOOL_NICAD | |
Output of the commandline tool
The command-line tool generates an Excel file containing the results of the analysis, with the same name as the input Excel file (with `-analyze` appended) and in the same path.
The HTML reports of the analysis can be found in a folder named `html.reports`, located in the same folder as the input and output Excel files.
When the tool is used to parse the output of a clone detection tool, a folder named `code-fragments` is created in the same path as the input and output Excel files; it contains the actual code fragments as reported by the clone detection tool.
The names of these files have the format `{ID}-{CLONE_NUMBER}`, where `{ID}` is the ID of the clone group to which the clone belongs, and `{CLONE_NUMBER}` is the clone's index in that clone group. This helps in mapping Excel file rows (clones) to these files.
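To map a code-fragment file back to its row in the Excel file, its name can be split into the clone group ID and the clone index. This is a minimal sketch: the function name `parse_fragment_name` is hypothetical, and since this README does not say whether the files carry an extension, the code simply ignores one if present.

```python
from pathlib import Path


def parse_fragment_name(file_name):
    """Split a name of the form {ID}-{CLONE_NUMBER} into (group_id, clone_number)."""
    stem = Path(file_name).stem  # drops an extension, if there is one (assumption)
    group_id, clone_number = stem.split("-", 1)
    return int(group_id), int(clone_number)


print(parse_fragment_name("12-3"))
```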
For those interested in performing statistical analysis using tools such as R or Matlab, the tool generates CSV files containing information gathered during the analysis. Several CSV files are created, as explained below. Note that the separator in these files is the pipe ("|") character. The first row of each file is the header.
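Since the separator is a pipe rather than a comma, readers need to be told about it explicitly. The following Python sketch uses the standard `csv` module; the function name `read_pipe_csv` is hypothetical and the sample data is made up for illustration.

```python
import csv
import io


def read_pipe_csv(stream):
    """Parse a pipe-separated file whose first row is the header."""
    return list(csv.DictReader(stream, delimiter="|"))


# Made-up sample with a subset of the report columns, for illustration only.
sample = io.StringIO(
    "GroupID|PairID|#NodeComparisons\n"
    "1|1-2|40\n"
    "1|1-3|12\n"
)
rows = read_pipe_csv(sample)
print(rows[0]["PairID"])
```

For a real file, pass a handle opened with `open(path, newline="")`, as the `csv` documentation recommends.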
{INPUT_EXCEL_FILE_NAME}.report.csv
Contains general information about the refactorability analysis results. Every row in these files corresponds to a single clone pair. The columns in the order they appear in the CSV files are:
Column Name | Description |
---|---|
GroupID | ID of the clone group of this clone pair |
PairID | ID of the clone pair, created by appending clone indices with a hyphen between them |
ClonePairLocation | Identifies the relative location of clones. One of these values: |
IsTestCode | Identifies whether the clone is test code or not. It may have one of these values: |
#StatementsInCloneFragment1 & #StatementsInCloneFragment2 | Number of statements (AST nodes) in the clones that were analyzed. Note that this might differ from what was reported by the clone detection tool, as the tool applies filtering on the AST nodes, as discussed in the paper. |
#NodeComparisons | Number of node comparisons that were done to assess the refactorability of the clone |
#PDGNodesInMethod1 & #PDGNodesInMethod2 | Number of PDG nodes in the analyzed method bodies |
#RefactorableSubtrees | Number of subtrees in the analyzed methods that can be refactored |
SubtreeMatchingWallNanoTime | Time spent in finding the common nesting structures between the compared methods (in nanoseconds) |
Status | Identifies the status of the analysis, one of the following values: |
{INPUT_EXCEL_FILE_NAME}.trees.csv
For every clone pair, more than one subtree may be found which could be refactorable or not. This file contains the information about every subtree. The columns in the order they appear in the CSV files are:
Column Name | Description |
---|---|
GroupID & PairID | Used to identify to which clone pair this subtree belongs |
TreeID | Index of the subtree for this clone pair |
CloneType | Type of the clone which could be 1, 2, 3 or Unknown (4) |
PDGMappingWallNanoTime | Time spent to map PDG nodes |
#PreconditionViolations | Number of precondition violations |
#MappedStatements | Number of mapped statements. If this value is greater than zero and #PreconditionViolations is zero, the subtree is refactorable |
#UnMappedStatements1 & #UnMappedStatements2 | Number of unmapped statements in the first and second subtree |
#Differences | Number of differences in the mapped statements |
RefactoringWasOK | Was the refactoring successful? |
TestsFailedAfterRefactoring | Did any tests fail after refactoring? |
HadCompileErrorsAfterRefactoring | Did we have compile errors after refactoring? |
CloneRefactoringType | Type of the refactoring. One of the following values: |
IsTemplateMethodApplicable | Is template method refactoring applicable for this refactoring? |
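The refactorability rule given for #MappedStatements (more than zero mapped statements and zero precondition violations) can be applied directly to the rows of the trees file. The sketch below assumes the rows have already been read into dictionaries keyed by the column names above; the function names and the sample rows are hypothetical.

```python
from collections import Counter


def is_refactorable(row):
    """Apply the rule: #MappedStatements > 0 and #PreconditionViolations == 0."""
    return int(row["#MappedStatements"]) > 0 and int(row["#PreconditionViolations"]) == 0


def refactorable_subtrees_per_pair(rows):
    """Count the refactorable subtrees of each (GroupID, PairID) clone pair."""
    counts = Counter()
    for row in rows:
        if is_refactorable(row):
            counts[(row["GroupID"], row["PairID"])] += 1
    return counts


# Made-up rows, for illustration only.
tree_rows = [
    {"GroupID": "1", "PairID": "1-2", "#MappedStatements": "5", "#PreconditionViolations": "0"},
    {"GroupID": "1", "PairID": "1-2", "#MappedStatements": "3", "#PreconditionViolations": "2"},
    {"GroupID": "1", "PairID": "1-3", "#MappedStatements": "0", "#PreconditionViolations": "0"},
]
counts = refactorable_subtrees_per_pair(tree_rows)
print(counts[("1", "1-2")])
```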
{INPUT_EXCEL_FILE_NAME}.precondviolations.csv
This file contains information about precondition violations for each subtree, if the subtree was not found to be refactorable, using the traditional . The columns in the order they appear in the CSV files are:
Column Name | Description |
---|---|
GroupID, PairID & TreeID | Identifies to which subtree this precondition violation belongs |
PreconditionViolationType | Type of the precondition violation, one of the following values: |
{INPUT_EXCEL_FILE_NAME}.compileerrors.csv
This file contains compile errors, after refactoring is done on each subtree. The file has the following columns:
Column Name | Description |
---|---|
GroupID, PairID & TreeID | Identifies to which subtree this compile error belongs |
FileHavingCompileError | Relative path to the file that has compile errors after refactoring |
{INPUT_EXCEL_FILE_NAME}.testdifferences.csv
This file contains the tests that failed after refactoring was done on each subtree. The file has the following columns:
Column Name | Description |
---|---|
GroupID, PairID & TreeID | Identifies for which subtree this test difference exists |
TestDifference | Name of the test case that is failing after refactoring |
{INPUT_EXCEL_FILE_NAME}.exprgapsinfo.csv
This file contains information about the expression differences between the clone pairs for each subtree, i.e., the differences which lead to lambda expressions that have a single expression as their body. The file has the following columns:
Column Name | Description |
---|---|
GroupID, PairID & TreeID | Identifies to which subtree this expression gap belongs |
#Params | Number of parameters for the created lambda expression |
#ReturnType | Return type of the lambda expression |
#ThrownExceptions | Number of the thrown exceptions by the lambda expression |
#NonEffectiveFinalVars | Number of non-effectively final variables for which JDeodorant has to make final variables (so that they can be used inside the lambda expression) |
{INPUT_EXCEL_FILE_NAME}.blockgapsinfo.csv
This file contains, for each subtree, information about the block gaps,
i.e., the gaps for which JDeodorant has to make lambda expressions with
a block of statements as their body.
The file has the same columns as {INPUT_EXCEL_FILE_NAME}.exprgapsinfo.csv,
plus two additional columns, `#Statements1` and `#Statements2`, which give the number of statements inside the body of the created lambda expressions for the first and second clone, respectively.