This replication kit accompanies the paper 'Architectural Technical Debt - A Systematic Mapping Study', accepted at SBES 2023.
Armando Sousa, Lincoln Rocha, and Ricardo Britto. 2023. Architectural Technical Debt - A Systematic Mapping Study. In XXXVII Brazilian Symposium on Software Engineering (SBES 2023), September 25–29, 2023, Campo Grande, Brazil. ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/3613372.3613399
This replication kit makes it possible to reproduce all steps of the study. Here, we describe the data and scripts used in a systematic mapping study (SMS) on Architectural Technical Debt (ATD): the dataset, the scripts used to analyze the data, and the generated figures and tables. The SMS process comprises five main phases: defining research questions, conducting the search, selecting relevant studies, extracting data, and analyzing the data. The study aims to classify ATD types and to investigate how ATD is measured and monitored, which tools and methods are used to identify ATD, and how the cost of ATD items is calculated. The analysis generates results about publications, research types, ATD types, measurement, monitoring, tools, methods, and the cost of ATD items. A word cloud is also generated to visualize the results related to methods for identifying ATD. Finally, each research question has its own section presenting the main results.
You can find the following structure:
.
├── LICENSE (Simplified BSD License)
├── README.md (Description of the replication kit)
├── csv (this folder contains .csv files)
├── dataset (this folder contains .csv, .xls, and .txt files related to the main dataset of the SMS)
├── images (this folder contains the figures generated by scripts and figures used in the main SMS paper)
├── latex (this folder contains the .tex files related to LaTeX tables)
├── md (this folder contains markdown files that support this replication kit)
└── python (this folder contains python scripts and .ipynb notebook files)
    ├── analyses (this folder contains .ipynb notebook files related to NLTK - Natural Language Toolkit - analysis)
    ├── auxiliary (this folder contains .ipynb notebook files that support latex files, text files, and generate md tables)
    ├── original (this folder contains the initial prototype of this replication kit - deprecated)
    ├── requirements.txt (this file contains all requirements necessary to run the scripts)
    └── util (this folder contains python scripts to support other functions of the replication kit)
You can find the following types of files in this replication kit:
- .csv files containing data to be analyzed in other tools.
- .xls files for general analysis in Excel.
- .txt files with general information in text format.
- .tex files for LaTeX tables.
- .md files containing the markdown documentation published in this replication kit.
- .ipynb files containing Jupyter notebooks to be executed in Jupyter Notebook or Google Colab.
The Python scripts rely on the following libraries, all listed in python/requirements.txt (the installation command and a short usage sketch follow the list):
- pandas: a powerful data manipulation and analysis library that provides data structures like DataFrames and Series, making it easy to work with structured data.
- numpy: a fundamental package for scientific computing with Python, providing support for large, multi-dimensional arrays and matrices, along with an extensive collection of high-level mathematical functions.
- xlrd: a library to read data and formatting information from Excel files (.xls). It allows you to extract data from Excel spreadsheets into Python data structures.
- nltk: the Natural Language Toolkit, a leading platform for building Python programs to work with human language data. It provides tools for text processing, tokenization, stemming, tagging, and more.
- pillow: a friendly fork of the Python Imaging Library (PIL), which provides support for opening, manipulating, and saving many different image file formats.
- matplotlib: a comprehensive library for creating static, animated, and interactive visualizations in Python. It provides a wide range of 2D plotting functionalities.
- seaborn: a data visualization library based on matplotlib that provides a high-level interface for creating attractive and informative statistical graphics.
- sklearn: scikit-learn, a versatile machine learning library that provides simple and efficient tools for data mining and data analysis. It includes various algorithms for classification, regression, clustering, and more.
- tabulate: a library for tabular data formatting, providing a convenient way to display data in a structured and readable format, especially when working with lists, dictionaries, or DataFrames.
pip install -r python/requirements.txt
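As a quick check that the environment is working, the sketch below uses two of the listed libraries (pandas and tabulate) on a toy DataFrame; it does not touch any specific file from the kit, and the values are illustrative only.

```python
# Minimal sketch, assuming only pandas and tabulate are installed:
# build a toy DataFrame and render it as a Markdown table, the same kind of
# output the auxiliary notebooks produce.
import pandas as pd
from tabulate import tabulate

toy = pd.DataFrame({"Category": ["Type A", "Type B"], "Studies": [3, 5]})  # toy data, not SMS results
print(tabulate(toy, headers="keys", tablefmt="github", showindex=False))
```

If everything is installed correctly, this prints a small GitHub-flavored Markdown table.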
Install the word_cloud package, a little word cloud generator in Python:
git clone https://github.com/amueller/word_cloud.git
cd word_cloud && pip install .
Return to the main replication kit path (smsatd)
cd ..
Install Jupyter Notebook
pip install jupyter
If you use Jupyter Notebook, start it with:
jupyter notebook
Then open the notebooks in the python folder that you want to run.
If you use Google Colab, you can open the notebook files directly in Colab and install the dependencies before running the selected notebook (see the sketch below).
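For example, a first Colab cell along these lines installs the requirements; this assumes the repository contents have already been made available in the Colab session (e.g., by cloning or uploading them):

```python
# Hypothetical Colab setup cell: install the kit's requirements inside the session
# before running any of the analysis notebooks.
!pip install -r python/requirements.txt
```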
The dataset organizes all the studies selected in this SMS. We created a spreadsheet that groups the selected papers with all the critical characteristics evaluated in this SMS.
The spreadsheet is available in Extraction_form.
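A minimal sketch of how such a spreadsheet can be loaded for inspection; the file name dataset/extraction_form.xls is an assumption used only for illustration, and the actual file name in the kit may differ.

```python
# Load the extraction spreadsheet into a DataFrame (file name is a hypothetical example).
# Reading legacy .xls files with pandas requires the xlrd package from the requirements.
import pandas as pd

papers = pd.read_excel("dataset/extraction_form.xls")
print(papers.columns.tolist())   # the characteristics extracted for each selected paper
print(len(papers))               # number of selected papers
```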
The main activities of this SMS follow five main phases:
- Define research questions
- Search process
- Selection process
- Extraction process
- Analysis process
Figure 1 shows the flow used in this SMS.
Figure 2 summarizes the literature study.
We used the following Form | Script to extract and analyze the data from the spreadsheet.
The following scripts Selected papers and sms_extraction are used to generate results about publications and venue types.
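As an illustration of the kind of result these scripts produce, here is a hedged sketch of a publications-per-year bar chart; the file name csv/selected_papers.csv and the Year column are assumptions, not the actual names used by the kit.

```python
# Hedged example: count selected papers per publication year and save a bar chart.
# File and column names below are hypothetical; adapt them to the actual dataset.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

papers = pd.read_csv("csv/selected_papers.csv")
per_year = papers["Year"].value_counts().sort_index()

sns.barplot(x=per_year.index.astype(str), y=per_year.values, color="steelblue")
plt.xlabel("Publication year")
plt.ylabel("Number of selected papers")
plt.tight_layout()
plt.savefig("images/publications_per_year.png")
```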
The following script Research type is used to generate results about research type classification.
ATD Types | ATD Types before findings
The following script ATD types is used to generate results about ATD type classification.
The following script Measurement is used to generate results about ATD measurement classification.
The following script Monitoring is used to generate results about ATD monitoring classification.
Tools | Tools with more features | Methods | Methods and SPs
The following scripts Tools and Methods are used to generate results about the classification of ATD tools and methods.
WordCloud of methods
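The word cloud is built with the word_cloud package installed earlier; a minimal sketch follows, assuming the method names are available as a single text string (the input text and output path are illustrative only, not the exact code used by the kit's scripts).

```python
# Hedged sketch: build a word cloud image from a string of method names.
# The input text and output path are hypothetical examples.
from wordcloud import WordCloud

methods_text = "static analysis dependency analysis architecture conformance checking"
wc = WordCloud(width=800, height=400, background_color="white").generate(methods_text)
wc.to_file("images/methods_wordcloud.png")
```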
The following script Cost of ATD is used to generate results about how the cost of ATD items is calculated.