This replication kit accompanies the paper 'Architectural Technical Debt - A Systematic Mapping Study', accepted at SBES 2023.
Armando Sousa, Lincoln Rocha, and Ricardo Britto. 2023. Architectural Technical Debt - A Systematic Mapping Study. In XXXVII Brazilian Symposium on Software Engineering (SBES 2023), September 25–29, 2023, Campo Grande, Brazil. ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/3613372.3613399
This replication kit makes it possible to reproduce all steps of the study. Here, we describe the data and scripts used in a systematic mapping study (SMS) on Architectural Technical Debt (ATD): the dataset, the scripts used to analyze the data, and the generated figures and tables. The SMS process comprises five main phases: defining research questions, conducting the search, selecting relevant studies, extracting data, and analyzing the data. The study aims to classify ATD types and to investigate how ATD is measured and monitored, which tools and methods are used to identify ATD, and how the cost of ATD items is calculated. The analysis generates results about publications, research types, ATD types, measurement, monitoring, tools, methods, and the cost of ATD items. A word cloud is also generated to visualize the results related to methods for identifying ATD. Finally, each research question has its own section presenting the main results.
You can find the following structure:
.
├── LICENSE (Simplified BSD License)
├── README.md (Description of the replication kit)
├── csv (this folder contains .csv files)
├── dataset (this folder contains .csv, .xls, and .txt files related to the main dataset of the SMS)
├── images (this folder contains the figures generated by scripts and figures used in the main SMS paper)
├── latex (this folder contains the .tex files related to LaTeX tables)
├── md (this folder contains markdown files that support this replication kit)
└── python (this folder contains python scripts and .ipynb notebook files)
    ├── analyses (this folder contains .ipynb notebook files related to NLTK - Natural Language Toolkit - analysis)
    ├── auxiliary (this folder contains .ipynb notebook files that support latex files, text files, and generate md tables)
    ├── original (this folder contains the initial prototype of this replication kit - deprecated)
    ├── requirements.txt (this file contains all requirements necessary to run the scripts)
    └── util (this folder contains python scripts to support other functions of the replication kit)
You can find the following types of files in this replication kit:
- .csv files containing data to be analyzed in other tools.
- .xls files for general analysis in Excel.
- .txt files with general information in text format.
- .tex files for LaTeX tables.
- .md files containing the markdown documentation published in this replication kit.
- .ipynb files containing Jupyter notebooks to be executed in Jupyter Notebook or Google Colab.
The Python scripts rely on the following libraries, all listed in python/requirements.txt (the installation command and a short usage sketch follow the list):
- pandas: a powerful data manipulation and analysis library that provides data structures like DataFrames and Series, making it easy to work with structured data.
- numpy: a fundamental package for scientific computing with Python, providing support for large, multi-dimensional arrays and matrices, along with an extensive collection of high-level mathematical functions.
- xlrd: a library to read data and formatting information from Excel files (.xls). It allows you to extract data from Excel spreadsheets into Python data structures.
- nltk: the Natural Language Toolkit, a leading platform for building Python programs to work with human language data. It provides tools for text processing, tokenization, stemming, tagging, and more.
- pillow: a friendly fork of the Python Imaging Library (PIL), which provides support for opening, manipulating, and saving many different image file formats.
- matplotlib: a comprehensive library for creating static, animated, and interactive visualizations in Python. It provides a wide range of 2D plotting functionalities.
- seaborn: a data visualization library based on matplotlib that provides a high-level interface for creating attractive and informative statistical graphics.
- sklearn: scikit-learn, a versatile machine learning library that provides simple and efficient tools for data mining and data analysis. It includes various algorithms for classification, regression, clustering, and more.
- tabulate: a library for tabular data formatting, providing a convenient way to display data in a structured and readable format, especially when working with lists, dictionaries, or DataFrames.
pip install -r python/requirements.txt
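As a quick check that the environment is working, the sketch below uses two of the listed libraries (pandas and tabulate) on a toy DataFrame; it does not touch any specific file from the kit, and the values are illustrative only.

```python
# Minimal sketch, assuming only pandas and tabulate are installed:
# build a toy DataFrame and render it as a Markdown table, the same kind of
# output the auxiliary notebooks produce.
import pandas as pd
from tabulate import tabulate

toy = pd.DataFrame({"Category": ["Type A", "Type B"], "Studies": [3, 5]})  # toy data, not SMS results
print(tabulate(toy, headers="keys", tablefmt="github", showindex=False))
```

If everything is installed correctly, this prints a small GitHub-flavored Markdown table.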
Install the word_cloud package, a little word cloud generator in Python:
git clone https://github.com/amueller/word_cloud.git
cd word_cloud && pip install .
Return to the main replication kit path (smsatd)
cd ..
Install Jupyter Notebook
pip install jupyter
If you use Jupyter Notebook, start it with:
jupyter notebook
Then open the notebooks in the python folder that you want to run.
If you use Google Colab, you can open the notebook files directly in Colab and install the dependencies before running the selected notebook (see the sketch below).
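For example, a first Colab cell along these lines installs the requirements; this assumes the repository contents have already been made available in the Colab session (e.g., by cloning or uploading them):

```python
# Hypothetical Colab setup cell: install the kit's requirements inside the session
# before running any of the analysis notebooks.
!pip install -r python/requirements.txt
```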
The dataset organizes all the studies selected in this SMS. We created a spreadsheet that groups the selected papers with all the critical characteristics evaluated in this SMS.
The spreadsheet is available in Extraction_form.
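A minimal sketch of how such a spreadsheet can be loaded for inspection; the file name dataset/extraction_form.xls is an assumption used only for illustration, and the actual file name in the kit may differ.

```python
# Load the extraction spreadsheet into a DataFrame (file name is a hypothetical example).
# Reading legacy .xls files with pandas requires the xlrd package from the requirements.
import pandas as pd

papers = pd.read_excel("dataset/extraction_form.xls")
print(papers.columns.tolist())   # the characteristics extracted for each selected paper
print(len(papers))               # number of selected papers
```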
The main activities of this SMS follow five main phases:
- Define research questions
- Search process
- Selection process
- Extraction process
- Analysis process
Figure 1 shows the flow used in this SMS.
Figure 2 summarizes the literature study.
We used the following Form | Script to extract and analyze the data from the spreadsheet.
The following scripts Selected papers and sms_extraction are used to generate results about publications and venue types.
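As an illustration of the kind of result these scripts produce, here is a hedged sketch of a publications-per-year bar chart; the file name csv/selected_papers.csv and the Year column are assumptions, not the actual names used by the kit.

```python
# Hedged example: count selected papers per publication year and save a bar chart.
# File and column names below are hypothetical; adapt them to the actual dataset.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

papers = pd.read_csv("csv/selected_papers.csv")
per_year = papers["Year"].value_counts().sort_index()

sns.barplot(x=per_year.index.astype(str), y=per_year.values, color="steelblue")
plt.xlabel("Publication year")
plt.ylabel("Number of selected papers")
plt.tight_layout()
plt.savefig("images/publications_per_year.png")
```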
The following script Research type is used to generate results about research type classification.
ATD Types | ATD Types before findings
The following script ATD types is used to generate results about ATD type classification.
The following script Measurement is used to generate results about ATD measurement classification.
The following script Monitoring is used to generate results about ATD monitoring classification.
Tools | Tools with more features | Methods | Methods and SPs
The following scripts Tools and Methods are used to generate results about the classification of ATD tools and methods.
WordCloud of methods
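The word cloud is built with the word_cloud package installed earlier; a minimal sketch follows, assuming the method names are available as a single text string (the input text and output path are illustrative only, not the exact code used by the kit's scripts).

```python
# Hedged sketch: build a word cloud image from a string of method names.
# The input text and output path are hypothetical examples.
from wordcloud import WordCloud

methods_text = "static analysis dependency analysis architecture conformance checking"
wc = WordCloud(width=800, height=400, background_color="white").generate(methods_text)
wc.to_file("images/methods_wordcloud.png")
```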
The following script Cost of ATD is used to generate results about how the cost of ATD items is calculated.