Another Look at the Impact of Passive Voice Requirements on Domain Modelling

This repository contains the reanalysis of the empirical study that investigated whether the use of passive voice in natural language requirements specifications has an impact on the domain modeling activity [1]. The authors of the original study disclosed their data at https://doi.org/10.5281/zenodo.7499290 and we reanalyze it with both a frequentist and a Bayesian approach. See the section on author and article details for a reference to the publication discussing the method and results.

Author and Article Details

This repository is maintained by the following researchers.

Name	Affiliation	Contact
Julian Frattini*	Blekinge Institute of Technology	julian.frattini@bth.se
Davide Fucci	Blekinge Institute of Technology
Richard Torkar	Chalmers, University of Gothenburg, and Stellenbosch Institute for Advanced Study (STIAS)
Daniel Mendez	Blekinge Institute of Technology and fortiss GmbH

The * marks the corresponding author.

Cite this work as follows: Frattini, J., Fucci, D., Torkar, R., & Mendez, D. (2024). A Second Look at the Impact of Passive Voice Requirements on Domain Modeling: Bayesian Reanalysis of an Experiment. In 1st International Workshop on Methodological Issues with Empirical Studies in Software Engineering (WSESE2024).

Description

This repository contains the following artifacts:

data/ : folder containing the experimental data obtained by the original study [1]
- raw/ : folder containing the raw data
  - experience.csv : detailed data about the experience of each participant
  - participants.csv : general data about each participant (including an aggregated form of their experience)
  - requirements.csv : meta-data about the requirements that were used in the study
  - responses.csv : evaluation of the domain models (i.e., number of missing actors, entities, and associations per domain model)
- results/ : folder containing the resulting marginal distributions as produced by the Bayesian data analyses
- ipv-data.csv : table compiled from the raw data by the data preparation script
figures/ : folder containing all figures used in the manuscript reporting this reanalysis
- dags/ : folder of all directed acyclic graphs (generated with ggdag)
- marginal/ : folder of all marginal plots (generated with ggplot2)
src/ : folder containing all scripts
- bayesian/ : folder containing the Bayesian reanalysis of the original hypotheses
  - causal-assumptions.Rmd : notebook containing the explicit causal assumptions of the studied phenomenon
  - missing-actors.Rmd : regression model estimating the impact on missing actors
  - missing-associations.Rmd : regression model estimating the impact on missing associations
  - missing-objects.Rmd : regression model estimating the impact on missing domain objects
- frequentist/frequentist.Rmd : notebook containing a reproduction of the original data evaluation
- html/ : folder containing a precompiled html version of each Rmd notebook (created with knitr)
- util/ : folder containing all supporting scripts and notebooks
  - data-loading.R : script for loading the prepared data
  - data-preparation.Rmd : notebook that prepares and assembles the raw data such that it is fit for reanalysis
  - marginal-plot-visualization.Rmd : notebook generating one marginal plot from the three individual marginal plots of each Bayesian analysis
  - model-eval.R : script to evaluate the isolated difference of the response variable distribution based on different values of the treatment (passive voice)
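
For a first look at the prepared table listed above, the following is a minimal sketch. It assumes that the working directory is the repository root and that ipv-data.csv is comma-separated; the repository's own data-loading.R script may load the data differently. The variable names `data_path` and `ipv` are illustrative.

```r
# Path of the prepared table, relative to the repository root.
data_path <- file.path("data", "ipv-data.csv")

if (file.exists(data_path)) {
  # Assumption: the file is a plain comma-separated table.
  ipv <- read.csv(data_path)
  str(ipv)  # list column names and types
} else {
  message("Not found: ", data_path)
}
```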

System Requirements

To fully utilize this replication package, ensure that you have R (version > 4.0) and RStudio installed on your machine. Then, complete the following steps:

Install the rstan toolchain by following the instructions for Windows, macOS, or Linux, respectively.
Restart RStudio and follow the instructions starting with the Installation of RStan.
Install the latest version of Stan (via cmdstanr) by running the following commands:

    install.packages("devtools")
    devtools::install_github("stan-dev/cmdstanr")
    cmdstanr::install_cmdstan()

Install all missing packages via install.packages(c("tidyverse","ggdag","dagitty","patchwork","brms","marginaleffects","rcompanion","psych"))
Create a folder called fits within src/bayesian/ such that brms has a location to place all Bayesian models.
Open the rqi-ipv.Rproj file with RStudio, which will set up the environment correctly.
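
The package and folder steps above can also be checked from the R console. The following is a minimal sketch, assuming the working directory is the repository root (as it is after opening rqi-ipv.Rproj). The variable names `required`, `missing_pkgs`, and `fits_dir` are illustrative and not part of the repository.

```r
# Packages named in the installation step above.
required <- c("tidyverse", "ggdag", "dagitty", "patchwork",
              "brms", "marginaleffects", "rcompanion", "psych")

# Report which of them are not yet installed.
missing_pkgs <- setdiff(required, rownames(installed.packages()))
if (length(missing_pkgs) > 0) {
  message("Still missing: ", paste(missing_pkgs, collapse = ", "))
}

# Create src/bayesian/fits if it does not exist yet
# (assumes the working directory is the repository root).
fits_dir <- file.path("src", "bayesian", "fits")
dir.create(fits_dir, recursive = TRUE, showWarnings = FALSE)
```

Running the block a second time is harmless, since `dir.create` with `showWarnings = FALSE` stays silent when the folder already exists.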

Application

If you want to replicate and assess the evaluation presented in the accompanying manuscript, we recommend looking at the following files in this order. For each script, you can choose the interactive Rmd file, which lets you inspect and manipulate each variable, or the html file, which is a pre-compiled version of the same Rmd notebook.

Data preparation (interactive or static) to understand the data under analysis.
Frequentist analysis (interactive or static) to understand the frequentist data analysis of the original experiment [1].
Causal assumptions (interactive or static) to inspect the causal assumptions about the studied phenomenon. This covers the modeling and identification step of the applied framework for statistical causal inference [2].
Bayesian data analysis (files prefixed with missing- in the bayesian/ and html/ folders) to follow the regression modeling of all three response variables of interest. This covers the estimation step of the applied framework for statistical causal inference [2].
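
As an alternative to opening each notebook in RStudio, the notebooks can be knitted from the console in the order recommended above. The following is a rough sketch, not a script shipped with the repository. It assumes that the working directory is the repository root and that the rmarkdown package is installed. The order of the three missing- notebooks is arbitrary, and `out_dir` is a hypothetical output location chosen so that the shipped html/ files stay untouched. Whether every notebook resolves its data paths correctly when knitted this way has not been verified here, and the Bayesian notebooks may take a long time because they fit models.

```r
# Notebooks in the recommended reading order; paths follow the
# repository layout described in the Description section.
notebooks <- c(
  file.path("src", "util", "data-preparation.Rmd"),
  file.path("src", "frequentist", "frequentist.Rmd"),
  file.path("src", "bayesian", "causal-assumptions.Rmd"),
  file.path("src", "bayesian", "missing-actors.Rmd"),
  file.path("src", "bayesian", "missing-associations.Rmd"),
  file.path("src", "bayesian", "missing-objects.Rmd")
)

# Hypothetical output folder (an assumption, not a repository path).
out_dir <- file.path(tempdir(), "rendered")

# Render every notebook that is present; skip any that is not found.
for (nb in notebooks) {
  if (!file.exists(nb)) {
    message("Skipping (not found): ", nb)
    next
  }
  rmarkdown::render(nb, output_dir = out_dir, quiet = TRUE)
}
```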

References

[1] Femmer, H., Kučera, J., & Vetrò, A. (2014, September). On the impact of passive voice requirements on domain modelling. In Proceedings of the 8th ACM/IEEE international symposium on empirical software engineering and measurement (pp. 1-4).

[2] Siebert, J. (2023). Applications of statistical causal inference in software engineering. Information and Software Technology, 107198.
