hknd23/idcempy

Paper: efficiency claim

Closed this issue · 3 comments

The end of the statement of need section says "efficient time-wise" -- this reads like a claim about the performance of your package/code - either remove/rephrase this or provide performance evaluation information.

We incorporated your suggestion by dropping the phrase “efficient time-wise” from our revised paper. We briefly mention, however, the run-time of the code in our IDCeMPy package on the provided data: estimating the inflated discrete choice models without correlated errors (ZiOP, MiOP, GiMNL) takes approximately 8 to 75 seconds, while the correlated-error models (ZiOPC, MiOPC) take from about 5 to almost 90 minutes. For comparison, the equivalent R code takes 9 to 15 minutes for the non-correlated-error models and roughly 15 to 163 minutes for the correlated-error models on the same data.

I'm not sure what "in the provided data applications" means. Do you have code in the repository that runs these timing tests?

I'm also unclear exactly what you're telling us with this information, because there isn't any context. I'm not sure if this is a long time or short time, because I don't have a comparison point. I also don't know what would affect the differences in the timing (5 minutes is a lot different than 90!). A more complex model? Larger data set? Number of cores used (is this parallelized)? What type of computer did you use to run this code? Could that affect runtime?

My suggestion would be one of the following, but there are other ways you could resolve this.

  1. Move the timing information to the examples section of the documentation (out of the paper) to give people a reference for how long it took to run the given example, so that if their code has been running for 5 minutes, they know whether that indicates an error or whether they should just let it run longer. You'll still need some additional reference information so they can gauge how the timing on your system might compare to theirs. Then this becomes a helpful guideline that lets people know what to expect when running the code, instead of a performance claim.
  2. If you want to include timing information in the paper, run more systematic timing tests over a variety of models to show how the time to run varies with # observations, # variables, model complexity, type of model, etc.
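The systematic timing tests in suggestion 2 could be sketched roughly as follows. This is a minimal illustration using only Python's `time` module; `fit_model` is a hypothetical stand-in for an IDCeMPy estimation call (it just simulates work on synthetic data), not the package's actual API:

```python
import time
import random

def fit_model(n_obs, n_covariates):
    """Hypothetical stand-in for an IDCeMPy estimation call
    (e.g. fitting a ZiOP model); here it only simulates work."""
    data = [[random.random() for _ in range(n_covariates)]
            for _ in range(n_obs)]
    # Placeholder "estimation": a pass over the synthetic data
    return sum(sum(row) for row in data)

def benchmark(sizes, n_covariates=4, repeats=3):
    """Time the fit over a grid of sample sizes, averaging repeats."""
    results = {}
    for n in sizes:
        times = []
        for _ in range(repeats):
            start = time.perf_counter()
            fit_model(n, n_covariates)
            times.append(time.perf_counter() - start)
        results[n] = sum(times) / len(times)
    return results

if __name__ == "__main__":
    for n, avg in benchmark([1000, 5000, 10000]).items():
        print(f"n = {n:>6}: {avg:.4f} s")
```

Varying `n_covariates` and the model type in the same loop would show how run-time scales along each dimension the reviewer lists.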

We addressed these points as follows. First, per your suggestion, we added code using Python's time module to the examples in our documentation; it reports the run-time in seconds for every inflated discrete choice example model estimated with the relevant function in our IDCeMPy package. This code, available in the /examples directory, tracks and reports the run-time for each Zero-Inflated and Middle-Inflated Ordered Probit model with and without correlated errors [ZiOP(C) and MiOP(C)], as well as the General-Inflated MNL (GiMNL) model. Second, as described in our next response, our documentation reports the exact time taken to fit each ZiOP, ZiOPC, MiOP, MiOPC, and GiMNL specification (we estimated several of each), which gives users a reference point for the time needed to estimate each model on the example datasets we provide. Third, the models were estimated on an Intel i7 quad-core machine with 16 GB of RAM, which is stated explicitly in the examples in our documentation; the models are fit on a single core at a time and are therefore not parallelized. These standard machine specs should serve as a helpful guideline for users running the code in IDCeMPy. We turn next to your remaining questions about run-time.
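The timing code added to the examples follows the standard time-module pattern; a minimal sketch is below. The `timed_fit` wrapper and the placeholder fit call are illustrative assumptions, not the actual IDCeMPy API:

```python
import time

def timed_fit(fit_fn, *args, **kwargs):
    """Run a model-fitting callable and report its wall-clock run-time.

    fit_fn is any estimation function; in the documentation examples
    it would be an IDCeMPy estimation call, but any callable works.
    """
    start = time.perf_counter()
    result = fit_fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    print(f"Run-time: {elapsed:.2f} seconds")
    return result, elapsed

# Trivial placeholder standing in for a model fit
result, secs = timed_fit(sum, range(1_000_000))
```

Since the fits run on a single core, wall-clock time from `time.perf_counter` is an adequate measure here; no parallel bookkeeping is needed.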

These are great suggestions, which we incorporated as follows. First, following your advice, we removed the sentences in our paper that made broad performance claims by citing estimation times for our inflated discrete choice models. Instead, the examples section of our revised documentation (but not the paper) reports the results and run-time for several distinct ZiOP, ZiOPC, MiOP, MiOPC, and GiMNL specifications. For instance, we report the run-time for the Zero-Inflated Ordered Probit model with and without correlated errors (ZiOP/ZiOPC) estimated on two separate datasets: the repression dataset, which contains 1,984 observations, and the larger youth national tobacco consumption dataset, which has 9,624 observations. For the Middle-Inflated Ordered Probit model with and without correlated errors, we present the results and run-time for several different MiOP and MiOPC specifications: baseline specifications with just four covariates, "intermediate" specifications with ten covariates, and a full specification with twenty-seven covariates. We also present the estimates and run-time for a full GiMNL model with twenty-three covariates. As mentioned earlier, the code and data are available in the repository (/examples and /data).