A C++ software tool for imputing multivariate missing data with multiple methods (mean imputation, multiple imputation, listwise deletion, and multivariate normal and skew-normal clustering). It is distributed for research use only under the GNU General Public License v3.0.
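The README does not describe how each method is implemented. To illustrate the two simplest ones, mean imputation and listwise deletion, here is a minimal C++ sketch. It makes several assumptions: the data set is held in memory as rows of doubles, a missing entry is encoded as NaN (`kMissing`), and the names `Dataset`, `listwiseDeletion` and `meanImputation` are hypothetical. None of these are part of the MDImputation API.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Assumed in-memory layout (not the tool's .dat format): one row per
// observation, one column per variable, missing entries encoded as NaN.
using Row = std::vector<double>;
using Dataset = std::vector<Row>;
const double kMissing = std::numeric_limits<double>::quiet_NaN();

// Listwise deletion: keep only the rows in which every variable is observed.
Dataset listwiseDeletion(const Dataset& data) {
    Dataset complete;
    for (const Row& row : data) {
        bool hasMissing = false;
        for (double x : row) {
            if (std::isnan(x)) {
                hasMissing = true;
                break;
            }
        }
        if (!hasMissing) {
            complete.push_back(row);
        }
    }
    return complete;
}

// Mean imputation: replace each missing entry with the mean of the observed
// values in the same column; a column with no observed values stays NaN.
Dataset meanImputation(const Dataset& data) {
    if (data.empty()) {
        return data;
    }
    const std::size_t nVars = data.front().size();
    std::vector<double> sum(nVars, 0.0);
    std::vector<std::size_t> count(nVars, 0);
    for (const Row& row : data) {
        // Debug-build check that the data set is rectangular.
        assert(row.size() == nVars && "all rows must have the same number of variables");
        for (std::size_t j = 0; j < nVars; ++j) {
            if (!std::isnan(row[j])) {
                sum[j] += row[j];
                ++count[j];
            }
        }
    }
    Dataset imputed = data;
    for (Row& row : imputed) {
        for (std::size_t j = 0; j < nVars; ++j) {
            if (std::isnan(row[j]) && count[j] > 0) {
                row[j] = sum[j] / static_cast<double>(count[j]);
            }
        }
    }
    return imputed;
}
```

Both methods have well-known limitations. Mean imputation shrinks each variable's variance and ignores correlations between variables. Listwise deletion discards every partially observed row.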
If you use this software for your research, please acknowledge it in your papers by citing the following reference:
S. Riggi et al., "Handling missing data for the identification of charged particles in a multilayer detector: A comparison between different imputation methods", Nucl. Instr. and Meth. A 780 (2015) 81–90
or consider including me (S. Riggi, INAF - Osservatorio Astrofisico di Catania, Via S. Sofia 78, I-95123, Catania, Italy)
as a co-author on your publications.
The software is currently being updated.
Install the mandatory project dependencies:
- ROOT [https://root.cern.ch/]
- R [https://www.r-project.org/], together with the additional packages RInside, Rcpp, Matrix, Amelia and flexclust
- log4cxx [https://logging.apache.org/log4cxx/]
- boost [http://www.boost.org/]
Make sure the following environment variables are set to the installation directories of the external libraries:
- ROOTSYS: set to ROOT installation path
- LOG4CXX_DIR: set to LOG4CXX library installation path
- BOOST_ROOT: set to BOOST library installation path
NB: If the dependencies cannot be found, modify CPPFLAGS and LDFLAGS in the Makefile accordingly.
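If the build still cannot locate ROOT, log4cxx or Boost, check that the three variables listed above are actually set in the environment where the build runs. The following hypothetical C++ helper reports which of them are unset or empty, using only `std::getenv`. The names `missingEnvVars` and `kRequiredVars` are illustrative and not part of this project.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <vector>

// Variables the README asks to set before building (from the list above).
const std::vector<std::string> kRequiredVars = {"ROOTSYS", "LOG4CXX_DIR", "BOOST_ROOT"};

// Returns the names among `names` that are unset or set to an empty string
// in the environment of the current process.
std::vector<std::string> missingEnvVars(const std::vector<std::string>& names) {
    std::vector<std::string> missing;
    for (const std::string& name : names) {
        assert(!name.empty() && "environment variable names must be non-empty");
        const char* value = std::getenv(name.c_str());
        if (value == nullptr || *value == '\0') {
            missing.push_back(name);
        }
    }
    return missing;
}
```

Call `missingEnvVars(kRequiredVars)` from a small throwaway `main`, run from the same shell you build in, and print the result. This shows which variables still need to be exported. Only variable presence is checked, not whether the directories exist.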
To build the project:
- Clone this repository into your local $SOURCE_DIR
git clone https://github.com/simoneriggi/mida-imputation.git $SOURCE_DIR
- In the project directory, type:
make
Binaries will be placed in the bin/ directory and libraries in the lib/ directory.
Usage:
MDImputation [--input=[path-to-inputfile]] [--method=[imputation-method]] [--config=[path-to-configfile]]
--input=[path-to-inputfile] - Input data file (.dat) with missing data to be imputed
--method=[imputation-method] - Imputation method to be used (1=MEAN, 2=LISTWISE DELETION, 3=MultipleImputation, 4=MN clustering (multivariate normal), 5=MSN clustering (multivariate skew-normal))
--config=[path-to-configfile] - Configuration file with the run options
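The README does not show how MDImputation parses its arguments. The following hypothetical C++17 sketch shows how the three documented `--key=value` options and the method codes 1 to 5 could be parsed and validated. The names `ImputationMethod`, `Options`, `methodFromCode`, `methodName` and `parseArgs` are illustrative assumptions, not the project's API.

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Method codes as documented for --method (1 to 5).
enum class ImputationMethod {
    Mean = 1,
    ListwiseDeletion = 2,
    MultipleImputation = 3,
    MNClustering = 4,  // multivariate normal clustering
    MSNClustering = 5  // multivariate skew-normal clustering
};

// Hypothetical container for the parsed command-line options.
struct Options {
    std::string input;                       // --input=<file>.dat
    std::string config;                      // --config=<file>
    std::optional<ImputationMethod> method;  // --method=<1..5>, unset if absent
};

// Converts a numeric --method value into the enum, rejecting unknown codes.
ImputationMethod methodFromCode(int code) {
    if (code < 1 || code > 5) {
        throw std::invalid_argument("unknown imputation method code: " + std::to_string(code));
    }
    return static_cast<ImputationMethod>(code);
}

// Human-readable label matching the README's method list.
std::string methodName(ImputationMethod m) {
    switch (m) {
        case ImputationMethod::Mean: return "MEAN";
        case ImputationMethod::ListwiseDeletion: return "LISTWISE DELETION";
        case ImputationMethod::MultipleImputation: return "MultipleImputation";
        case ImputationMethod::MNClustering: return "MN clustering";
        case ImputationMethod::MSNClustering: return "MSN clustering";
    }
    assert(false && "unhandled ImputationMethod value");
    return "";
}

// Parses "--key=value" arguments (argv[1..] as strings). Throws on unknown
// keys, on a --method value that does not start with a number, or on a code
// outside 1..5.
Options parseArgs(const std::vector<std::string>& args) {
    Options opts;
    for (const std::string& arg : args) {
        const std::string::size_type eq = arg.find('=');
        const std::string key = arg.substr(0, eq);
        const std::string value = (eq == std::string::npos) ? "" : arg.substr(eq + 1);
        if (key == "--input") {
            opts.input = value;
        } else if (key == "--config") {
            opts.config = value;
        } else if (key == "--method") {
            // std::stoi throws if the value does not begin with a number.
            opts.method = methodFromCode(std::stoi(value));
        } else {
            throw std::invalid_argument("unrecognized option: " + arg);
        }
    }
    return opts;
}
```

In a real `main`, the arguments would come from `std::vector<std::string>(argv + 1, argv + argc)`. A missing `--method` is left unset here rather than given a default, because the README does not state one.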