DVM

Data Value Metric (DVM)

Table of contents
Overview
Python Code
Experimental results
Team
Acknowledgments
References

Overview

This SOCR project develops a new measure, the Data Value Metric (DVM), that quantifies the energy, or information content, of large and complex datasets. The DVM can serve as a yardstick for deciding whether appending, expanding, or otherwise augmenting the size or complexity of a dataset is likely to be beneficial in a specific application domain.

Python Code

DVM is an open-science project. The complete Python code, case studies, examples, simulations, and results are provided under the LGPL license; details are in the code folder.

Experimental results

Interactive 2D and 3D DVM surface plots illustrating the results of several experiments are available online. These graphs show how the DVM behaves across the range of possible numbers of cases and numbers of features for the real and simulated datasets.
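The grid of values behind a surface of this kind can be sketched in a few lines. This is an illustration only, under loud assumptions: the score used here is the hold-out accuracy of a nearest-centroid classifier on simulated data, a stand-in and not the DVM itself, and the names `simulate`, `holdout_accuracy`, and `score_surface` are hypothetical and do not come from the project's code folder.

```python
import random


def simulate(n_cases, n_features, n_informative=2, seed=0):
    """Hypothetical two-class Gaussian data; only the first
    n_informative features carry any class signal."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_cases):
        label = i % 2  # alternate labels so both classes are balanced
        row = [
            rng.gauss(1.0 if (label == 1 and j < n_informative) else 0.0, 1.0)
            for j in range(n_features)
        ]
        X.append(row)
        y.append(label)
    return X, y


def holdout_accuracy(X, y):
    """Stand-in score (NOT the DVM): train a nearest-centroid classifier
    on the first half of the cases and test it on the second half."""
    half = len(X) // 2
    centroids = {}
    for label in (0, 1):
        rows = [X[i] for i in range(half) if y[i] == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(row):
        # Pick the class whose centroid is closest in squared distance.
        return min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])),
        )

    test_idx = range(half, len(X))
    hits = sum(predict(X[i]) == y[i] for i in test_idx)
    return hits / len(test_idx)


def score_surface(case_grid, feature_grid, seed=0):
    """Evaluate the stand-in score on every (cases, features) pair."""
    return [
        [holdout_accuracy(*simulate(n, k, seed=seed)) for k in feature_grid]
        for n in case_grid
    ]


case_grid = [20, 80, 320]
feature_grid = [2, 8, 32]
surface = score_surface(case_grid, feature_grid)
for n, row in zip(case_grid, surface):
    print(n, [round(v, 2) for v in row])
```

Each row of `surface` corresponds to one number of cases and each column to one number of features, which is the layout a 3D surface plot would consume. Swapping `holdout_accuracy` for an actual DVM computation from the code folder would leave the grid logic unchanged.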

Team

Ivo D. Dinov, Morteza Noshad, Jerome Choi and Yuming Sun.

Acknowledgments

This work is supported in part by NIH grants P20 NR015331, P50 NS091856, P30 DK089503, P30 AG053760, and UL1 TR002240, and by NSF grants 1734853, 1636840, 1416953, 0716055, and 1023115. Students, trainees, scholars, and researchers from SOCR, BDDS, MNORC, MIDAS, MADC, MICHR, and the broader R-statistical computing and Python communities have contributed ideas, code, and support.

References

Noshad, M, Choi, J, Sun, Y, Hero, A, Dinov, ID. (2021) A data value metric for quantifying information content and utility, Journal of Big Data, DOI: 10.1186/s40537-021-00446-6, in print.
