The R package
dress
provides some measures for disclosure risk associated with the release
of protected data, irrespective of what mechanism was used to protect
it. Key principles of the disclosure framework include distinctness,
accuracy and un-deniability. This method can be applied to any pair of
original and protected data-sets despite a difference in dimensionality
and without assuming any particular joint probability structure between
the original and protected data.
You can install the stable version from CRAN:
install.packages("dress")
You can install the development version from GitHub
# install.packages("remotes")
remotes::install_github("mohammedfaizan0014/dress")
Installing this software requires a compiler
library(svMisc)
library(dress)
library(sdcMicro)
# ##################
# ##all continuous###################
CASC_sample <- CASCrefmicrodata[,c(2,3,4,6)]
CASC_protected <- addNoise(CASC_sample,noise = 100)$xm #Additive Noise protected
DRisk_NN <- drscore(
Sample = CASC_sample, #Original Sample
Protected = CASC_protected,
delta = 0.05,
kdistinct = 0.05, #k distinct threshold if integer then
# probability threshold is k/SS (SS = sample size)
ldeniable = 5, # l undeniable threshold if integer then
# probability threshold is l/SS (SS = sample size)
neighbourhood = 1,
#Possible 'neighbourhood' types
# 1 = Mahalanobis (Based on Mahalanobis Distance)
# 2 = DSTAR (Based on Density Based Distance)
# 3 = StdEuclid (Based on Standardised (by std dev) Euclidean Distance)
# 4 = RelEuclid (Relative Euclidean Distance sum_k ((Xk-Yk)/Xk)^2)
neigh_type = 'prob',
#Possible 'neigh_type' types
#constant = fixed threshold on distance
#prob = Nearest Neighbour Probability Neighbourhood used (Worst Case Scenario 1)
#estprob = = Nearest Neighbour Probability Neighbourhood used based on protected density (Worst Case Scenario 2)
numeric.vars = 1:4, #Which Variables are continuous?
outlier.par = list(centre = median,
scale = var,
thresh = 0.01)
#Parameters to adjust how MV outliers are determined.
#Default is that lie 99% (based on Chi-Square n-1 dist) away from median after scale by variance.
)
#>
#> ######################################################################
#> # Disclosure Risk Assessment #
#> ######################################################################
#> Nearest Neighbour Neighbourhood with parameters:
#> delta = 0.05, kdistinct = 0.05, ldeniable = 0.00462962962962963.
#>
#> Number of Observations in the Sample 1080
#> Number of Observations in the Protected Sample 1080
#> Number of Continuous Variables 4
#> Number of Key Categories 1
#> Number of Outliers in Sample 38
#> Number of Distinct Points in Sample 1080
#> Number of Distinct Outliers in Sample 38
#> Number of Exact Matches in Sample 0
#> Number of Interval Matches in Sample 0
#> Number of Outlier Interval Matches in Sample 0
#> Number of Distint Outlier Interval Matches in Sample 0
#>
#> Delta Disclosure Risk of Sample 0.1037
#> Delta Disclosure Risk of Sample Outliers 0.7632
#> Proportion Distinct 1
#> Proportion Estimated 0.9583
#> Proportion Undeniable 0.1037
#>
#> Category Level Disclosure Risk:
#>
#> N.Obs DRisk Out_DRisk Distinct Estimated Undeniable
#> All 1080 0.1037037 0.02685185 1 0.9583333 0.1037037
#Update neighbourhood to fixed threshold definition
DRisk_Fxd <- update(DRisk_NN,neigh_type = 'constant',
delta = 1)
#>
#> ######################################################################
#> # Disclosure Risk Assessment #
#> ######################################################################
#> Threshold Neighbourhood with parameters:
#> delta = 1, kdistinct = 0.05, ldeniable = 0.00462962962962963.
#>
#> Number of Observations in the Sample 1080
#> Number of Observations in the Protected Sample 1080
#> Number of Continuous Variables 4
#> Number of Key Categories 1
#> Number of Outliers in Sample 38
#> Number of Distinct Points in Sample 642
#> Number of Distinct Outliers in Sample 38
#> Number of Exact Matches in Sample 0
#> Number of Interval Matches in Sample 7
#> Number of Outlier Interval Matches in Sample 0
#> Number of Distint Outlier Interval Matches in Sample 0
#>
#> Delta Disclosure Risk of Sample 0.1444
#> Delta Disclosure Risk of Sample Outliers 0.2895
#> Proportion Distinct 0.5944
#> Proportion Estimated 0.9546
#> Proportion Undeniable 0.1546
#>
#> Category Level Disclosure Risk:
#>
#> N.Obs DRisk Out_DRisk Distinct Estimated Undeniable
#> All 1080 0.1444444 0.01018519 0.5944444 0.9546296 0.1546296
- Common questions about dress package are often found on stack-overflow.