AndriSignorell/DescTools

Better Documentation for DescTools::GoodmanKruskalTau() Needed

chelseadickens opened this issue · 3 comments

Clarification is needed on which formulas (from which references specifically) are being used in computing GoodmanKruskalTau(), particularly for the error metrics.

https://github.com/cran/DescTools/blob/7f2ee6031c6d3f2ec1e191a96a0362e3b3c6db5e/R/StatsAndCIs.r#L6860

When I run the example below, which is included in the documentation for GoodmanKruskalTau(), the values output by the function do not match any of the values in the cited SAS documentation. If the outputs from GoodmanKruskalTau() are supposed to match outputs from the SAS documentation, which outputs are they supposed to match?

# example in: 
# http://support.sas.com/documentation/cdl/en/statugfreq/63124/PDF/default/statugfreq.pdf 
# p. 1821 
tab <- as.table(rbind(c(26,26,23,18,9),c(6,7,9,14,23))) 
 
# Goodman Kruskal's tau C|R 
GoodmanKruskalTau(tab, direction="column", conf.level=0.95)
# tauA      lwr.ci      upr.ci 
# 0.041216580 0.009920576 0.072512583 
# Goodman Kruskal's tau R|C
GoodmanKruskalTau(tab, direction="row", conf.level=0.95)
# tauA     lwr.ci     upr.ci 
# 0.16523315 0.04921484 0.28125146 

Using the code below, derived from the formulas in Somers (1962), p. 805, I am able to replicate the tauA output from GoodmanKruskalTau() with direction = "row" (this reference is not listed in the documentation for the GoodmanKruskalTau() function). However, I have not been able to replicate the lwr.ci and upr.ci output, because I cannot work out which formulas from which paper(s) are used to produce the error metrics that go into the CIs.

Somers, R. H. (1962). A Similarity between Goodman and Kruskal's Tau and Kendall's Tau, with a Partial Interpretation of the Latter. Journal of the American Statistical Association, 57(300), 804-812.

# https://www.jstor.org/stable/pdf/2281811.pdf?refreqid=excelsior%3Aa802e4a8d721b4ab0f1926143bc7e1e1&ab_segments=&origin=&initiator=&acceptTC=1 

# for direction = "row"  
col_sum <- colSums(tab)
n <- sum(tab)  # total number of observations
p_As <- c()
for(j in 1:ncol(tab)){
  for(i in 1:nrow(tab)){
    p_A <- sum((sum((tab[i,j]/col_sum[j])*((col_sum[j] - tab[i,j])/col_sum[j])))*(col_sum[j]/n))
    print(p_A)
    p_As <- c(p_As, p_A) # I don't think this should be specified here... it results in 
  }
}

row_sum <- rowSums(tab)
p_Bs <- c()
for(i in 1:nrow(tab)){ 
  p_B <- sum((row_sum[i]/n)*(1-(row_sum[i]/n)))
  print(p_B)
  p_Bs <- c(p_Bs, p_B)
}


# q_A and q_B are computed here but are not used in the tau calculation below
for(i in 1:nrow(tab)){
  for(j in 1:ncol(tab)){
    q_A <- sum((tab[i,j]^2)/col_sum[j]) - sum(row_sum[i]^2)
    q_B <- n^2
  }
}

# this is equal to output from GoodmanKruskalTau  
1-(sum(p_As) / sum(p_Bs)) 
# 0.1652332 

GoodmanKruskalTau(tab, direction="row", conf.level=0.95) 
#      tauA     lwr.ci     upr.ci  
# 0.16523315 0.04921484 0.28125146
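
For comparison, the loops above can be collapsed into a closed-form expression. The following is only a minimal sketch and not the DescTools implementation: `gk_tau` is a hypothetical helper name, and the meaning of the `direction` labels is an assumption chosen to mirror the outputs quoted above, checked against this one table only. It also assumes that no margin of the table is zero.

```r
# Hypothetical helper (not part of DescTools): Goodman-Kruskal tau for a
# two-way contingency table, written in closed form.
gk_tau <- function(tab, direction = c("row", "column")) {
  direction <- match.arg(direction)
  # For "column" the roles of rows and columns are swapped, so transpose.
  if (direction == "column") tab <- t(tab)
  n <- sum(tab)
  r <- rowSums(tab)  # margins of the predicted variable
  k <- colSums(tab)  # margins of the predictor variable
  # Sum of n_ij^2 / k_j, scaled by n (assumes all k_j > 0)
  within <- sum(sweep(tab^2, 2, k, "/")) / n
  # Sum of squared marginal proportions of the predicted variable
  marginal <- sum((r / n)^2)
  (within - marginal) / (1 - marginal)
}

tab <- as.table(rbind(c(26,26,23,18,9), c(6,7,9,14,23)))
tau_row <- gk_tau(tab, "row")
tau_col <- gk_tau(tab, "column")
```

On this table the sketch gives about 0.1652 for "row" and about 0.0412 for "column", which agrees with the tauA values quoted above. It says nothing about how lwr.ci and upr.ci are obtained.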

Thanks!

The function code is based on https://stat.ethz.ch/pipermail/r-help/2007-August/138098.html by Antti Arppe.
You're right that the literature mentioned does not match well. Antti wrote that he followed Liebetrau (1983).

Could you please provide the results you get in SAS and the exact SAS code?

Note that the code indeed correctly (from my perspective) reproduces the results in Liebetrau pp. 24:

tt <- matrix(c(549,93,233,119,225,455,402,  
               212,124,78,42,41,12,132,
               54,54,33,13,46,7,153), ncol=3,
             dimnames=list(rownames=c("Gov", "Mil", "Edu", "Eco", "Intel", "Rel", "For"), 
                           colnames=c("One", "Two", "Multi")))

GoodmanKruskalTau(tt, direction = "row", conf.level = 0.95)
GoodmanKruskalTau(tt, direction = "column", conf.level = 0.95)

namely Tau A|B = 0.0258, Tau B|A = 0.0861, ASE A|B = 0.00248, ASE B|A = 0.00693.

What do you get in SAS for that?
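
Since Liebetrau reports ASEs while GoodmanKruskalTau() returns interval limits, one way to put the two side by side is to back out the standard error implied by an interval. The sketch below assumes the limits form a symmetric normal-approximation (Wald) interval, i.e. estimate ± z · ASE. The intervals quoted earlier in this thread are symmetric about tauA, which is consistent with that assumption but does not prove it. `ase_from_ci` is a hypothetical helper, not a DescTools function.

```r
# Hypothetical helper: standard error implied by a symmetric Wald interval.
# Assumes lwr = est - z * ase and upr = est + z * ase.
ase_from_ci <- function(lwr, upr, conf.level = 0.95) {
  z <- qnorm(1 - (1 - conf.level) / 2)
  (upr - lwr) / (2 * z)
}

# Interval limits copied from the outputs quoted earlier in this thread
ase_row <- ase_from_ci(0.04921484, 0.28125146)    # direction = "row"
ase_col <- ase_from_ci(0.009920576, 0.072512583)  # direction = "column"
```

The same helper could be applied to the limits returned for `tt` and the result compared with the ASE values given above, or with whatever ASE SAS reports.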