Chi-Square Identified Significant Groups, Now Which One is More Significant?
turkalpmd opened this issue · 2 comments
Dear @raphaelvallat,
In my thesis work, I examined the relationships between multiple categorical variables across two main categories and reached a point where the analyses I was running started to feel meaningless. For example, I investigated the categorical relationship between four age groups and eleven diagnostic groups using pd.get_dummies and found many significant results. However, applying a Bonferroni adjustment to evaluate these seemed too simplistic for the complexity of the categorical tests I was dealing with. I therefore looked for a post-hoc analysis for categorical tests and stumbled upon this wonderful article, which solved my problem.
Ransacking is a post-hoc technique used after Chi-Square tests to identify specific 2x2 tables of interest within a larger r x c contingency table and evaluate the statistical significance of these smaller tables. Essentially, the method selects certain cells of the contingency table to form a 2x2 subtable and assesses how that subtable reveals specific relationships within the larger table. This is valuable for understanding complex relationships in large tables, because the overall result of a Chi-Square test can stem from interactions among multiple variables, and determining which pairs of variables contribute most to the result can be challenging.
Ransacking begins with the construction of the relevant 2x2 table. It then compares the odds within this table by calculating an odds ratio or log odds ratio, which quantifies the strength and direction of the relationship between the selected cells. The calculated odds ratio is then used to decide whether the null hypothesis of independence (i.e., that there is no relationship between the two variables) is rejected. This makes the method particularly valuable for understanding specific interactions between variables and how they contribute to the overall result of the Chi-Square test.
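For concreteness, here is a minimal sketch of how such a test could look in Python. The function name `ransack_2x2`, the example counts, and the use of the full table's degrees of freedom to adjust the critical value are my own assumptions about the procedure, not the final Pingouin API:

```python
import numpy as np
from scipy.stats import chi2

def ransack_2x2(a, b, c, d, df_full, alpha=0.05):
    """Evaluate a 2x2 subtable extracted from a larger r x c table.

    a, b, c, d : cell counts of the 2x2 subtable.
    df_full    : degrees of freedom of the full table, (r - 1) * (c - 1),
                 used here (an assumption) to adjust the critical value.
    Returns the odds ratio, the z statistic of the log odds ratio,
    and whether the null hypothesis of independence is rejected.
    """
    # Log odds ratio and its asymptotic standard error
    log_or = np.log((a * d) / (b * c))
    se = np.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = log_or / se
    # Adjusted critical value: square root of the chi-square critical
    # value at the full table's degrees of freedom
    z_crit = np.sqrt(chi2.ppf(1 - alpha, df_full))
    return np.exp(log_or), z, abs(z) > z_crit

# Hypothetical counts for one 2x2 subtable of a table with dof = 10
odds, z, reject = ransack_2x2(a=50, b=10, c=10, d=50, df_full=10)
```

With these made-up counts the odds ratio is 25 and the z statistic exceeds the adjusted critical value, so the subtable would be flagged as significant even under the stricter criterion.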
In conclusion, I want to highlight that traditional 2x2 contingency tests for Chi-Square analyses do not fully meet my needs, and logistic regression is not entirely suitable for my problem. Hence, I emphasize the importance of post-hoc analysis methods like ransacking and the ability of these methods to rank relationships between groups. I particularly appreciate the ransacking method and have also implemented other adjustment methods found in the article in Python. I aim to add these post-hoc tests to the Pingouin library. This addition will expand the library's capability to perform more detailed analyses after Chi-Square tests, allowing users to discover more specific relationships.
Hi @turkalpmd,
Apologies for the slow reply, and thank you for the very detailed explanation. I wasn't familiar with these methods (e.g. ransacking), but from what you described I think they make a lot of sense and could be a useful addition to Pingouin. I also see that the paper you shared has been referenced in more than 1000 publications.
To have more visibility into what this function would look like, could you share an example (minimal) implementation in Python, together with a real-world dataset where this method is the most appropriate?
Thanks,
Raphael
Hello @raphaelvallat ,
The delay is not an issue. I have also created a repository where I try to explain as much as I can, and I implemented it using the base libraries suitable for Pingouin, as I have contributed to it before. I cannot use my patient data as real-world evidence due to ethical reasons (HIPAA), but I tried to demonstrate the results using the ChiSqr data in Pingouin.
As a summary, I can share this part from the notebook here (the full notebook can be accessed from this link: Implementation of the Ransacking Method in Python):
The standard Chi-Squared test currently gives us only chi-squared values, Cramer's V values, and power values. According to classical statistics there is no significant relationship between cp_0 and restecg_0, but we can say there is a significant relationship with restecg_1. With the ransacking adjustment, the degrees of freedom (dof) become 10, and the critical value corresponding to the adjusted alpha changes accordingly. Ultimately, the standard test only indicates the presence of a relationship but cannot rank its strength or level; this is where the ransacking method proves effective.
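To illustrate how the adjustment shifts the significance threshold, here is a small sketch. It assumes (as I understand the ransacking procedure) that the adjusted critical z is the square root of the chi-square critical value at the full table's dof, here 10:

```python
from scipy.stats import chi2, norm

alpha = 0.05

# Naive two-sided critical z for a single, stand-alone 2x2 comparison
z_naive = norm.ppf(1 - alpha / 2)

# Adjusted critical z using the full table's dof (here 10):
# square root of the chi-square critical value at that dof
z_adjusted = chi2.ppf(1 - alpha, df=10) ** 0.5

# A subtable z of, say, 3.0 would look significant under the naive
# threshold (~1.96) but not under the adjusted one (~4.28)
```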
Even though this dataset did not yield particularly meaningful results, my thesis work produced the outcomes I was looking for. For example, among the subgroups of admission categories, the 0-2 age group showed an increased risk for the respiratory failure subgroup (OR: 3.64, Z: 10.32) and a significantly decreased risk for intoxication admissions (OR: 0.15, Z: -6.9). Conversely, the over-12 age group showed a lower risk for respiratory failure (OR: 0.31, Z: -6.77), while its risk for intoxication admissions was significantly higher than in the other age groups (OR: 4.39, Z: 10.6). These findings intuitively matched my medical hypotheses.