Odds-Ratio - Negative values
nsankar opened this issue · 6 comments
Hi,
I hope you are doing well. I used the GPS classifier on sensor anomaly data, where X holds the covariates (continuous numeric variables), T is a continuous treatment variable (the specific sensor reading that probably caused the anomaly), and Y (the outcome) is the binary anomaly label (1 = anomaly, 0 = normal).
When I call the gps.estimate_log_odds prediction API, I get negative values (odds ratios) for some of the treatment values. Below is one example (image #1).
I believe negative values are incorrect. Am I missing something?
Also, how should I interpret odds-ratio values that have an arbitrary min/max range across a range of treatment values predicted using gps.estimate_log_odds? (Please see image #2 below as an example.)
Thank you in advance.
Hi @nsankar ! You’re absolutely correct, there should never be negative odds ratios. Hmmm. Would you be willing to send me a sample of your data so I can do some debugging?
@ronikobrosly How can I reach you by email?
@nsankar My email is roni.kobrosly@gmail.com
Hi @nsankar, I think I might understand the issue. Did you use gps.estimate_log_odds to generate image #1? If so, what you generated was an array of log-odds, which can range from -∞ to ∞. So the negative results you observed are possible and would be fine.
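To make the distinction concrete, here is a minimal sketch of the standard conversions between log-odds, odds, and probability. The helper names are hypothetical and are not part of the library; the sketch only illustrates that a negative log-odds value still maps to a positive odds value and a valid probability.

```python
import math


def log_odds_to_odds(log_odds: float) -> float:
    """Convert log-odds to odds; the result is always positive."""
    return math.exp(log_odds)


def log_odds_to_probability(log_odds: float) -> float:
    """Convert log-odds to a probability using the logistic function."""
    return 1.0 / (1.0 + math.exp(-log_odds))


# Illustrative log-odds values only (not taken from any real model output).
for value in (-2.0, 0.0, 2.0):
    print(value, log_odds_to_odds(value), log_odds_to_probability(value))
```

A log-odds of 0 corresponds to odds of 1 (a probability of 0.5), and negative log-odds simply mean the outcome is less likely than not.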
If you're looking to generate odds ratios using the lowest treatment value as a reference (the preferred way to use this GPS_Classifier), you should use the calculate_CDRC method.
So your workflow would look something like this:
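from causal_curve import GPS_Classifier
gps = GPS_Classifier()
# df is your pandas DataFrame: T = continuous treatment, X = covariates, y = binary outcome
gps.fit(T = df['t'], X = df['x'], y = df['y'])
# 0.95 is the confidence level used for the interval around the curve
gps_results = gps.calculate_CDRC(0.95)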
gps_results will contain a column of the odds ratios. As mentioned here, the odds ratios generated with this function give you a sense of the relative odds of a treatment value causing the highest outcome class to occur relative to the lowest treatment value. So if you want to see the causal effect of a treatment value of 20.0 and the lowest treatment value happens to be 10.0, the odds ratio at 20.0 will represent:
(odds of higher outcome class occurring at treatment = 20.0) / (odds of higher outcome class occurring at treatment = 10.0)
If the odds ratio here is 1.0, that tells you a treatment value of 20.0 does nothing different from a treatment value of 10.0. If the odds ratio is 5, then the odds of the higher outcome class at a treatment value of 20.0 are 5 times the odds at a treatment value of 10.0. So it provides relative treatment effects, relative to the lowest treatment value. These odds ratios should always be bounded between 0 and ∞. They will never be negative.
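As a worked example of that ratio, here is a small sketch using made-up probabilities. The numbers (0.2 at treatment 10.0 and 0.5 at treatment 20.0) and the helper names are assumptions for illustration only; this is plain arithmetic, not how calculate_CDRC computes its results internally.

```python
def odds(probability: float) -> float:
    """Convert a probability strictly between 0 and 1 into odds."""
    if not 0.0 < probability < 1.0:
        raise ValueError("probability must be strictly between 0 and 1")
    return probability / (1.0 - probability)


def odds_ratio(p_treatment: float, p_reference: float) -> float:
    """Odds at a treatment value divided by odds at the reference value."""
    return odds(p_treatment) / odds(p_reference)


# Hypothetical probabilities of the higher outcome class.
p_at_10 = 0.2  # at the lowest (reference) treatment value, 10.0
p_at_20 = 0.5  # at treatment value 20.0

print(odds_ratio(p_at_20, p_at_10))
```

With these made-up inputs the odds are 0.25 at the reference and 1.0 at treatment 20.0, so the ratio is about 4. Because both odds are positive, the ratio can never be negative.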
Now, gps.estimate_log_odds produces something different. It is not relative to any treatment value. It simply gives you the log odds of the higher outcome class occurring at a provided treatment value. Again, these values can range from -∞ to ∞ and are more difficult to interpret.
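If you already have log-odds at several treatment values, one way to relate them to odds ratios is to exponentiate the difference from the log-odds at the lowest treatment value. The sketch below assumes hypothetical treatment values and log-odds, and the function name is made up; it is a rough illustration of the arithmetic and does not reproduce what calculate_CDRC does (for example, it gives no confidence intervals).

```python
import math
from typing import List, Sequence


def odds_ratios_from_log_odds(
    treatments: Sequence[float], log_odds: Sequence[float]
) -> List[float]:
    """Odds ratio of each point relative to the lowest treatment value."""
    if len(treatments) != len(log_odds) or not treatments:
        raise ValueError("inputs must be non-empty and of equal length")
    # Index of the lowest treatment value, used as the reference.
    ref_index = min(range(len(treatments)), key=lambda i: treatments[i])
    ref_log_odds = log_odds[ref_index]
    # exp(a - b) equals exp(a) / exp(b), i.e. odds divided by reference odds.
    return [math.exp(value - ref_log_odds) for value in log_odds]


# Hypothetical values, not real model output.
treatment_values = [10.0, 15.0, 20.0]
predicted_log_odds = [-1.5, -0.5, 0.2]

print(odds_ratios_from_log_odds(treatment_values, predicted_log_odds))
```

The reference point always gets an odds ratio of 1, and every other value is positive even when the underlying log-odds are negative.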
Does this help? Or did I miss the point?
@ronikobrosly Noted. Yes, I had used the gps.estimate_log_odds method to predict and to plot the image. I get your point. I will go through the gps.calculate_CDRC function and try it. This really helps. Thanks for the insights.
Great! Feel free to close the issue if that’s it, or let me know if you have any other questions.