Does CNA support multiple unordered variables?
Closed this issue · 4 comments
altairwei commented
Hi,
I would like to know whether CNA supports unordered categorical variables with more than two categories. As far as I know, Pearson's and Spearman's correlations can handle binary, continuous, or ordered categorical variables, but they cannot handle unordered categories with more than two levels. In the article (Reshef, Y. A. et al., Nature Biotechnology, 2021), all three examples are based on binary variables.
Best,
yakirr commented
Hi there -- as currently implemented, CNA would require binary variables.
In principle it could be extended to unordered categories, but we have not
done this. One workaround that at least partially addresses your need
would be to encode your unordered categorical variable as a series of
binary variables. For example, if you have a variable "color" that can be
"red", "green", or "blue", you could create three binary variables:
"is_red", "is_green", and "is_blue". This is not a perfect solution because
you can't jointly test all of the variables for association, but perhaps it
will give you an initial way of assessing whether there's signal in the
data.
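
As a rough illustration of that encoding, here is a minimal sketch in plain Python. The sample IDs and the `color` labels are made up for the example, and nothing here calls CNA itself; it only shows how the per-sample 0/1 vectors could be built before each one is tested separately.

```python
# Hypothetical per-sample labels; "color" stands in for any unordered
# categorical variable. Sample IDs are made up for illustration.
color = {"s1": "red", "s2": "green", "s3": "blue", "s4": "green", "s5": "red"}

# Build one 0/1 indicator per category, keyed as is_<category>.
categories = sorted(set(color.values()))
indicators = {
    f"is_{cat}": {sample: int(value == cat) for sample, value in color.items()}
    for cat in categories
}

# Each indicator is a binary per-sample vector that can be tested on its own.
for name, y in indicators.items():
    print(name, list(y.values()))
```

Since each indicator is tested on its own, this yields one test per category rather than a single joint test, so a multiple-testing correction is probably worth considering when reading the results.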
altairwei commented
@yakirr Thank you for your advice! This is indeed a workaround. Any suggestions on continuous variables or ordered categorical values?
yakirr commented
Continuous variables should work fine -- just plug them in. Ordered
categorical variables are a bit trickier -- you can always just code the
categories as {1, 2, 3, ...} and plug them in since they're ordered, but be
aware that they will be fit via a linear model, so the assumption is that
whatever enrichment differentiates category 1 from category 2 will
equally differentiate category 2 from category 3.
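
A small sketch of that integer coding, again in plain Python with hypothetical names: the `levels` list and the `severity` labels are invented for illustration, and the resulting `y` is simply the numeric per-sample vector one would plug in.

```python
# Hypothetical ordered categories, listed from lowest to highest.
levels = ["mild", "moderate", "severe"]
code = {level: rank for rank, level in enumerate(levels, start=1)}

# Hypothetical per-sample labels, mapped to their integer codes.
severity = {"s1": "mild", "s2": "severe", "s3": "moderate", "s4": "mild"}
y = {sample: code[label] for sample, label in severity.items()}
print(y)

# The spacing between adjacent codes is constant, which is exactly the
# assumption a linear fit makes: each step up the scale counts the same.
steps = [code[b] - code[a] for a, b in zip(levels, levels[1:])]
print(steps)
```

If the steps between categories are not expected to be comparable, this coding may hide or distort the signal, which is the caveat described above.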