Categorical covariates with more than two levels
Luming-L opened this issue · 2 comments
Hi,
My input file contains several categorical covariates with more than two levels. For example, the covariate smoker has three levels: "non-smoker", "past smoker" and "current smoker". When I ran DeepNull, I got this error:
Cast string to float is not supported
After a long search, I realised that the regression only accepts numeric values. It seems that converting strings to numbers has not yet been built into DeepNull. Therefore, my questions are:
- If I want to include categorical covariates with more than two levels, should I recode these categorical covariates before running DeepNull?
- If yes, which encoding is more appropriate for an unordered categorical variable: one-hot encoding, dummy encoding, or something else?
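To make the difference between the two options concrete, here is a minimal pure-Python sketch using the three smoker levels above. The helpers `one_hot` and `dummy` are illustrative names, not part of DeepNull. One-hot coding produces one 0/1 column per level; dummy coding drops a chosen reference level (here "non-smoker"), which is then coded as all zeros.

```python
from typing import List, Sequence


def _check(values: Sequence[str], levels: Sequence[str]) -> None:
    # Fail loudly instead of silently coding an unknown level as all zeros.
    unknown = set(values) - set(levels)
    if unknown:
        raise ValueError(f"values not in levels: {sorted(unknown)}")


def one_hot(values: Sequence[str], levels: Sequence[str]) -> List[List[int]]:
    """One-hot coding: one 0/1 column per level (k columns for k levels)."""
    _check(values, levels)
    return [[int(v == lvl) for lvl in levels] for v in values]


def dummy(values: Sequence[str], levels: Sequence[str], reference: str) -> List[List[int]]:
    """Dummy (treatment) coding: k-1 columns; the reference level is all zeros."""
    _check(values, levels)
    if reference not in levels:
        raise ValueError(f"reference level {reference!r} is not in {list(levels)}")
    kept = [lvl for lvl in levels if lvl != reference]
    return [[int(v == lvl) for lvl in kept] for v in values]


levels = ["non-smoker", "past smoker", "current smoker"]
smoker = ["non-smoker", "current smoker", "past smoker", "non-smoker"]

print(one_hot(smoker, levels))  # [[1, 0, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0]]
print(dummy(smoker, levels, reference="non-smoker"))  # [[0, 0], [0, 1], [1, 0], [0, 0]]
```

Every one-hot row sums to 1, so the k one-hot columns add up to a constant column. In a regression that also includes an intercept this makes the design matrix rank-deficient, which is the usual reason to prefer the k-1 columns of dummy coding.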
Thanks!
Hi Luming,
Thank you for your interest in DeepNull.
It is true that DeepNull cannot handle string-valued categorical covariates; in this respect it is similar to most GWAS pipeline methods (EMMA, BOLT-LMM, GEMMA, REGENIE, etc.).
I would use the same encoding that you plan to use for your GWAS analysis. Personally, I prefer dummy encoding.
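A minimal sketch of that recoding step in Python, assuming the covariates sit in a tab-separated table with a header row. The function `dummy_encode_tsv` and the column names `FID`, `IID`, `age` and `smoker` are hypothetical; check DeepNull's own documentation for its expected input format. The optional `levels` argument fixes the level list and reference level once, so the same call can be reused for both the DeepNull input and the GWAS covariate file and both get identical columns.

```python
import csv
import io
from typing import Optional, Sequence


def dummy_encode_tsv(
    tsv_text: str,
    column: str,
    reference: str,
    levels: Optional[Sequence[str]] = None,
) -> str:
    """Replace a string-valued categorical column of a tab-separated table
    with k-1 numeric 0/1 indicator columns (dummy encoding).

    `reference` is the level coded as all zeros. Pass `levels` explicitly to
    get the same columns for every file you encode; otherwise the levels seen
    in this file are used, in sorted order.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    rows = list(reader)
    observed = {row[column] for row in rows}
    levels = list(levels) if levels is not None else sorted(observed)
    unknown = observed - set(levels)
    if unknown:
        raise ValueError(f"unexpected levels in {column!r}: {sorted(unknown)}")
    if reference not in levels:
        raise ValueError(f"reference level {reference!r} is not in {levels}")

    # One indicator column per non-reference level; spaces become underscores.
    new_cols = {
        lvl: f"{column}_{lvl.replace(' ', '_')}" for lvl in levels if lvl != reference
    }
    fieldnames = [f for f in reader.fieldnames if f != column] + list(new_cols.values())

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for row in rows:
        value = row.pop(column)  # drop the original string column
        for lvl, name in new_cols.items():
            row[name] = "1" if value == lvl else "0"
        writer.writerow(row)
    return out.getvalue()


covariates = (
    "FID\tIID\tage\tsmoker\n"
    "1\t1\t52\tnon-smoker\n"
    "2\t2\t47\tcurrent smoker\n"
    "3\t3\t61\tpast smoker\n"
)
print(dummy_encode_tsv(
    covariates,
    "smoker",
    reference="non-smoker",
    levels=["non-smoker", "past smoker", "current smoker"],
))
```

This prints a header `FID IID age smoker_past_smoker smoker_current_smoker` (tab-separated) followed by the rows `1 1 52 0 0`, `2 2 47 0 1` and `3 3 61 1 0`. Empty cells would appear as an unexpected level and raise an error, so missing values need to be handled before encoding.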
Thanks,
Farhad
Hi Farhad,
Thanks for the clarification! Dummy encoding also makes sense for my analysis.
Cheers,
Luming