Different cur.type for metric and ordinal variables
manuelarnold opened this issue · 8 comments
Currently, cur.type is 1 for categorical variables and 2 for both metric and ordinal variables. Since the distinction between ordinal and metric variables matters for both maxLR test statistics and score-based tests, it would make sense to give the two types different cur.type values:
1: categorical
2: ordinal
3: metric
This is work-in-progress now.
Please see versions from 6e01466 and above. We now have pseudo-constants that can be returned to define the scale of measurement. Please return the respective types from the score tests back to growTree(). The constants are defined in semtree-package.R as:
.SCALE_METRIC = 2
.SCALE_ORDINAL = 3
.SCALE_CATEGORICAL = 1
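As a rough illustration of how a score test might choose which constant to hand back to growTree(), here is a minimal sketch that infers the scale of measurement from the class of a covariate. The helper name scale_of() is hypothetical and not part of semtree; only the three constants are taken from this thread.

```r
# Constants as quoted in this thread (defined in semtree-package.R).
.SCALE_CATEGORICAL <- 1
.SCALE_METRIC <- 2
.SCALE_ORDINAL <- 3

# Hypothetical helper (not part of semtree): map a covariate to one of
# the scale constants. Ordered factors must be checked first, because
# is.factor() is also TRUE for ordered factors.
scale_of <- function(x) {
  if (is.ordered(x)) {
    .SCALE_ORDINAL
  } else if (is.factor(x)) {
    .SCALE_CATEGORICAL
  } else if (is.numeric(x)) {
    .SCALE_METRIC
  } else {
    stop("unsupported covariate class: ", class(x)[1])
  }
}
```

For example, scale_of(factor(c("a", "b"))) gives the categorical constant, while scale_of(factor(c("lo", "hi"), ordered = TRUE)) gives the ordinal one.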
semtree now properly handles unordered and ordered factors, but these changes broke the score tests for ordinal variables. I identified one possible problem in your code (2d813e8), but the score test still fails. Let me know what you need to know to fix this, @manuelarnold.
I tried to fix the issue in d7b1247. I hope this is all that is needed. Please confirm.
@manuelarnold, could you please confirm that this is OK and then close the issue?
There are some new changes related to this topic that we could discuss here:
In my fork, I also distinguish between dummy variables (categorical variables with two levels) and multinomial variables (categorical variables with more than two levels). So, I would be in favor of splitting .SCALE_CATEGORICAL into .SCALE_MULTINOMIAL and .SCALE_DUMMY.
By the way, testing of multinomial variables is now fully score-based and should be faster than the testing in the main branch.
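The dummy/multinomial split proposed above could be sketched roughly as follows. The helper categorical_subtype() is hypothetical, and it returns string labels rather than constants, because no numeric values for .SCALE_DUMMY or .SCALE_MULTINOMIAL are given in this thread.

```r
# Hypothetical helper (not part of semtree): classify an unordered
# factor as "dummy" (two levels) or "multinomial" (more than two).
# nlevels() counts declared levels, including unused ones.
categorical_subtype <- function(x) {
  stopifnot(is.factor(x), !is.ordered(x))
  k <- nlevels(x)
  if (k < 2) stop("need at least two levels")
  if (k == 2) "dummy" else "multinomial"
}
```

Whether unused levels should be dropped first (for example with droplevels()) is a design choice left open here.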
@manuelarnold, how should we proceed with these changes? Would you want to prepare a pull request, so that I can check your proposed changes?
I think these changes are already in the main branch. I will try to resolve some conflicts in the coming weeks, and then we can start the process of syncing the branches.