SRBCT microarray data
szcf-weiya opened this issue · 4 comments
SRBCT gene expression data: 2308 genes, 63 training samples and 25 test samples.
Each expression file has one gene per row and one sample per column.
Cancer classes are labelled 1, 2, 3, 4 for c("EWS", "RMS", "NB", "BL").
Files
- Training set gene expression: `khan.xtrain.txt`
- Training set class labels: `khan.ytrain.txt`
- Test set gene expression: `khan.xtest.txt`
- Test set class labels: `khan.ytest.txt`
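A minimal sketch for reading these files with NumPy. It assumes (not checked against the actual files) that they are plain whitespace-separated numbers with no header row, and that unknown test labels are written as `NA`. `load_expression`, `load_labels` and `CLASS_NAMES` are hypothetical helper names.

```python
import io
import numpy as np

# Class codes 1..4 as listed in the data description above.
CLASS_NAMES = {1: "EWS", 2: "RMS", 3: "NB", 4: "BL"}

def load_expression(source):
    """Read a genes-by-samples matrix and return it as samples-by-genes.

    Assumes whitespace-separated numbers, no header, at least 2 rows and columns.
    """
    data = np.genfromtxt(source)
    # One gene per row, one sample per column: transpose to (n_samples, n_genes).
    return data.T

def load_labels(source):
    """Read class labels as floats, turning the assumed 'NA' marker into NaN.

    Works whether the labels sit on one line or one per line.
    """
    labels = np.genfromtxt(source, missing_values="NA", filling_values=np.nan)
    return np.ravel(labels)

# Usage (file names from the list above; assumes they are in the working directory):
#   X_train = load_expression("khan.xtrain.txt")   # shape (63, n_genes)
#   y_train = load_labels("khan.ytrain.txt")
#   names = [CLASS_NAMES[int(c)] for c in y_train if not np.isnan(c)]
```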
diagonal LDA
See p. 652 of ESL, or the corresponding section of ESL-CN.
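For reference, diagonal LDA (as I read ESL Section 18.2) assigns a test vector $x^*$ to the class $k$ with the largest score $\delta_k(x^*)$, where $\bar x_{kj}$ is the mean of gene $j$ in class $k$, $s_j$ is the pooled within-class standard deviation of gene $j$, $\pi_k$ is the prior of class $k$, $N$ is the number of training samples, $K$ the number of classes and $C_k$ the set of training samples in class $k$. This is a transcription to check against the book, not a quotation of it.

$$
\delta_k(x^*) = -\sum_{j=1}^{p} \frac{(x^*_j - \bar x_{kj})^2}{s_j^2} + 2\log \pi_k,
\qquad
s_j^2 = \frac{1}{N-K}\sum_{k=1}^{K}\sum_{i \in C_k} (x_{ij} - \bar x_{kj})^2 .
$$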
The original text claims:

> Here the diagonal LDA classifier yielded five misclassification errors for the 20 test samples.
In my reproduction, the frequency table of predicted versus true classes shows 7 misclassification errors among the 20 test samples (the NA samples are excluded), which is roughly the same performance.
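A minimal NumPy sketch of diagonal LDA, plus an error count that skips test samples whose label is missing (assumed to be stored as NaN). It uses sample-proportion priors and the pooled per-gene variance with denominator N − K; genes with zero pooled variance would cause a division by zero and are assumed to be filtered out beforehand. The function names are hypothetical, and this is not the code that produced the 7 errors quoted above.

```python
import numpy as np

def dlda_fit(X, y):
    """Fit diagonal LDA. X: (n_samples, n_genes); y: class labels, no NaN.

    Returns sorted classes, class centroids, pooled per-gene variances, priors.
    Assumes no gene has zero pooled variance (drop such genes first).
    """
    classes = np.unique(y)
    n = X.shape[0]
    K = len(classes)
    centroids = np.array([X[y == k].mean(axis=0) for k in classes])
    # Residuals from each sample's own class centroid.
    resid = X - centroids[np.searchsorted(classes, y)]
    # Pooled within-class variance per gene, denominator N - K.
    var = (resid ** 2).sum(axis=0) / (n - K)
    priors = np.array([np.mean(y == k) for k in classes])
    return classes, centroids, var, priors

def dlda_predict(model, X):
    """Assign each row of X to the class with the largest discriminant score."""
    classes, centroids, var, priors = model
    # delta_k(x) = -sum_j (x_j - centroid_kj)^2 / s_j^2 + 2 log pi_k
    d2 = (((X[:, None, :] - centroids[None, :, :]) ** 2) / var).sum(axis=2)
    scores = -d2 + 2 * np.log(priors)
    return classes[np.argmax(scores, axis=1)]

def count_errors(y_true, y_pred):
    """Number of misclassified samples, ignoring samples whose true label is NaN."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred)
    keep = ~np.isnan(y_true)
    return int(np.sum(y_true[keep] != y_pred[keep]))
```

Usage would be along the lines of `count_errors(y_test, dlda_predict(dlda_fit(X_train, y_train), X_test))`, with the training labels free of NaN.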
Error curves (Fig. 18.4 top)
This roughly reproduces the original figure; the CV error may differ because the division into folds is random.
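To see why the CV curve depends on the fold split, here is a minimal sketch of k-fold CV with a random permutation controlled by `seed`. The classifier behind the figure is not reproduced: `fit_predict` is a hypothetical interface, and `nearest_centroid` is a toy classifier included only so the loop can run.

```python
import numpy as np

def kfold_cv_error(fit_predict, X, y, k=10, seed=0):
    """Misclassification rate under k-fold CV with randomly assigned folds.

    fit_predict(X_train, y_train, X_test) -> predicted labels (hypothetical interface).
    Different seeds give different fold assignments, hence possibly different errors.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    # Shuffle sample indices, then cut them into k nearly equal folds.
    folds = np.array_split(rng.permutation(n), k)
    errors = 0
    for test_idx in folds:
        train_mask = np.ones(n, dtype=bool)
        train_mask[test_idx] = False
        pred = fit_predict(X[train_mask], y[train_mask], X[test_idx])
        errors += int(np.sum(pred != y[test_idx]))
    return errors / n

def nearest_centroid(X_train, y_train, X_test):
    """Toy classifier: assign each test row to the closest class mean."""
    classes = np.unique(y_train)
    cents = np.array([X_train[y_train == c].mean(axis=0) for c in classes])
    d = ((X_test[:, None, :] - cents[None, :, :]) ** 2).sum(axis=2)
    return classes[np.argmin(d, axis=1)]
```

Running `kfold_cv_error` with a few different seeds on noisy data and comparing the results illustrates the variation mentioned above; fixing the seed makes a run repeatable.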
Tip related to the plot: I could not find a `twiny` command in Plots.jl, although a `twinx` command does exist; it is a bonus feature and is not described in the docs (JuliaPlots/Plots.jl#337). So I resorted to the `pyplot` package.
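PyPlot wraps matplotlib, whose `Axes.twiny()` creates a second x axis along the top that shares the y axis. Below is a minimal Python sketch of that pattern. It assumes, as in the ESL figure, that the top axis labels selected positions of the bottom axis with the corresponding number of genes; the axis titles are illustrative, and the values in any call are placeholders, not results from this issue.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

def plot_with_top_axis(x, curves, top_ticks, top_labels):
    """Plot error curves against x and add a labelled top axis via twiny().

    curves: dict mapping a curve name to its y values.
    top_ticks: positions on the bottom axis to label on the top axis.
    top_labels: text for those positions (e.g. number of genes kept).
    """
    fig, ax = plt.subplots()
    for name, values in curves.items():
        ax.plot(x, values, label=name)
    ax.set_xlabel("tuning parameter")
    ax.set_ylabel("misclassification error")
    ax.legend()
    # Second x axis drawn at the top, sharing the y axis with `ax`.
    ax_top = ax.twiny()
    ax_top.set_xticks(top_ticks)
    ax_top.set_xticklabels(top_labels)
    # Copy the bottom limits so tick positions line up with the bottom axis.
    ax_top.set_xlim(ax.get_xlim())
    ax_top.set_xlabel("number of genes")
    return fig, ax, ax_top
```

Setting the top limits after the ticks keeps both x axes aligned even if a tick falls outside the plotted range.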