- Reward: difference in F1 score between steps (computed on the test dataset)
- Base channels: C3 and C4
- Log the average F1 score
- Simple DQN
- Episode terminates if the F1 difference is < 0, after 6 steps, or once 9 channels are chosen
- Actor-Critic
- Another classifier
- Another reward type
- Other rules
- Train data = fit the model
- Validation data = update weights
- Test data = test
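
The reward and termination rule from the notes above could be sketched roughly as follows. This is only an illustration under several assumptions: `ChannelSelectionEpisode` and `score_fn` are hypothetical names, `score_fn` stands in for whatever classifier evaluation produces the F1 score, one channel is added per step, and the 9-channel cap is taken to include the base channels C3 and C4. None of these details are confirmed by the notes.

```python
from typing import Callable, List, Sequence, Tuple


class ChannelSelectionEpisode:
    """Hypothetical sketch of one episode: reward = change in F1 after adding a channel."""

    def __init__(
        self,
        score_fn: Callable[[Tuple[str, ...]], float],  # assumed: channels -> F1 score
        base_channels: Sequence[str] = ("C3", "C4"),   # base channels from the notes
        max_steps: int = 6,                            # step limit from the notes
        max_channels: int = 9,                         # assumed to count base channels
    ) -> None:
        self.score_fn = score_fn
        self.channels: List[str] = list(base_channels)
        self.max_steps = max_steps
        self.max_channels = max_channels
        self.steps = 0
        # F1 of the base channels is the reference for the first difference.
        self.prev_f1 = score_fn(tuple(self.channels))

    def step(self, channel: str) -> Tuple[float, bool]:
        """Add one channel and return (reward, done)."""
        self.channels.append(channel)
        self.steps += 1
        f1 = self.score_fn(tuple(self.channels))
        reward = f1 - self.prev_f1  # difference of F1 score as reward
        self.prev_f1 = f1
        done = (
            reward < 0                                   # F1 got worse
            or self.steps >= self.max_steps              # step limit reached
            or len(self.channels) >= self.max_channels   # channel cap reached
        )
        return reward, done
```

With these defaults the step limit is hit before the channel cap (2 base channels plus 6 added gives 8), so the intended counting of steps or channels may differ from what is assumed here; both limits are left as parameters for that reason.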