Mismatched classification accuracy
besaman opened this issue · 6 comments
Hi,
regarding the classification task, I have two questions.
- You reported an average accuracy of 74.0% on the selected UEA datasets, while I can only get 71.4%. Could you help me achieve a performance similar to yours?
- Why don't you have a validation split? How do you assess your model? Do you simply tune for the best performance on the test set?
Thanks for your reply.
-
I understand that time series results are sensitive because the datasets are small. But should the difference be that large, especially since it is an average over 10 datasets?
-
I understand your point, but its validity is questionable. I would suggest adjusting the code to include a validation split so that the evaluation protocol is sound, especially since the results are hard to reproduce anyway.
Because some datasets have small sample sizes, the task itself involves some randomness. In addition to the learning rate, the patch size and stride settings also affect the final results. We will update the script files soon.
> We will update the script files soon.
When can I expect these updates?
With the year-end approaching, we have been quite busy recently, so we haven't had a chance to update yet. You can search over the parameters following the example below (SelfRegulationSCP1) to obtain results matching the accuracy reported in the paper.
```shell
for lr in 0.002 0.001 0.0005 0.0001
do
  for patch in 16 8 4 2 1
  do
    for stride in 16 8 4 2 1
    do
      python src/main.py \
        --output_dir experiments \
        --comment "classification from Scratch" \
        --name SelfRegulationSCP1 \
        --records_file Classification_records.xls \
        --data_dir ./datasets/SelfRegulationSCP1 \
        --data_class tsra \
        --pattern TRAIN \
        --val_pattern TEST \
        --epochs 50 \
        --lr $lr \
        --patch_size $patch \
        --stride $stride \
        --optimizer RAdam \
        --d_model 768 \
        --pos_encoding learnable \
        --task classification \
        --key_metric accuracy
    done
  done
done
```
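Once the sweep finishes, the best combination still has to be picked out of the 100 runs. Below is a minimal, hypothetical sketch of that step: it assumes you have collected each run's `(lr, patch, stride, accuracy)` into tuples yourself (the exact layout of the `--records_file` output is not specified here, so the values shown are placeholders, not real results).

```python
# Hypothetical sketch: select the best (lr, patch, stride) combination
# from a sweep. The tuples below are illustrative placeholders; in
# practice you would collect them from the records file produced by
# --records_file.
from typing import List, Tuple

Run = Tuple[float, int, int, float]  # (lr, patch_size, stride, accuracy)

def best_config(runs: List[Run]) -> Run:
    """Return the run with the highest accuracy (the fourth field)."""
    return max(runs, key=lambda r: r[3])

if __name__ == "__main__":
    runs = [
        (0.001, 16, 8, 0.874),   # placeholder accuracies
        (0.0005, 8, 8, 0.891),
        (0.002, 4, 2, 0.862),
    ]
    lr, patch, stride, acc = best_config(runs)
    print(f"best: lr={lr}, patch={patch}, stride={stride}, acc={acc:.3f}")
```

Keeping the selection in a separate pass like this also makes it easy to report, say, the top three settings instead of only the best one.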
Hi, I tried to reproduce the classification results, and most of the scripts you provided work well, except for JapaneseVowels. The reported result is 98.6, but I only get 82.4. The discrepancy is quite large. Could you help take a look at this issue?