Question about cross-validation
ZiyuLin-DASHi opened this issue · 8 comments
Dear Dr. Azevedo,
Thank you for sharing your code; it has been a great help!
The paper states that 25 random runs were launched for each test set. This corresponds to the outer 5-fold cross-validation, where the test set is kept the same across a group of 25 random runs by fixing a hyperparameter.
However, I found in the code that the validation set is only gone through once per run (as in fig. 1, from main_loop.py). To my understanding, this means that not only the test set but also the validation set is kept the same across the 25 random runs.
In the wandb sweep logs, I noticed that the second number in each epoch's output (which indicates the validation set) was always the same number, "1" (as in fig. 2). This supports my reading above.
I am wondering whether I am missing something important and have misunderstood your code and paper. Perhaps the k-fold generator shuffles the data, so that the validation set would be random in each run?
I am looking forward to your reply.
Thanks in advance.
In short: for a given test set, the validation set stays the same. Is that correct?
Hi! Thanks for the interest in our work.
If I understood your question well, yes, you are right: for each test set, there's only one train set and one validation set.
It is still true that we have an outer 5-fold cross-validation: we have 5 independent (i.e., mutually exclusive) test sets in each run. The test set is held out to evaluate the final performance after training. For training (for each independent test set), rather than doing another (inner) cross-validation, we decided to go with only a single train/validation split, due to computational limits. The 25 runs correspond to 25 different sets of hyperparameters, trained on the train set and evaluated on the validation set. The final model to be evaluated on the test set is the one trained with the hyperparameters that gave the best performance on the validation set.
Regarding your second image, in which the same number "1" always represents the inner fold: I admit it might be misleading; however, I left it there in case I (or someone else using the code) wanted to use nested cross-validation in the future, and thus have more than one train/validation split for each test set.
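For anyone reading this later, here is a minimal sketch of that splitting scheme, assuming a generic scikit-learn setup; the dataset and the `sample_hyperparameter_config` / `train_and_evaluate` helpers are hypothetical stand-ins, not the repository's actual code:

```python
import numpy as np
from sklearn.model_selection import KFold, train_test_split

rng = np.random.default_rng(0)
X = np.arange(100)  # hypothetical dataset: 100 sample indices

def sample_hyperparameter_config():
    # Hypothetical stand-in for one wandb sweep draw
    return {"lr": 10 ** rng.uniform(-4, -2), "dropout": rng.uniform(0.0, 0.5)}

def train_and_evaluate(cfg, fit_idx, eval_idx):
    # Hypothetical stand-in for training on fit_idx and scoring on eval_idx
    return rng.uniform(0.5, 1.0)

# Outer 5-fold CV: five mutually exclusive test sets
outer = KFold(n_splits=5, shuffle=True, random_state=42)
for dev_idx, test_idx in outer.split(X):
    # A single inner split instead of an inner CV: for a given test set,
    # the train and validation sets stay fixed
    train_idx, val_idx = train_test_split(dev_idx, test_size=0.2, random_state=42)

    # 25 runs = 25 hyperparameter configurations, all sharing the same splits
    results = []
    for _ in range(25):
        cfg = sample_hyperparameter_config()
        results.append((train_and_evaluate(cfg, train_idx, val_idx), cfg))

    # The model evaluated on the held-out test set uses the configuration
    # with the best validation score
    best_cfg = max(results, key=lambda r: r[0])[1]
    test_score = train_and_evaluate(best_cfg, train_idx, test_idx)
```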
I hope this makes sense, but please do let me know in case you need any other clarification.
Thanks so much for your clear and concise explanation. I truly appreciate your ability to articulate complex information so clearly.
Dear Dr. Azevedo,
Sorry to bother you again.
The paper states that you stack 3 Graph Network (GN) blocks.
However, in the code of class `PNANodeModel` in `model.py`, I found that you actually stack the node model 3 times, rather than the meta layer 3 times.
I am wondering whether I have misunderstood your code or paper. If not, I am curious why you apply the edge model only once and the node model 3 times, instead of applying both the edge model and the node model 3 times.
Thanks for your time reading my comments. Looking forward to your reply.
Hi! Thanks for your question, I'm happy to try to help.
I don't see where I stacked the node model 3 times, though. In the class `PNANodeModel`, line 212, you can see that the number of layers I stack depends on the variable `run_cfg['nodemodel_layers']`.
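In case it helps, the pattern at that line is roughly the sketch below. This is a simplified illustration rather than the actual `PNANodeModel`; the hidden size, aggregators, scalers, and degree histogram are all placeholder choices:

```python
import torch
from torch import nn
from torch_geometric.nn import PNAConv

class NodeModelSketch(nn.Module):
    # Stack PNAConv run_cfg['nodemodel_layers'] times, each layer
    # followed by a 1D batch normalisation and a ReLU activation
    def __init__(self, run_cfg, deg):
        super().__init__()
        self.convs = nn.ModuleList()
        self.norms = nn.ModuleList()
        for _ in range(run_cfg["nodemodel_layers"]):  # e.g. 3 in the sweeps discussed here
            self.convs.append(PNAConv(
                in_channels=run_cfg["hidden"], out_channels=run_cfg["hidden"],
                aggregators=["mean", "max", "min", "std"],             # placeholder
                scalers=["identity", "amplification", "attenuation"],  # placeholder
                deg=deg))  # histogram of node in-degrees, required by PNA
            self.norms.append(nn.BatchNorm1d(run_cfg["hidden"]))

    def forward(self, x, edge_index):
        for conv, norm in zip(self.convs, self.norms):
            x = torch.relu(norm(conv(x, edge_index)))
        return x

# Example instantiation with placeholder values
model = NodeModelSketch({"nodemodel_layers": 3, "hidden": 32},
                        deg=torch.ones(8, dtype=torch.long))
```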
Maybe you are referring to a specific configuration .yaml file? If that's the case, can you point me to the specific one you are talking about?
Thanks!
Thank you very much for your answer. Let me try to express my question again.
The variable `run_cfg['nodemodel_layers']` is exactly what I am referring to. In the class `PNANodeModel`, line 212, you stack `PNAConv` 3 times, which forms the `node_model` (line 328) of `meta_layer` (line 325) in the class `SpatioTemporalModel`.
That means that inside `meta_layer` (line 325), the `node_model` is stacked 3 times, but the `edge_model` is not stacked.
In my view, 3 GN blocks equals 3 `MetaLayer`s, which equals 3 `edge_model`-`node_model` combinations. However, in the code, `meta_layer` actually equals 1 `edge_model` plus 3 `node_model`s.
Thanks for your time reading my questions.
Ok, I understand your question better now, and I believe you are right. In the paper I made the following statement: "We stack 3 GN blocks, after each of which we apply an 1D batch normalisation over the node’s features and a ReLU activation". This is only true when the GN block is composed solely of the node model; however, as you said, for the cases where I combine both the node and edge models (N+E), the paper's sentence is indeed technically incorrect, because I stack the node model 3 times and the edge model only once.
I apologise for this mistake in the paper's description. The variable name in the code is self-explanatory, `nodemodel_layers`, meaning the layers of the node model, not of the GN block. Still, somehow this got written in the paper as "GN blocks" and I didn't notice the mistake.
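To make the distinction concrete, here is a toy sketch using PyG's `MetaLayer`; the edge and node modules below are trivial placeholders (plain linear layers) rather than the repository's actual models:

```python
import torch
from torch import nn
from torch_geometric.nn import MetaLayer

FEAT = 8  # toy feature size

class ToyEdgeModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.mlp = nn.Linear(3 * FEAT, FEAT)

    def forward(self, src, dst, edge_attr, u, batch):
        # Update each edge from its endpoint node features plus its own features
        return self.mlp(torch.cat([src, dst, edge_attr], dim=-1))

class ToyNodeModel(nn.Module):
    """Trivial stand-in for PNANodeModel: applies `layers` stacked steps."""
    def __init__(self, layers):
        super().__init__()
        self.lins = nn.ModuleList([nn.Linear(FEAT, FEAT) for _ in range(layers)])

    def forward(self, x, edge_index, edge_attr, u, batch):
        for lin in self.lins:  # the node model runs `layers` times internally
            x = torch.relu(lin(x))
        return x

# What the code's N+E configuration does: ONE MetaLayer whose node model
# internally stacks nodemodel_layers (= 3) steps, so the edge model runs
# once per forward pass while the node update runs three times.
meta_layer = MetaLayer(ToyEdgeModel(), ToyNodeModel(layers=3))

# What the paper's sentence ("we stack 3 GN blocks") would instead describe:
# blocks = [MetaLayer(ToyEdgeModel(), ToyNodeModel(layers=1)) for _ in range(3)]

x = torch.randn(4, FEAT)
edge_index = torch.tensor([[0, 1, 2, 3], [1, 2, 3, 0]])
edge_attr = torch.randn(edge_index.size(1), FEAT)
x, edge_attr, _ = meta_layer(x, edge_index, edge_attr)
```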
Thanks for spotting this mistake in the paper's description, and I'm sorry if it caused confusion in your analysis or cost you time debugging.
Thank you very much for your answer! Your explanation is very important to me!