window_size
Hello, thank you very much for your multiple replies, and I apologize for bothering you again. I have a question: should the window_size in these three places (train.py and kitti.py) shown in the picture be the same? When the value is 2, it represents VO-1; when it is 3, VO-2; and when it is 4, VO-3. Is this understanding correct?
Yes, they are the same and you got it correctly.
In "kitti.py" you will read the data with "window_size" frames in each iteration (https://github.com/aofrancani/TSformer-VO/blob/main/train.py#L223), and in the model's output you will have "window_size - 1" estimations, because there is one pose estimation for each pair of consecutive frames.
The value "window_size=3" you see in "kitti.py" is just a default value, used only if you don't pass it explicitly when reading the data...
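The two points above (a clip of "window_size" frames gives "window_size - 1" pose estimations, and the default is only used when the argument is omitted) can be illustrated with a small sketch. The helpers `read_clip` and `consecutive_pairs` are hypothetical stand-ins, not the repository's actual code; `read_clip` just mimics a loader whose `window_size` defaults to 3.

```python
def read_clip(frame_ids, start, window_size=3):
    """Hypothetical loader: returns window_size consecutive frames.

    The default of 3 only applies when the caller does not pass window_size,
    mirroring the default discussed for kitti.py.
    """
    return frame_ids[start:start + window_size]


def consecutive_pairs(clip):
    """Pair each frame with the next one; one relative pose is estimated per pair."""
    return list(zip(clip[:-1], clip[1:]))


frames = list(range(10))  # toy frame indices standing in for a KITTI sequence
for window_size in (2, 3, 4):
    clip = read_clip(frames, 0, window_size=window_size)  # value passed from train.py
    print(window_size, clip, len(consecutive_pairs(clip)))
```

With window_size 2, 3 and 4 this prints 1, 2 and 3 pose estimations per clip, which matches the VO-1, VO-2 and VO-3 mapping confirmed above.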
Thank you very much for your reply. So the value "window_size=3" in "kitti.py" is only a default used when the parameter isn't passed while reading the data, and we don't need to worry about it.
So I only need to set window_size to 2, 3, or 4 in train.py to get VO-1, VO-2, or VO-3, respectively.
But when I only changed (window_size: 2) to (window_size: 3) in train.py, the resulting error was quite large.
So I would like to ask: are there any other parameters that need to be changed accordingly when modifying the window_size value in train.py?
No, the window_size parameter is independent of the others. What I used to do was set the overlap to "window_size - 1", so that the larger the window, the more training data I got (with redundancy in the batches, because from one video clip to the next only one frame changes). So the overlap between the windowed data might be the other parameter you are looking for...
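The effect of the overlap described above can be sketched as a sliding window over frame indices. This is a minimal illustration under the assumption that consecutive clips share `overlap` frames, so the stride between clip starts is `window_size - overlap`; the function name `make_windows` is hypothetical and not taken from the repository.

```python
def make_windows(num_frames, window_size, overlap):
    """Split frame indices 0..num_frames-1 into clips of window_size frames.

    Consecutive clips share `overlap` frames, so the start index advances
    by window_size - overlap each time.
    """
    if not 0 <= overlap < window_size:
        raise ValueError("overlap must satisfy 0 <= overlap < window_size")
    stride = window_size - overlap
    return [list(range(start, start + window_size))
            for start in range(0, num_frames - window_size + 1, stride)]


# overlap = window_size - 1 gives a stride of 1: each clip differs from the
# previous one by a single frame, which yields the most (redundant) clips.
for window_size in (2, 3, 4):
    dense = make_windows(10, window_size, overlap=window_size - 1)
    sparse = make_windows(10, window_size, overlap=0)
    print(window_size, len(dense), len(sparse))
```

For a 10-frame toy sequence, the stride-1 setting produces 9, 8 and 7 clips for window sizes 2, 3 and 4, versus 5, 3 and 2 clips with no overlap, which is why a larger window with maximal overlap still gives plenty of training samples.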
I'm very sorry; I reread the paper and still don't quite understand how to implement this in the code.
Initially, train.py has window_size=2 and kitti.py has window_size=3, which means the number of overlapping frames is 2, representing VO-1?
Then I ran two experiments based on my understanding:
1: In train.py, window_size=3, and in kitti.py, window_size=4, meaning 3 overlapping frames, i.e. VO-2;
2: In train.py, window_size=4, and in kitti.py, window_size=5, meaning 4 overlapping frames, i.e. VO-3;
But the results are still not right.
Have I misunderstood again? I would appreciate your guidance. Thank you.
I'm sorry, I didn't get it... What do you mean by "the result is not right"? The expected size of your windowed data, or the final evaluation metrics after/during training?
When I reproduce your code without any changes, the final error is similar to that reported in your paper.
But if I want to reproduce TSformer-VO-2 and TSformer-VO-3, what should I change?
I made the following changes, and the final error was large:
1: In train.py, window_size=3, and in kitti.py, window_size=4, meaning 3 overlapping frames, i.e. VO-2;
2: In train.py, window_size=4, and in kitti.py, window_size=5, meaning 4 overlapping frames, i.e. VO-3;
Simply put, I don't understand how to modify the code. Where do I change it to train VO-2 and VO-3? Thank you.
OK, so you mean the final error after training everything...
So the only file you should edit is "train.py". You don't need to worry about "kitti.py", because when we read the data we pass the parameter args["window_size"] as input to the dataloader.
- For VO-2: set "window_size=3 and overlap=2"
- For VO-3: set "window_size=4 and overlap=3"
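The settings above can be collected in one place, as a sketch. This assumes the training options live in a dict like `args` (as the args["window_size"] reference suggests) and that VO-1 follows the same overlap = window_size - 1 rule described earlier in this thread; `VARIANTS` and `build_args` are hypothetical names, and the real args in train.py contain many more entries than the two shown here.

```python
# Hypothetical per-variant settings. VO-2 and VO-3 values come from this
# thread; the VO-1 overlap of 1 is an assumption based on overlap = window_size - 1.
VARIANTS = {
    "VO-1": {"window_size": 2, "overlap": 1},
    "VO-2": {"window_size": 3, "overlap": 2},
    "VO-3": {"window_size": 4, "overlap": 3},
}


def build_args(variant, base_args=None):
    """Return a copy of base_args with window_size/overlap set for the variant."""
    if variant not in VARIANTS:
        raise KeyError(f"unknown variant {variant!r}; expected one of {sorted(VARIANTS)}")
    args = dict(base_args or {})  # copy so the caller's dict is not mutated
    args.update(VARIANTS[variant])
    return args


for name in VARIANTS:
    args = build_args(name)
    # window_size - 1 is the number of pose estimations per clip.
    print(name, args["window_size"], args["overlap"], args["window_size"] - 1)
```

Only train.py-level values change here; the loader simply receives args["window_size"], so its own default never comes into play.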
I hope this helps!