The results are so poor they could even be called wild predictions!
Opened this issue · 9 comments
Hi, can you answer my questions? I don't know what went wrong. I wrote the extraction program for my backbone following the eval_store file you provided. The trajectory extraction was correct and the embs extraction was OK, but unexpectedly the loss stops decreasing around the 8th epoch, and the final result is essentially a random prediction, even a thousand times worse than the backbone's. What might be the problem? I'm really confused!
The backbone network metrics are as follows:
But the optimized metrics are:
I would be very grateful if you could give me some advice and inspiration!
Hi, thanks for your interest in checking our method with another model.
I would suggest you check three things:
- the embedding should be the final multi-modal embedding fed to your decoder, with shape [F, N, H] per data sample, where F is the number of modalities, N is the number of vehicles in the sample, and H is your hidden dimension.
- for the trajectory, you need to transform it to global coordinates and then store it. The shape is [F, N, T, 2] per data sample, where T is the trajectory length. To make sure the data is OK, compare some of your stored trajectories against those in the given p1_data (those trajectories come from the HiVT model, so the errors must be small if yours are correct). Also, when evaluating, check that the output trajectory and the ground-truth trajectory are in the same coordinate system.
- if all the data is OK, try training with a small lr to see whether it is a model output error in the early iterations or an inappropriate optimization setting.
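The first two checks above can be automated before training. A minimal NumPy sketch (the function and tolerance are illustrative assumptions, not part of the repo):

```python
import numpy as np

def check_stored_sample(embs, trajs, p1_traj=None, atol=0.5):
    """Sanity-check one stored sample: embs [F, N, H], trajs [F, N, T, 2].
    Optionally compare against the matching trajectory from p1_data."""
    F, N, H = embs.shape                       # raises if embs is not 3-D
    assert trajs.shape[:2] == (F, N), "modality/agent dims must match embs"
    assert trajs.shape[-1] == 2, "trajectories must be 2-D (x, y) points"
    if p1_traj is not None:
        # If coordinates match, the error vs. p1_data should be small.
        err = np.abs(trajs - p1_traj).max()
        return bool(err < atol)
    return True

# Toy example: 6 modes, 3 agents, hidden size 64, horizon 30.
embs = np.zeros((6, 3, 64))
trajs = np.zeros((6, 3, 30, 2))
print(check_stored_sample(embs, trajs))  # → True
```

Running this over a handful of samples quickly reveals shape or coordinate mismatches before any GPU time is spent.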
It seems the problem is a coordinate mismatch. You can see the code here: https://github.com/opendilab/SmartRefine/blob/main/models/target_region.py#L120. Trajectories are transformed from each agent's frame to the av's frame there, since the map is stored in the av's coordinate system, following the original HiVT.
My bad; try storing your trajectory in each agent's own coordinate frame.
Thank you for your reply! But what exactly does each agent's own coordinate frame mean? Before switching to the av coordinate system, the agent was originally in the global coordinate system. Do you mean to save the trajectory in the av's local coordinate system?
I saved the trajectories in av coordinates, and the resulting metrics were no longer absurdly in the hundreds, but the refined results are still nowhere near as good as the backbone's. The backbone's metrics: FDE 0.8, ADE 0.6.
But SmartRefine's are:
The metrics got worse instead of better. I changed the learning rate and it didn't change much. What caused this, and how can I fix it?
Don't use the av coordinate frame. Each car can have its own local coordinate frame. Normally there are only two coordinate systems, global and local; HiVT uses the av frame as an intermediate, so the transformation goes from global to av, then from av to each agent's local frame.
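Each step in that chain is the same rigid transform with a different origin and heading. A minimal sketch (the convention of centering on an agent's last observed position with its yaw as heading follows the usual HiVT setup and is an assumption here):

```python
import numpy as np

def to_local(points, origin, theta):
    """Transform global points [..., 2] into a frame centered at `origin`
    with heading `theta` (e.g. an agent's position and yaw at the last
    observed timestep). Row-vector convention: p_local = R @ (p - origin)."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, s], [-s, c]])      # world -> local rotation
    return (points - origin) @ R.T

def to_global(points, origin, theta):
    """Inverse transform: local points back to the global frame."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, s], [-s, c]])
    return points @ R + origin
```

Going global → av → agent-local is then just two successive `to_local` calls (first with the av's pose, then with the agent's pose expressed in the av frame); a round trip through `to_local` and `to_global` with the same pose should reproduce the input exactly.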
Hello, it took me several days to get back to you because of the high time cost of running the code.
The backbone I am optimizing with your approach itself has two stages: prediction followed by refinement. The features from the first prediction are embedded as m_embs, m_embs is decoded to generate the trajectory traj, then a series of operations refines the features into n_embs, and n_embs is decoded to generate the trajectory refinement traj_refine. The final trajectory output is traj + traj_refine.
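That two-stage structure can be summarized in a few lines (a schematic sketch only; the module names, linear decoders, and shapes are placeholders, not HPNet's actual architecture):

```python
import torch
import torch.nn as nn

class TwoStagePredictor(nn.Module):
    """Sketch of predict-then-refine: a first decoder turns m_embs into a
    coarse trajectory, a refinement module produces n_embs whose decoded
    offset is added to the coarse trajectory."""
    def __init__(self, hidden=64, horizon=30):
        super().__init__()
        self.decode = nn.Linear(hidden, horizon * 2)      # m_embs -> traj
        self.refine = nn.Linear(hidden, hidden)           # m_embs -> n_embs
        self.decode_ref = nn.Linear(hidden, horizon * 2)  # n_embs -> offset
        self.horizon = horizon

    def forward(self, m_embs):                            # m_embs: [K, N, H]
        K, N, _ = m_embs.shape
        traj = self.decode(m_embs).view(K, N, self.horizon, 2)
        n_embs = self.refine(m_embs)
        traj_refine = self.decode_ref(n_embs).view(K, N, self.horizon, 2)
        return traj + traj_refine, m_embs, n_embs
```

The open question below is precisely which of the three tensors returned here (the coarse traj + m_embs, the final trajectory + n_embs, or a mixture) should be stored as the input to SmartRefine.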
So I did the following:
1. Save the refined trajectory and n_embs and refine them with your method; the result is as follows:
2. Save the unrefined trajectory and m_embs and refine them with your method; the result is as follows:
3. Save the refined trajectory, fuse the feature embeddings as a weighted sum a*m_embs + (1-a)*n_embs, and then refine with your method; the result is as follows:
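The fusion in variant 3 is just a convex combination of the two embeddings; a minimal sketch (the scalar weight `a` and matching shapes are assumptions):

```python
import numpy as np

def blend_embeddings(m_embs, n_embs, a=0.5):
    """Convex combination a*m_embs + (1 - a)*n_embs.
    Both inputs must share the same [K, N, D] shape."""
    assert m_embs.shape == n_embs.shape, "embedding shapes must match"
    return a * m_embs + (1.0 - a) * n_embs
```

Sweeping `a` from 0 to 1 recovers variants 2 and 1 as the endpoints, so the three experiments sit on one interpolation axis.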
All three methods give bad results; all of them even degrade accuracy. Your method is supposed to be universal and to increase accuracy, but here it decreases it. How can I solve this problem and improve my backbone's accuracy with your approach? This is my backbone: https://github.com/XiaolongTang23/HPNet/blob/main/HPNet-Argoverse/modules/backbone.py
Looking forward to your reply and thank you!
The results seem weird. I suggest you try three things:
- compare your stored trajectories with our p1 data to check for a mismatch, then compare them to your data's ground truth to see the initial metric.
- try reducing the refinement iteration number to 1 for training and see how the error changes.
- if possible, run the backbone's own refinement (i.e. the corresponding model architecture and data) in a multi-iteration way and see whether the results can be further improved by refinement. This repo is based on HiVT (CVPR 22), and it may be less effective on HPNet (CVPR 24).
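For the first suggestion, the initial metric can be computed directly from the stored trajectories against ground truth with the standard minADE/minFDE over modes (a minimal NumPy sketch; the per-sample shapes are assumptions matching the storage format discussed above):

```python
import numpy as np

def min_ade_fde(pred, gt):
    """pred: [K, T, 2] multi-modal prediction, gt: [T, 2] ground truth.
    Returns (minADE, minFDE) over the K modes."""
    err = np.linalg.norm(pred - gt[None], axis=-1)  # [K, T] pointwise L2
    ade = err.mean(axis=-1)                         # [K] avg displacement
    fde = err[:, -1]                                # [K] final displacement
    return float(ade.min()), float(fde.min())
```

If this number on the stored (pre-refinement) trajectories already differs from the backbone's reported validation metric, the mismatch is in the stored data, not in the refinement.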
The ADE before using your method is 0.647; after using your refinement method it is 0.655, and the other metrics also degrade by about 0.01. I tried no-refinement + yours and backbone + yours, and the metrics all degrade by about 0.02. I also tried different embedding ratios. I extracted the features as follows:
where K is the number of modalities, N is the number of vehicles in the sample, D is the hidden dimension, H is the number of historical steps, and F is the number of future steps. Is there anything wrong with my embs? If the embs are OK, what parameters should I try changing to improve accuracy, such as lr or the local radius?
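One layout pitfall worth ruling out: if the decoder emits agent-major tensors (e.g. [N, K, D]), they must be permuted to the mode-major [K, N, D] / [K, N, F, 2] layout before storing. A hedged sketch (the agent-major input layout is a hypothetical example, not necessarily HPNet's actual output):

```python
import numpy as np

def to_mode_major(embs, trajs):
    """Assume embs arrive as [N, K, D] and trajs as [N, K, F, 2]
    (agent-major); swap the first two axes to get the mode-major
    [K, N, D] / [K, N, F, 2] layout used for per-sample storage."""
    return np.transpose(embs, (1, 0, 2)), np.transpose(trajs, (1, 0, 2, 3))
```

Storing tensors in a transposed layout would silently pair the wrong embeddings with the wrong trajectories, which matches the symptom of small but consistent metric degradation.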
I also replaced HPNet's own refinement with your method, and that still reduces accuracy. I think your refinement method is great and I really want to apply it to this algorithm; looking forward to your answer.
Because the accuracy keeps decreasing, I have the following questions:
(1) After changing refine_num to 1, I see that the metrics you report after HiVT's first validation round are the metrics HiVT itself outputs. Is it normal for my input metrics to differ significantly from the metrics my backbone outputs at its first validation round? Or should SmartRefine start refining from the backbone's output metrics?
(2) For a model that already has refinement, how should the trajectory-output embs and the refined embs be fused (for example, what would the feature fusion strategy be for QCNet)? Linear addition feels problematic.
(3) The agent coordinate system has its origin at the last frame of the history, but the p1data in the code seems not to have been transformed that way; it is only converted to the av coordinate system during refinement.
(4) The adaptive threshold q from the paper is not in the code, which uses a fixed value of 5, and most of the improvements reported in your paper use non-fixed iteration rounds. Does this affect reproduction accuracy?