Question about resume
YiHuang108 opened this issue · 6 comments
Sometimes, when I rerun the run_evaluation_Multi script after the system crashes, it resumes from somewhere in the middle rather than from where it stopped. For example, when I had reached the 55th of 74 scripts, the rerun resumed from the 30th. The problem persists even if I directly change the number of processes.
For UniAD, there are a few things to pay attention to:
- TASK_NUM == len(TASK_LIST) == len(GPU_RANK_LIST) == number of processes.
- split_xml.py may only be run once. Otherwise, any changes you made according to point 3 will be overwritten.
- If a route crashes, you need to manually subtract 1 from the count in result.json and comment out that route in the xml.
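The first point above can be checked before launching anything. Here is a minimal sketch in Python, assuming you copy the values of TASK_NUM, TASK_LIST and GPU_RANK_LIST out of your run script by hand; the function name `check_task_config` is hypothetical and not part of Bench2Drive.

```python
def check_task_config(task_num, task_list, gpu_rank_list):
    """Return task_num if TASK_NUM == len(TASK_LIST) == len(GPU_RANK_LIST).

    Hypothetical helper: the arguments mirror the variable names mentioned
    in this thread, but the values are supplied by the caller.
    """
    if not (task_num == len(task_list) == len(gpu_rank_list)):
        raise ValueError(
            f"TASK_NUM={task_num}, len(TASK_LIST)={len(task_list)}, "
            f"len(GPU_RANK_LIST)={len(gpu_rank_list)} must all be equal"
        )
    return task_num


# Illustrative values only: two processes, one per GPU.
check_task_config(2, [0, 1], [0, 1])
```

If the three numbers disagree, the helper raises a ValueError instead of letting a mismatched run start.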
I didn't do anything with the .xml file; I only changed the count to the correct number in the .json file after the error occurred. Sometimes I can resume the process by rerunning the script; other times, it resumes from the middle of the procedure.
When you change the count to the correct number in the .json file, you also need to comment out the crashed route in the xml.
len(route in xml) == count (in the .json file)
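The condition above can be checked with a short script before rerunning. This is only a sketch under stated assumptions: it assumes each route is a `<route>` element in the xml and that you read the count out of the .json file yourself, since the exact layout of both files is not shown in this thread. The names `count_active_routes` and `is_consistent` are hypothetical.

```python
import xml.etree.ElementTree as ET


def count_active_routes(xml_text, tag="route"):
    """Count elements named `tag`; commented-out routes are not counted,
    because ElementTree drops XML comments when parsing by default."""
    root = ET.fromstring(xml_text)
    return sum(1 for _ in root.iter(tag))


def is_consistent(xml_text, count, tag="route"):
    """True when the number of active routes equals the count from the .json file."""
    return count_active_routes(xml_text, tag) == count


# Illustrative xml only: three routes, one of them commented out after a crash.
SAMPLE = """<routes>
  <route id="1"/>
  <!-- <route id="2"/> -->
  <route id="3"/>
</routes>"""

assert count_active_routes(SAMPLE) == 2
assert is_consistent(SAMPLE, 2)
```

For a real file, read its contents into a string first and pass in the count you see in the .json file.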
@YiHuang108 You may read the code in https://github.com/Thinklab-SJTU/Bench2Drive/blob/main/leaderboard/leaderboard/leaderboard_evaluator.py to understand the resume logic.
Hello, I'm a little confused by point 3. Could you please describe it in more detail?
3. If one route crashed, you need manually subtract 1 from result.json and comment the route in xml.
Btw, does this still work for VAD?
Similar to #89