Running the base benchmark tasks on local: How to run specific tasks/category
asc0216 opened this issue · 5 comments
Could you share the content of the ps_script_log.txt? You can find the log file in the Setup folder on the desktop of the running VM. You can access the VM at localhost:8006 if using a browser or localhost:3390 if using RDP. More information on how to troubleshoot the preparation step can be found here.
When run locally, the expected output is a success score logged or printed to the terminal after each task (1 if successful, 0 if not, or a value between 0 and 1 if the task relies on a similarity measure), plus an overall success rate after each group of tasks or after all tasks are done. The scores/rewards are stored in a list that is appended to as the benchmark loops through the tasks, so it should be straightforward to save them out if you need them for your own purposes.
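If you do want to save the scores out, a minimal sketch could look like the following. It assumes you already have the per-task scores as a plain Python list of numbers between 0 and 1; the function names and the output file layout are made up for illustration and are not part of the benchmark code.

```python
import json


def overall_success_rate(scores):
    """Average of the per-task scores; 0.0 for an empty list."""
    if not scores:
        return 0.0
    return sum(scores) / len(scores)


def save_scores(scores, path):
    """Write the per-task scores and their average to a JSON file.

    `path` is a hypothetical output location chosen by the caller.
    """
    payload = {
        "scores": list(scores),
        "overall_success_rate": overall_success_rate(scores),
    }
    with open(path, "w", encoding="utf-8") as f:
        json.dump(payload, f, indent=2)
    return payload


# Example with made-up scores: one success, one failure, one partial match.
example_scores = [1, 0, 0.5]
print(overall_success_rate(example_scores))
```

You would call save_scores(scores, "results.json") at whatever point in your own copy of the run loop the list is complete.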
To run a subset of the tasks, create a new JSON file similar to those that exist under /win-arena-container/client/evaluation_examples_windows. The new JSON should mirror the existing ones: the keys are the categories/applications of the tasks, and each value is a list of the task IDs for that key. For instance, you could make a copy of test_all.json and edit it down to the subset you want by keeping only the programs and IDs corresponding to your desired tasks.
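Editing the copy by hand works fine, but if you prefer to script it, here is a rough sketch of the filtering step. It assumes only the structure described above (category keys mapping to lists of task IDs); the category names and IDs in the example are placeholders, not real entries from test_all.json.

```python
import json


def make_subset(all_tasks, wanted):
    """Keep only the categories and task IDs requested in `wanted`.

    Both arguments map a category/application name to a list of task IDs.
    IDs that are not present in `all_tasks` are silently dropped.
    """
    subset = {}
    for category, ids in wanted.items():
        available = set(all_tasks.get(category, []))
        kept = [task_id for task_id in ids if task_id in available]
        if kept:
            subset[category] = kept
    return subset


# Placeholder data standing in for the contents of test_all.json.
all_tasks = {
    "app_a": ["id-1", "id-2", "id-3"],
    "app_b": ["id-4"],
}
wanted = {"app_a": ["id-2", "id-9"], "app_c": ["id-5"]}

subset = make_subset(all_tasks, wanted)
print(json.dumps(subset, indent=2))
```

In practice you would load test_all.json with json.load, pass it in as all_tasks, and write the result to a new file in the same folder with json.dump.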
This should have been addressed as per #24:
- Pull the latest changes from the main branch.
- Run docker pull windowsarena/winarena:latest.
- Remove any existing content in src/win-arena-container/vm/storage.
- Execute again ./run-local --prepare-image true.
Closing for now. Let me know if you're still experiencing any issues at the preparation step.