Running the base benchmark tasks on local: How to run specific tasks/category

Question

asc0216 opened this issue 6 months ago · 5 comments

I followed the README to run the tasks using:

./run-local.sh --start-client true

I have seen no change in the terminal for at least an hour.

Questions:

What is the expected output when running the benchmark?
How can I run a subset of the tasks?

Answer 1 · 2024-09-27T23:59:41.000Z

Could you share the content of the ps_script_log.txt? You can find the log file in the Setup folder on the desktop of the running VM. You can access the VM at localhost:8006 if using a browser or localhost:3390 if using RDP. More information on how to troubleshoot the preparation step can be found here.

Answer 2 · 2024-09-28T03:43:05.000Z

When run locally, the benchmark typically logs or prints a success score to the terminal after each task: 1 if the task succeeded, 0 if it failed, or a value between 0 and 1 if the task is scored with a similarity measure. An overall success rate is also reported, either after each group of tasks or once all tasks are done. The scores/rewards are stored in a list that is appended to as the benchmark loops through the tasks, so it should be straightforward to save them if you need them for your own purposes.
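As a rough illustration of saving those scores, here is a minimal sketch. It assumes you already have the per-task scores as a plain Python list of numbers between 0 and 1; the function names `summarize_scores` and `save_scores` and the output file layout are hypothetical and not part of the benchmark code.

```python
import json
from pathlib import Path
from statistics import mean


def summarize_scores(scores):
    """Return the task count and overall success rate for a list of scores."""
    if not scores:
        # No tasks ran, so report a zero rate instead of failing on mean([]).
        return {"tasks": 0, "success_rate": 0.0}
    return {"tasks": len(scores), "success_rate": mean(scores)}


def save_scores(scores, path):
    """Write the raw scores plus their summary to a JSON file (hypothetical layout)."""
    summary = summarize_scores(scores)
    payload = {"scores": list(scores), **summary}
    Path(path).write_text(json.dumps(payload, indent=2))
    return summary


if __name__ == "__main__":
    # Example scores: two successes, one failure, one partial similarity score.
    print(save_scores([1, 0, 0.5, 1], "scores.json"))
```

Keeping the raw list next to the summary makes it possible to recompute per-category rates later without rerunning the benchmark.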

To run a subset of the tasks, create a new JSON file similar to those under /win-arena-container/client/evaluation_examples_windows. The new file should mimic the existing ones: the keys are the category/application of the tasks, and each value is a list of the task IDs for that key. For instance, you could make a copy of test_all.json and edit it down to the subset you want, keeping only the programs and IDs that correspond to your desired tasks.
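If you would rather not edit the copy by hand, the filtering could be scripted. The sketch below assumes only the layout described above (category keys mapping to lists of task IDs). The function names, the output file name, and the category names and IDs in the usage note are hypothetical placeholders, not values taken from the repository.

```python
import json
from pathlib import Path


def select_tasks(all_tasks, wanted):
    """Keep only the requested categories and task IDs.

    `wanted` maps a category to a list of task IDs, or to None to keep
    every task in that category. Unknown categories and IDs are skipped.
    """
    subset = {}
    for category, ids in wanted.items():
        available = all_tasks.get(category, [])
        if ids is None:
            chosen = list(available)
        else:
            requested = set(ids)
            # Preserve the order used in the source file.
            chosen = [task_id for task_id in available if task_id in requested]
        if chosen:
            subset[category] = chosen
    return subset


def write_subset(source_path, target_path, wanted):
    """Read a task file such as test_all.json and write the filtered copy."""
    all_tasks = json.loads(Path(source_path).read_text())
    subset = select_tasks(all_tasks, wanted)
    Path(target_path).write_text(json.dumps(subset, indent=2))
    return subset
```

For example, `write_subset("test_all.json", "test_subset.json", {"some_app": None, "other_app": ["id-1", "id-2"]})` would keep every task listed under `some_app` and only the two named IDs under `other_app`.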

Answer 3 · 2024-09-30T17:53:56.000Z

The ps_script_log.txt looks like the below. I see failures installing LibreOffice and pip:

Answer 4 · 2024-09-30T19:13:46.000Z

This should have been addressed as per #24:

Pull the latest changes from the main branch.
Run docker pull windowsarena/winarena:latest.
Remove any existing content in src/win-arena-container/vm/storage.
Run ./run-local.sh --prepare-image true again.

Answer 5 · 2024-10-01T18:40:47.000Z

Closing for now. Let me know if you're still experiencing any issues at the preparation step.