gpoesia/peano

Missing learning.py, config files, and exact commands


From the README:

> The main file to use to reproduce the Khan Academy experiments from the paper is `learning.py`, which will start an agent to learn to solve problems using reinforcement learning and tactic induction. The config files and exact commands to run will come soon - feel free to open an issue if you're interested in those and this hasn't been updated yet!

There is no `learning.py` file in the repo, and the config files and exact commands are still missing.

cc: @gpoesia

@gpoesia Any chance you could provide these missing config files and commands to run so that I could try to train to induce tactics and reproduce the findings in the paper?

I am able to compile peano and load it in Python, but there seem to be no docs on how to use it from the Python REPL. Running `python tactics.py` or `python trainer.py` throws errors. From `python tactics.py`:

Cannot find primary config 'tactics'. Check that it's in your config search path.

Config search path:
	provider=hydra, path=pkg://hydra.conf
	provider=main, path=file:///Users/mac/code/peano/learning/config
	provider=schema, path=structured://

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

And from `python trainer.py`:

Error executing job with overrides: []
Traceback (most recent call last):
  File "/Users/mac/code/peano/learning/trainer.py", line 315, in main
    trainer = TrainerAgent(cfg.trainer)
  File "/Users/mac/code/peano/learning/trainer.py", line 79, in __init__
    self.accumulate = config.accumulate
omegaconf.errors.ConfigAttributeError: Key 'accumulate' is not in struct
    full_key: trainer.accumulate
    object_type=dict
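
For reference, here is a minimal standalone sketch (not code from the repo) of the omegaconf behavior behind that last traceback: Hydra loads configs in struct mode, so any key that `trainer.py` reads has to be present in `learning/config/trainer.yaml`, or this exact exception is raised. The config contents below are hypothetical placeholders.

```python
# Minimal sketch, not code from the repo: the dict below stands in for
# learning/config/trainer.yaml, with the 'accumulate' key absent.
from omegaconf import OmegaConf
from omegaconf.errors import ConfigAttributeError

cfg = OmegaConf.create({"trainer": {"lr": 1e-4}})  # placeholder contents; no 'accumulate'
OmegaConf.set_struct(cfg, True)  # Hydra enables struct mode on loaded configs

try:
    accumulate = cfg.trainer.accumulate  # same access pattern as trainer.py line 79
except ConfigAttributeError as err:
    print(err)  # prints the "Key 'accumulate' is not in struct" message seen above

# Adding an `accumulate: <value>` entry to trainer.yaml makes the access succeed.
```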

Hi Nilesh,

Sorry for taking so long! I just got back from vacation. I've updated the README and the main config file. There was in fact an error in the README: the main script for running experiments is `trainer.py`, which was already there (in the `learning` directory; `learning.py` never existed). The configs were also already there (`learning/config/trainer.yaml`) and are now updated to match one run from the main experiment in the paper.
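
In case it helps while I re-run things, here is a rough sketch (not from the repo) of how that config gets composed, using the standard Hydra compose API; it just prints the fully resolved config so you can see exactly which keys a given commit expects:

```python
# A sketch only; save and run it from the learning/ directory, where config/trainer.yaml lives.
from hydra import compose, initialize
from omegaconf import OmegaConf

with initialize(config_path="config"):      # resolves to learning/config
    cfg = compose(config_name="trainer")    # loads trainer.yaml
    print(OmegaConf.to_yaml(cfg))           # show the composed config

# Since trainer.py is a Hydra app, it also accepts the usual dotted overrides on the
# command line, e.g. `python trainer.py trainer.<key>=<value>` for keys in trainer.yaml.
```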

I'll leave the issue open for now since I'm re-running the experiment using the commit and config here, just to make sure it indeed matches the paper. The repository also has some experimental changes I added after the submission (inducing tactics with loops in them), so I'm confirming there was no regression; if there was, I'll just revert to an earlier commit. It's running now and seems to be making progress as I'd expect, so I'll let you know soon!

I also added some instructions to the README that you can use to get a concrete sense of the action space, if that's of interest.

Please let me know if you have any issues or questions!

Amazing! Thank you.

I'll check it out and try to reproduce this. My interest is in taking a similar DSL-based approach for the ARC-AGI prize, but your new paper is also very interesting. :)

BTW, you must have heard about DeepMind's AlphaProof, which scored 28 points on IMO problems (though it took 3 days): https://deepmind.google/discover/blog/ai-solves-imo-problems-at-silver-medal-level/

Exciting times! :)