State trace generation with a goal
e-cal opened this issue ยท 19 comments
- Need to guarantee each trace is correct and reaches the goal
- Traces need to be different (or give an warning that only the first n are unique if an arbitrary number of unique traces is not possible for a problem)
- Goal needs to be stored somewhere
Primary Objectives
- Given domain/problem, compute the one plan that achieves the goal inside of there.
- Run the vanilla sampler to get some candidate states k steps deep, then run a planner on a random subset of the fluents to get plans at least (k' steps long).
States need to be unique, or just the sequence of actions? Curious on what goes into point 2.
Could the goal appear as a partial state in the final token? (no action, just like the existing one for SLAF).
Part 2 just means the sequence of actions needs to be different in each trace (which implies the states would be different as well I suppose, but ARMS only cares about the actions).
I think that depends on how we are doing the tokens for ARMS. It seems like in the paper they assume no state information in the observations whatsoever, so I'm not sure if we want another custom token with no state information, or to reuse the SLAF one with all observations empty. We could also do something where the first and last observations use different tokens from the rest?
Hrmz...well different sequences of actions could have the same state appear, for sure. If it's not about deterministic agent behaviour (like what Eileen was implementing), then I think that should be fine.
Pretty sure that partial states are still on the table for ARMS. Check out Tbl 2 in the paper -- not a lot of state information, but it's there. Probably can get away with using the same (partially observable) observation token as SLAF.
@TathagataChakraborti : You probably want to keep an eye on the progress of this issue.
I wonder if you can just use the TOP-K/Q planners for this? You can reduce the quality value and get bounded non-rational plans as well.
But those are all plans for the same goal, no? It's (yet) another way to generate multiple traces, but would see this as its own distinct technique.
Yeah generation of interesting goals itself is a whole new world ๐ผ but once you have it, I would imagine you can launch the diverse/top-k/top-q planners based on what kind of traces you are looking for.
Totally -- combining these things would be powerful.
So that gives us a clear need to separate the functionality. (1) Generate interesting goals. (2) Solve for them. Later on, we can replace (2) with "(2') Find many solutions to them."
Hi there,
I just have a few clarification questions about a couple points in this issue. Here's what I've fully implemented so far:
- A function in
VanillaSampling
that generates goals through random sampling by allowing you to specify the number of states you want, the number of steps deep you want the states to be, and the percentage of the total state you want each goal to be (i.e. a random 70% of the total fluents for a partial state). - Functionality for any
Generator
to generate a plan given either raw PDDL files or a problem ID. - Functionality for any
Generator
to change the goal and generate a plan based on that. - A new
GoalTracesSampling Generator
that allows you to generate traces based on its (either custom or default) goal.
Concerning point # 2 - the one that states that traces need to be unique. What is the best way to go about handling that?
- Is there any way for me to retrieve the number of possible plans given a problem/domain? Or...
- Should I take the approach I took with the traces and only attempt to find a unique plan for some constant amount of time, i.e. 30 seconds? And then, if a unique plan wasn't found by then, give the user a warning that the rest of the traces are duplicates?
I'm also a bit confused about primary objective # 2:
- From what I understand, you want to find plans using the randomly sampled goals from
VanillaSampling
, and the lengths of these plans should be greater than the number of states deep you specified earlier (is that correct? Or what does k' mean here?). - What is this function going to be used for, and where should it go?
Thanks for the clarification :)
Re: Duplicates
I suspect just keep tossing them and trying again until the time-bound is hit. Hoping this is rare. You can take the hash of the string that concatenates the ground action names as a plan ID.
Re: Sampling new goal
We'd like to be able to toss out bad goals as well, and try again. The new plans will likely be shorter than the random trace length (and certainly will be if we use an optimal planner). Part of what I wrote today might help contextualize things (it's written as if we're all done, since we'll be submitting the paper in Nov):
You can ignore 2(a) for now.
Got it. I made two separate classes - one for what @e-cal needs (multiple traces from a single goal) and one for what you described (multiple traces with multiple (randomly generated) goals, as I think their functionalities/parameters are different enough to warrant that.
How do you want me to enforce 2(b)? There's many ways to do it, but I was thinking of something like...
- Only return goals whose plans are at least some specified length x (where x <= k). If after some constant length of time no such plans are found for a goal, either an exception is raised, or the algorithm tries again using a newly generated goal with larger value of k (deeper states -> likely more complex goals/longer plans?).
How are you generating multiple traces with a single goal? Would need a diverse planner for that...
I think you're pretty close on 2(b), but instead of randomly sampling until the selected goal is stumbled on, just try to plan for it. If the plan is >= x, then keep it. Otherwise, try again.
- Basically, the user supplies how many traces they want (let's say x) and the single goal they want to achieve. Then I attempt to generate x unique plans (keeping track of plans in a set). If after a constant length of time no more unique plans are generated (i.e. after 30 seconds no plan is generated that isn't already in the set), then I give the user a warning telling them the first n traces are unique and the rest are duplicates. Obviously not the most accurate way, but I figured I just needed to whip up something quick for ARMS. (Is there any way I could use a diverse planner/is there some other method you want me to use here?)
- Sounds good!
instead of randomly sampling until the selected goal is stumbled on, just try to plan for it
Just realized - that's what I meant originally, I was going to plan for each goal :)
Otherwise, try again.
I assume you mean "try using another goal" and not "try generating another longer plan for the same goal"?
Then I attempt to generate x unique plans (keeping track of plans in a set).
Generation just done by random rollouts? I worry that this will be extremely unlikely to stumble upon the right sequence.
Before any plans are generated, the user can change the goal to whatever they want. This creates new PDDL files to send to the planner. So, the plans generated do correspond to the goal, and once traces are built off those plans, I checked to make sure that the final states corresponds to the goal supplied, and they do! :)
Ah, so one plan / goal? That's entirely doable. I thought you needed a bunch of plans for the same goal.
This PR is just about good to go - I only have a couple final clarifications. In the goal-sampling
function, do you want all the goals generated to stem from the same initial state? This often resulted in goals/plans that were very similar. To add some diversity, when I found a complex enough goal I set that as the initial state before finding the next one. (I feel like I remember you suggesting this at one point, but I couldn't find it in writing, so I'm just confirming here).
Also, if the k value provided is not large enough with respect to the given complexity of the goal (i.e. the user wants goals that take 5 steps to reach taken from the last state of 5-step long traces), the program can get stuck in an infinite loop (never being able to find complex enough goals to satisfy that requirement). I made an iteration-based solution where if the program fails to find a goal 10 times in row, it uses a new trace 1 step deeper. I could also easily use a timing-based solution here, just let me know which you prefer.
Hrmz...on one hand, a breadth-first search to depth k would mean all subset-driven goals would have optimal plans of length k (online planner would likely find longer). Problem is that almost all planning problems would have a branching factor making this infeasible.
@alisonparedes : Another goal/plan-sampling research problem...how to find "deep" states without resorting to BFS. Maybe greedy-based search that maximizes the heuristic distance from the initial state to there? Would be a bit awkward with the full states being used as a goal...
@beckydvn : I'm not sure I made that suggestion (to keep going deeper from one found), but it doesn't seem half bad -- nice! Call it enforced_hill_climbing_sampling
as an optional parameter, and default it to true. For the second point, half another parameter for the time limit, and just take the top (i.e., longest) num_plans
after the time runs out. If you find num_plans
versions of at least length k
, then you can stop early.