aaronwalsman/ltron

Issues with getting the Break and Make environment working

Hi Aaron (and other contributors)!

Thanks for this repository. I saw your paper at ECCV last year and am trying to get the environment working so I can look into this problem. I tried running the environment through the codebase (ltron/gym/ltron_env_interface.py) and ran into a couple of issues:

  • I can't seem to find any information on supermecha as a package and how to add it as a dependency. Could you please let me know if I am missing something?
  • Gym or Gymnasium? The codebase uses gymnasium in some places and gym in others; do you have a recommendation on which one to stick with?

Could you let me know if I'm missing something, and how I might get the environment up and running?

Thanks!

Cheers,
Chirag

Hello! So sorry about the delay, I have been on vacation for the past month. I am currently updating the repo for continuing/future work, and you are right that some of the new dependencies have not been released yet. However, if you roll back to the v1.0.0 branch, that is the one associated with the ECCV paper and it should work without the new dependencies (supermecha, etc.). The new version will have a slightly refined interface, but it is not yet ready for public consumption. I should probably be developing it in a separate (non-main) branch; apologies for the sloppiness on my end. I'm more than happy to help you get up and running with the previous version, and I will be more responsive in the immediate future now that I'm back. Sorry again for the delay!

Thanks Aaron! My turn to apologize for the delay from being on vacation :) I'll try the v1.0.0 branch and report back. (Out of curiosity, I'm assuming supermecha is something you're developing yourself, right?)

Yeah, supermecha is a library for building environments by assembling multiple "environment components" together. The v1.0.0 branch does this too, but there it's incorporated into the LTRON library itself. The idea is that you can mix and match different components to make environments with different features: for example, add a depth rendering component if you need a depth-based version of the env, or add additional action space components to include different ways of interacting with the environment.

In all honesty though, I don't think this approach is going to survive long term. It's one of those ideas that seemed like a cool way to keep the environment flexible, but it ended up being more trouble than it's worth. The hope was that when you need some new piece of functionality, you just write a quick component for it and can then freely mix it into whichever env needs it. In reality, authoring new components is tricky because they often need access to the state of other components, so they're not as isolated as you would like. It also hasn't been easy to mix new components into existing envs, because there are ordering issues you need to get right: any component that generates an observation must be evaluated AFTER any component whose action space might affect the result of computing that observation. Long story short, it's a bit too cumbersome, and the plug-and-play hopes never quite materialized.
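
To make that ordering issue concrete, here is a minimal sketch of the component pattern. All of the class names below are hypothetical illustrations, not the actual supermecha or LTRON API:

class MoveBrickComponent:
    # Action-space component: mutates the shared scene state.
    def step(self, state, action):
        state['brick_position'] = action
        return None  # contributes nothing to the observation

class RenderComponent:
    # Observation component: reads the shared scene state.
    def step(self, state, action):
        return {'render': f"image of brick at {state['brick_position']}"}

class ComponentEnv:
    # Components are evaluated in list order, so any component that
    # produces an observation must come AFTER any component whose
    # action can change what that observation shows.
    def __init__(self, components):
        self.components = components
        self.state = {'brick_position': 0}

    def step(self, action):
        observation = {}
        for component in self.components:
            result = component.step(self.state, action)
            if result is not None:
                observation.update(result)
        return observation

# Correct ordering: the renderer sees the effect of the action.
env = ComponentEnv([MoveBrickComponent(), RenderComponent()])
print(env.step(3))  # {'render': 'image of brick at 3'}

# Swapped ordering silently produces a stale observation.
stale = ComponentEnv([RenderComponent(), MoveBrickComponent()])
print(stale.step(3))  # {'render': 'image of brick at 0'}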

The next release may still use the component structure because of inertia, but I wouldn't recommend this approach to others starting new projects, and probably won't use it again myself.

Also, as this is still in development, please let me know if there's anything I can help you with to get it set up. This is something I'm actively working on, so feedback is always appreciated.

Hi Aaron,

Apologies for the delay; I finally got back from vacation (and then a conference) today and looked into this again.

  • On modularity: yeah, I hear you, making things modular long-term is a pain. Your description did remind me of \psi though (https://github.com/microsoft/psi). I was involved in the alpha testing, and the direction they eventually took, creating a marketplace for components, worked decently. Although, they did have n != 1 developers (my previous job was indeed building a similar platform, and I was also the sole developer, so I understand the challenge in complexity). Nevertheless, I thought you might find the architecture design an interesting study. The other option I had considered for similar platforms was supporting a microservice architecture; not sure if that's in your plans, but it could be cool for people to wrap up their modules as microservices and build a community around this task that way. Just a thought, happy to chat more if helpful.
  • On getting v1.0.0 running: I finally tried a few things. The only env with a __main__ block on v1.0.0 is interactive_reassembly_env. Running it fails at import time because the break_and_make_env function seems to be commented out. The full error is as follows:
➜ python -m ltron.gym.envs.interactive_reassembly_env                                                 
Traceback (most recent call last):
  File "/home/craman/miniconda3/envs/ltron2/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/craman/miniconda3/envs/ltron2/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/craman/projects/ltron/ltron/gym/envs/interactive_reassembly_env.py", line 17, in <module>
    from ltron.gym.envs.break_and_make_env import (
ImportError: cannot import name 'break_and_make_env' from 'ltron.gym.envs.break_and_make_env' (/home/craman/projects/ltron/ltron/gym/envs/break_and_make_env.py)

Any pointers on how to get the reassembly env running?


Perhaps it's prudent to ask some more direct questions. Ultimately, I'm trying to load the bricks and snap masks and have an agent assemble the model. For this I need to understand the data structures of the masks, check for collisions, and test whether an assembly is valid. I had assumed that getting the gym env running would lead me to the right pointers, but I started diving into the code a bit and arrived at ltron.geometry, which I believe is where you have the code for checking collisions and assembling bricks, right?

Concretely, I'd be grateful if you can point me in the right direction with the following:

  • I see both omr and omr-clean in the assets; which one is used, and where? Also, the LDraw files in omr seem to include specific steps for composing the model from individual bricks; are those used?
  • So far, we tried creating a BrickScene and importing one of the LDraw .mpd files from omr. Then we accessed one of the instances in the scene, which has a list of snaps as a property. Each of these is of type SnapInstance. Looking at the code for that class, there seems to be a BrickInstance property, so this looks like a circular reference, since the brick instance has the snap sequence as a property as well? Are we looking in the right place to make sense of how to load and access a model, the bricks, and the snaps therein?
  • If we are, I'm wondering how to then pass this in to train a model. I couldn't find the relevant code for this yet.

No worries, thanks for getting back! I hadn't seen \psi before; I'll check it out, thanks!

Sorry about the continued issues getting v1.0.0 running. I will look into interactive_reassembly_env and get back to you; I'm not sure what's going on there. In the meantime, if you want to get a gym environment, you can run:

from ltron.gym.envs.break_and_make_env import BreakAndMakeEnvConfig, BreakAndMakeEnv
config = BreakAndMakeEnvConfig()
# set config parameters as necessary
env = BreakAndMakeEnv(config)
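
From there, a quick smoke test might look like this (assuming the v1.0.0 env follows the classic gym reset/step API; the random-action loop is just to exercise the environment):

observation = env.reset()
done = False
while not done:
    action = env.action_space.sample()  # random action, just for the smoke test
    observation, reward, done, info = env.step(action)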

You can also look at the ltron-torch-eccv repo for examples of how to get started using PyTorch.

On your other questions:

  1. Yes, ltron.geometry.collisions.py handles collision checking. It's worth noting that it uses a somewhat non-standard approach: it renders depth maps of the object and the environment from different directions and checks whether they overlap (see the numpy sketch after this list).
  2. omr-clean is the one used in the paper; omr is all the original models from the LDraw OMR without modifications. The issue is that the raw models are quite... raw. For example, in LDraw there are often multiple copies of the same brick with slightly different shapes, and to their credit the LDraw authors have tried to be very faithful, so that a reproduction of a set from the 90s will have the era-appropriate part version in it. For our purposes, these kinds of details are very hard to navigate, because the small differences in shape are almost impossible to see from a reasonable distance. We also removed some bricks that seemed to be misformatted and broke our simulator. The other important thing about omr-clean is that all the models have been broken out into multiple parts corresponding to the separate disconnected "islands" that exist in the original scene. With that said, this "cleanup" is an ongoing effort. In our first paper we were only able to train on relatively small models of up to 8 bricks or so, but we continue to look for ways to do cool things with the bigger models without succumbing to their surprising level of detail and complexity.
  3. If you are looking to just inspect models and bricks and render images without going through the gym interface, then BrickScene is a great place to start. It seems like you have the basic ideas right, but let me add some details that may be relevant. A BrickShape contains all the information about the shape and snap structure of a particular LEGO brick. A BrickInstance represents a copy of that shape in the scene with a particular 3D transform and color assignment. We built it this way so that you could have fifty copies of the same brick in your scene, potentially with different colors assigned to each one, all referencing the same BrickShape, so you don't end up with fifty copies of all that shape/snap data. Note that a BrickShape doesn't really "exist" in the scene; it's just a loaded chunk of data describing the shape and snaps of a particular LEGO part. With that in mind, each BrickShape has a list of Snap objects that represent the snap structure of that brick. Each Snap has various attributes describing what kind of connection point it is, plus a 3D transform describing its local position relative to the BrickShape. In the same way that a BrickInstance represents a copy of a BrickShape, a SnapInstance represents a copy of a Snap that is actually tied to a particular BrickInstance. For convenience, the SnapInstance stores a reference both to the Snap that lives on the associated BrickShape (self.snap_style) and to the BrickInstance it belongs to. So in that sense there are circular connections, because the BrickInstance references the SnapInstanceSequence, which references the SnapInstance, which references the BrickInstance, but Python handles this fine and it does not require circular imports. (See the inspection sketch after this list.)
  4. What kind of model are you trying to train? If you are trying to train a policy for the break-and-make task, then the BreakAndMake gym env will render everything and provide you with the correct observations to pass into your model. If you are trying to do some other task, you can use the BrickScene API to set it up and do the necessary rendering (scene.color_render or scene.snap_render_snap_id, for example; these calls are forwarded to the scene's render_environment, which does the actual rendering). BrickScene is a general-purpose API for manipulating LEGO scenes, whereas the BreakAndMake gym environment is built just for that particular problem.
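
For point 1, the core of that depth-map overlap idea can be sketched in a few lines of numpy. This is just the concept with a single viewing direction, not the actual ltron.geometry code:

import numpy as np

def depth_overlap_collision(brick_depth, scene_depth):
    # Both depth maps are rendered from the same camera, looking along
    # the candidate removal/insertion direction.  Pixels with no
    # geometry hold +inf.  The brick collides if, at any pixel the
    # brick covers, other geometry sits in front of it.
    brick_pixels = np.isfinite(brick_depth)
    blocked = scene_depth < brick_depth
    return bool(np.any(brick_pixels & blocked))

# Toy 1x3 "images": the brick occupies the middle pixel at depth 5.
brick = np.array([[np.inf, 5.0, np.inf]])
clear_scene = np.array([[2.0, 9.0, 2.0]])     # nothing in front of the brick
blocking_scene = np.array([[2.0, 3.0, 2.0]])  # geometry at depth 3 blocks it

print(depth_overlap_collision(brick, clear_scene))     # False
print(depth_overlap_collision(brick, blocking_scene))  # True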
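
And for points 3 and 4, a minimal inspection script following that structure might look like the following. I'm guessing at some of the names (import_ldraw, scene.instances, instance.snaps, snap.brick_instance, the renderable flag) from this thread, so treat them as illustrative and check the BrickScene source for the real API:

from ltron.bricks.brick_scene import BrickScene

scene = BrickScene(renderable=True)
scene.import_ldraw('path/to/model.mpd')  # e.g. an omr-clean .mpd file

for instance_id, instance in scene.instances.items():
    # Each BrickInstance holds its own transform and color, but shares
    # one BrickShape with every other copy of the same part.
    print(instance_id, instance.brick_shape, instance.color)
    for snap in instance.snaps:
        # A SnapInstance ties a Snap (snap.snap_style, owned by the
        # BrickShape) to this particular BrickInstance; the back
        # reference is the circular connection mentioned above.
        print('  ', snap.snap_style, snap.brick_instance is instance)

# Rendering is forwarded to the scene's render environment:
# color_image = scene.color_render()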

Anyway, I hope that is helpful! Sorry for the confusion and issues, we are actively working to improve this.