libgdx/ashley

Way to make Engine#removeAllEntities() "atomic" operation

Opened this issue · 27 comments

Hi guys, here is today problem...

Overall situation:

In some moment of my game life-cycle I need to reset world. By "reset world" I mean remove all entities from engine and then recreate game objects from scratch. Simple code may looks so:

void resetWorld() {
    engine.removeAllEntities();
    initiWorld();
}

void initWorld() {
    // Some initial entities like "hero" create here
}

void update(float delta) {
    engine.update(delta);
}

And consider all entity systems must survive through every reset and handle entity add/remove events correct. So now assume we have some sophisticate system that handles two types of entities at same time (like layer entities, and drawable entities that bound to layers). And when the system had notified about layer entity has been removed from the engine, system removes all drawable entities of that layer from the engine too.

Here I have to say that I use PooledEngine and PooledEntities that I obtain only from engine. So here finally the problem (case when pt.1 occurs outside of engine update):

  1. Someone calls resetWorld();
  2. All entities became deleted and engine notifies entity listeners about each deletion (Engine#notifying = true);
  3. Our render system had been notified about layer entity removed, and now it calls Engine#removeEntity(Entity) for each drawable entity (which actually already deleted).
  4. Inside Engine#removeEntity(Entity) creates new EntityOperation of remove type (because engine right now has notifying flag);
  5. Then runtime reaches initWorld() and here some new entities create from engine pool and get filled.
  6. Then game loop executes my update() method and here engine finally processes all delete operations from pt.3.
  7. Here we've got ugly bugs, some random entities get lost and deleted unexpectedly :(

All that may go even bad if you call resetWorld() during engine passing update stage. But the problem is I have no ability to check it, Engine#updating is private and has no accessor. My first question is should we add getter for it (same question about Engine#notifying)?

Now my temporary solution is to call extra engine.update(0) before initWorld() so all pending EntityOperrations will be gone. Which is not so good solution, because I cannot even imaging what may happen when I call Engine.update() inside Engine.update() (engine does not check that case...). And the other similar solution is to call initWorld() using Gdx.app.postRunnable() which does pretty the same, but it's not fits my game architecture, because there are some other things inside update() that should not be called in such state.

Perhaps the best thing that may help is extra check inside Engine#removeEntity(Entity) that will just skip entities which is not active in the system right now (Engine#entities doesn't contain it)?

Conclusion

I'm trying to find a way to remove all entities during one engine update cycle, avoiding any pending EntityOperations after that.

Thanks for read that long story and I'm kindly waiting for your thoughts :)

Inside Engine#removeEntity(Entity) creates new EntityOperation of remove type (because engine right now has notifying flag);

I do not think so, thanks to this check in Engine#removeEntity(Entity):

if(entity.scheduledForRemoval) {
    return;
}

Have you really encountered this issue? If so, a test case would be very useful. I'm interested because I have also experienced issues with the delaying of add/remove operations. The only solution that seems to be able to work is snapshotting the entity arrays, which requires heavy refactoring of the Engine class.

Sorry for delay.
Yeah, it's for sure. The problem is here:

public void removeAllEntities() {
    if (updating || notifying) {
        for(Entity entity: entities) {
            entity.scheduledForRemoval = true;
        }
        EntityOperation operation = entityOperationPool.obtain();
        operation.type = EntityOperation.Type.RemoveAll;
        entityOperations.add(operation);
    }
    else {
        while(entities.size > 0) {
            removeEntity(entities.first());
        }
    }
}

When you call this method outside of updating or notifing, it will just literately remove entities and notify listeners in every iteration, and somewhere inside this loop entity deletion happens during notifying. So, as you can see, there are no any entities with scheduledForRemoval flag.

Thanks, I created a test case for this:

final Engine engine = new Engine();
final Entity parent = new Entity();
final Entity child = new Entity();

// This system is needed to force processing of entity operations
engine.addSystem(new EntitySystem() {
});

engine.addEntity(parent);
engine.addEntity(child);

engine.addEntityListener(new EntityListener() {
    @Override
    public void entityRemoved(Entity entity) {
        if (entity == parent) {
            engine.removeEntity(child);
        }
    }

    @Override
    public void entityAdded(Entity entity) {
    }
});
engine.removeAllEntities();
engine.addEntity(child);
engine.update(0f);
assertTrue(engine.getEntities().contains(child, true));

It might be necessary to write a few more tests to cover all possible situations, but this is at least a good start.

This is basically the same test case, but modified to use PooledEngine.

final PooledEngine engine = new PooledEngine();
final Entity parent = engine.createEntity();
final Entity child = engine.createEntity();

// This system is needed to force processing of entity operations
engine.addSystem(new EntitySystem() {
});

engine.addEntity(parent);
engine.addEntity(child);

engine.addEntityListener(new EntityListener() {
    @Override
    public void entityRemoved(Entity entity) {
        if (entity == parent) {
            engine.removeEntity(child);
        }
    }

    @Override
    public void entityAdded(Entity entity) {
    }
});

engine.removeAllEntities();
Entity another = engine.createEntity();
engine.addEntity(another);
engine.update(0f);

assertTrue(engine.getEntities().contains(another, true));

I'll look into solving this, probably by demanding EntitySystem implementations to snapshot entity arrays if they don't want to be affected by removal while iterating.

Well, I think I've managed to implement this snapshotting. Check it out at my fork. However, this introduces breaking changes (it requires some extra care to be taken by custom EntitySystem implementations), so it might not be feasible. Also note that this builds upon the work I did to implement entity listener priority. Edit: To run this, you need to update the libgdx dependency to 1.6.2.

Wow, it's almost new engine :) A lot of changes and I think @saltares should look on it.
But I have a simple solution, we just need process pending operations in the end of every iteration (see diff).

Just because this is a narrow place, so after operations have been processed, our loop will remove all entities anyway.

I chose snapshotting because it results in more predictable behavior. It also simplifies Engine a lot. My problem with the operation delayment was that I had an EntityListener that wasn't notified before all systems had been updated.

SnapshotArray could be a solution but mind that it has a performance penalty. Have you tested performance with this implementation and how it compares to the current master of Ashley?

Benchmark results with snapshotting:

AshleyBenchmark.ashleyMediumTest: [measured 20 out of 40 rounds, threads: 1 (sequential)]
 round: 0.01 [+- 0.00], round.block: 0.00 [+- 0.00], round.gc: 0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00, time.total: 0.63, time.warmup: 0.35, time.bench: 0.28
AshleyBenchmark.ashleySmallTest: [measured 20 out of 40 rounds, threads: 1 (sequential)]
 round: 0.01 [+- 0.00], round.block: 0.00 [+- 0.00], round.gc: 0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00, time.total: 0.30, time.warmup: 0.15, time.bench: 0.14
AshleyBenchmark.ashleyBigTest: [measured 20 out of 40 rounds, threads: 1 (sequential)]
 round: 0.04 [+- 0.00], round.block: 0.00 [+- 0.00], round.gc: 0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00, time.total: 1.73, time.warmup: 0.87, time.bench: 0.86

Benchmark results without snapshotting:

AshleyBenchmark.ashleyMediumTest: [measured 20 out of 40 rounds, threads: 1 (sequential)]
 round: 0.02 [+- 0.00], round.block: 0.00 [+- 0.00], round.gc: 0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00, time.total: 0.91, time.warmup: 0.49, time.bench: 0.42
AshleyBenchmark.ashleySmallTest: [measured 20 out of 40 rounds, threads: 1 (sequential)]
 round: 0.01 [+- 0.00], round.block: 0.00 [+- 0.00], round.gc: 0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00, time.total: 0.31, time.warmup: 0.17, time.bench: 0.14
AshleyBenchmark.ashleyBigTest: [measured 20 out of 40 rounds, threads: 1 (sequential)]
 round: 0.03 [+- 0.00], round.block: 0.00 [+- 0.00], round.gc: 0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00, time.total: 1.44, time.warmup: 0.74, time.bench: 0.69

I do not know if my entity listener priority patch affects the performance of the first benchmark, but it seems like there is no significant difference. (And SnapshotArray only performs a copy when the backing array is to be modified, right?)

Edit: These benchmark results are incorrect. Check the comments below.

Could you try to perform the test with more entities? Like an order of magnitude or two more.

Huge test (1.000.000 entities) with snapshotting:

AshleyBenchmark.ashleyHugeTest: [measured 20 out of 40 rounds, threads: 1 (sequential)]
 round: 6.44 [+- 0.71], round.block: 0.00 [+- 0.00], round.gc: 0.00 [+- 0.00], GC.calls: 4, GC.time: 3.85, time.total: 242.09, time.warmup: 113.27, time.bench: 128.82

Huge test without snapshotting:

AshleyBenchmark.ashleyHugeTest: [measured 20 out of 40 rounds, threads: 1 (sequential)]
 round: 6.17 [+- 0.20], round.block: 0.00 [+- 0.00], round.gc: 0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00, time.total: 238.31, time.warmup: 114.91, time.bench: 123.40

Note that the benchmarks does not add entities while updating, so there is no need for SnapshotArray to copy anything. I put the new benchmark in my repo.

Edit: These benchmark results are incorrect. Check the comments below.

I then think we need to add/remove some entities while updating. That's what snapshot solves and that is what penalizes performance...

I think we might have to solve this issue without adding a lot more overhead if possible. Ashley's performance is not brilliant as is right now.

Actually, I don't think the snapshotting will be a major problem, as snapshots are recycled. The only thing that could kill the performance is nesting begin()/end() while adding/removing entities, but that is not a correct thing to do. Do you know where the bottlenecks in performance are located?
Edit: It will actually do a System.arraycopy when recycling snapshots, forgot about that one. I don't think it's reasonable to delay modification of snapshots 'til they are recycled, as it is already complex enough as-is.

Whoops. I just noticed the benchmarks were run against the same version of ashley, 1.1.0. Why doesn't the benchmark project reference the core, but rather an artifact in the Maven repository?

I fixed this in the build-fix branch of my repo - could you review and merge this manually, @saltares?

So, here are the new benchmark results:
With snapshotting:

AshleyBenchmark.ashleyMediumTest: [measured 20 out of 40 rounds, threads: 1 (sequential)]
 round: 0.01 [+- 0.00], round.block: 0.00 [+- 0.00], round.gc: 0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00, time.total: 0.59, time.warmup: 0.31, time.bench: 0.28
AshleyBenchmark.ashleySmallTest: [measured 20 out of 40 rounds, threads: 1 (sequential)]
 round: 0.01 [+- 0.00], round.block: 0.00 [+- 0.00], round.gc: 0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00, time.total: 0.28, time.warmup: 0.15, time.bench: 0.14
AshleyBenchmark.ashleyHugeTest: [measured 20 out of 40 rounds, threads: 1 (sequential)]
 round: 6.02 [+- 0.17], round.block: 0.00 [+- 0.00], round.gc: 0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00, time.total: 236.38, time.warmup: 115.97, time.bench: 120.41
AshleyBenchmark.ashleyBigTest: [measured 20 out of 40 rounds, threads: 1 (sequential)]
 round: 0.04 [+- 0.00], round.block: 0.00 [+- 0.00], round.gc: 0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00, time.total: 1.46, time.warmup: 0.75, time.bench: 0.71

Without snapshotting:

AshleyBenchmark.ashleyMediumTest: [measured 20 out of 40 rounds, threads: 1 (sequential)]
 round: 0.01 [+- 0.00], round.block: 0.00 [+- 0.00], round.gc: 0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00, time.total: 0.70, time.warmup: 0.40, time.bench: 0.30
AshleyBenchmark.ashleySmallTest: [measured 20 out of 40 rounds, threads: 1 (sequential)]
 round: 0.01 [+- 0.00], round.block: 0.00 [+- 0.00], round.gc: 0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00, time.total: 0.29, time.warmup: 0.15, time.bench: 0.14
AshleyBenchmark.ashleyHugeTest: [measured 20 out of 40 rounds, threads: 1 (sequential)]
 round: 6.51 [+- 0.04], round.block: 0.00 [+- 0.00], round.gc: 0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00, time.total: 253.65, time.warmup: 123.36, time.bench: 130.28
AshleyBenchmark.ashleyBigTest: [measured 20 out of 40 rounds, threads: 1 (sequential)]
 round: 0.04 [+- 0.00], round.block: 0.00 [+- 0.00], round.gc: 0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00, time.total: 1.68, time.warmup: 0.85, time.bench: 0.83

So I guess these benchmarks are irrelevant when it comes to snapshotting performance. Edit: I actually think the slightly better performance of the first benchmark is caused by the usage of SnapshotArray, as it uses direct array access rather than method calls.

Benchmarks in the millisecond range are useless. The only one that counta is the huge test. Even then one needs to take into account vm warm up etc. Benchmark suite should use the JHM benchmark harness.

As for deletetion of entities: Ashley needs to decide if that should happen immediately or deferred. If the former, then only snapshoting is a working approach. If the latter, then there's a few things one can do. Simplest solution for the later: keep track if you are in engine.update(), in which case removeEntity will defer until the end of the update() call (last statements in update() will delete entities). If you are not in engine update or within the delete statements of update(), you can remove entities immediately.

I think @metaphore is right on the money with his patch. Killing entities during an update is less predictable than killing them at the end of an update. E.g. what if you had a box2d physics entitysystem. Within a collision callback, you remove one or more entities -> boom goes box2d. It's a lot safer to do removal in one specific place when no system is doing anything anymore.

Is it possible to notify EntityListener when addEntity()/removeEntity() is called, but delay the operation? This would solve my problem. The current delay mechanism could at least do with some improvement; possibly a rewrite from scratch. Edit: Right now, a couple of situations can occur where incorrect behaviour results. This is caused by conflicting operations being queued, which is not checked at all.

One solution to this problem could be to actually make operations "atomic" - maybe by using a kind of double-buffering. I have written an own ECS implementation that uses Bag to store components. There are two copies of each Bag; the "current" components and "next" components. After each frame, the "next" components are copied over to the "current" components, and then notifications are carried out. Entities are also stored in a Bag, with bitmasks for "current" and "next" entities. While this is a possible solution, implementing it for Ashley requires heavy refactoring, and lots of breaking changes. Any thoughts on this idea?

I fixed this in the build-fix branch of my repo - could you review and merge this manually, @saltares?

This was fixed in a6fcfd0, thanks a lot.

As for the current issue, sorry for taking so long to take a look at this.

This was fixed in a6fcfd0, thanks a lot.

No; it was fixed in #168. :P

As for the current issue, sorry for taking so long to take a look at this.

Are you keeping the concept of "operations" or have you got other plans in mind?

I really think Ashley should introduce the concept of an index for entities, like I described in #171, and maybe even refactor component storage, as proposed in #87. But the current API makes things very hard to change without breaking stuff.

You could take a look at artemis-odb or retinazer ;)

No; it was fixed in #168. :P

My bad! :-(.

I really wanna make Ashley better for everyone, including it being networking friendly. Right now it's a bit hard for me to spend enough time on it but hopefully that'll change in the near future.

@antag99 it seems your fork is gone... Any chance you could make that available again? Maybe there's a way of using Snapshotting while not having to put additional responsibility on derived EntitySystems.

@antag99 it seems your fork is gone... Any chance you could make that available again?

Nope; I've got the source code, but no git repo. It was more of a quick hack, anyway.

Maybe there's a way of using Snapshotting while not having to put additional responsibility on derived EntitySystems.

In retinazer, I implemented an EntitySet class (link), that offers snapshotting by using Entity arrays. These are rebuilt any time the underlying data changes, so no array is ever modified. The Iterator implementation simply retrieves the current array, which effectively dodges modification while iterating over entities. I honestly think we should throw out ImmutableArray in favour of a custom class. EDIT: I'm currently considering whether or not to delay modifcations; currently retinazer does, but in a different way than Ashley.

Another thing to be considered is when notifications take place. This can be delayed too, but I'm not sure whether it's a good idea or not.

Just like to weigh in here if I can as I am really opposed to this ticket's request... in my project I have a removal System that removes any entity with a Remove component, but also has a boolean in it that fires a removeAll(). This system is always the last system to run. I have found working extensively with Ashley that deferring entity removals to the end of a system cycle with a component as a flag is a really smart way to work as aside from the functional consistency you can cancel entity removals mid frame or even cancel the entire removeAll operation before it occurs.

Working in this way, if I have a load of stuff I want to action immediately after the removeAll has occured I have another System who's job is simply to process runnables that I have submitted at the start of each frame. this means if I want to create a tonne of entities etc (as I am loading a new level for example) straight after I have nuked everything, I will set the removeAll flag on my removal system then post a runnable to my ExecuteNextFrameSystem. This also gives you a thread safe way to submit stuff directly to your Ashley loop which was a big requirement for me.

Other than building these features directly into the core of Ashley and normalising this behaviour I'm not sure how you will get around it directly. Personally I hate the idea of an atomic removeAll() as that nukes the per frame state consistency that makes Ashley so freaking awesome, deferring all of that stuff to the end of the frame is really smart and is really neat to work with, so I would actively fight the idea of atomic removals mid-frame, I think it actually breaks one of Ashley's fundamental design paradigms.

IMO the removeAll method is a bit dangerous at the moment and should function like remove, in that once called it will register an event to be processed at the end of the current loop. Either that, or it should be a flag on engine that is checked at the end of each frame, that way you could decide mid-frame actually you don't want to nuke everything afterall and set it back to false should that edge case ever occur.

An example for why my Remove component is so good is server side I have entities that represent players, if they are in combat and disconnect their connection drops but their entity stays in game until they are out of combat... it happens (more frequently than you'd think) that they reconnect the exact same frame their entity is being removed, this means people were logging in and being paired back up to their entity just as it got removed, much bugs ensued. By using the Remove component as they are paired back up with their entity I can check for that component and remove it, thereby cancelling the remove and rescuing the entity. I don't know if such a case exists for a full removeAll call, but hopefully this demonstrates the robust nature of working in this way.

I think your efforts would be better suited to exploring why you have an atomic mid-frame removal operation in the first place to be honest as this seems to be a square peg round hole situation.

Liam

@carlislefox you misunderstand the initial problem. "Atomic" means not just instant removal of entities from the engine, but the approach in which engine should remove all entities right after update and have no pending operations remain for the next updates after it. Anyway the issue was about non update state of engine, where all entities should be removed instantly by design, but in illustrated case they weren't. Just consider that this is no longer the case, because Engine's code evolved since and handles removeAllEntities() properly now.