paed01/bpmn-engine

getState/recover size optimization

Closed this issue · 17 comments

Hi @paed01!

We are delighted with bpmn-engine so far, but we have hit a bump in the road with our use case. Our implementation lives in AWS Serverless, so after each awaited action we need to persist the state and recover it later on. The problem is that the workflow state is quite big (around 20 KB for a very simple workflow definition), and that takes time to transmit and store in the database.

We have tried cleaning up the state a bit before serializing it, since much of it looks internal to the engine (queues, brokers, etc.) and transient. But recovery then fails, because the engine relies on that data being present rather than recreating it from scratch.

Can you think of any way to reduce the footprint of the workflow state that still works? We are more than happy to contribute with a PR if you are kind enough to provide us with some guidance :)

Cheers!

Since the engine can handle multiple definitions, the source (BPMN) of each definition is also included in the state. This isn't necessary if you only have one definition per instance (recommended). So you can remove the source and then feed the engine with the correct source before recover/resume. Check this feature test for inspiration.
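A minimal sketch of that idea, assuming the state is a plain object with a `definitions` array whose entries each carry a `source` property (the exact shape may differ between engine versions). `stripSources` and `restoreSources` are hypothetical helper names, not part of bpmn-engine, and putting a separately stored copy back into the state is only one way to feed the engine the correct source; the feature test mentioned above may do it differently.

```javascript
// Hypothetical helpers; the state shape (state.definitions[].source) is an assumption.

// Return a copy of the state without the per-definition source, for persisting.
function stripSources(state) {
  return {
    ...state,
    definitions: (state.definitions || []).map(({ source, ...rest }) => rest),
  };
}

// Put a source that was stored once elsewhere back on every definition
// before handing the state to recover. Only valid with one definition per instance.
function restoreSources(slimState, source) {
  return {
    ...slimState,
    definitions: (slimState.definitions || []).map((definition) => ({
      ...definition,
      source,
    })),
  };
}
```

Neither helper mutates its input, so the original state can still be used after a slimmed copy has been persisted.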

At Onify we actually use a smqp/Queue to push states and then have a separate consumer that persists the state. Better yet would be a proper broker, e.g. SNS/SQS. CQRS usually does the trick, but it requires the solution to be idempotent.
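The idempotency requirement matters because a queue may deliver the same state message twice or out of order. A rough sketch of a consumer-side store that tolerates this, assuming each message carries an execution id and a monotonically increasing version number (both are assumptions of this example, not something the engine provides). `StateStore` is a hypothetical in-memory stand-in for a database table.

```javascript
// Hypothetical in-memory store standing in for a database table.
class StateStore {
  constructor() {
    this.rows = new Map();
  }

  // Idempotent write: saving the same (executionId, version) twice, or an
  // older version after a newer one, leaves the stored row unchanged.
  // Returns true if the row was written, false if the message was ignored.
  save(executionId, version, state) {
    const current = this.rows.get(executionId);
    if (current && current.version >= version) return false;
    this.rows.set(executionId, { version, state });
    return true;
  }

  // Returns the latest stored state, or undefined if nothing was saved.
  load(executionId) {
    const row = this.rows.get(executionId);
    return row ? row.state : undefined;
  }
}
```

With a real database the same check would typically be a conditional write on the version column.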

Is there any way to save only the current state and the current environment variables, and then manually start the engine from that intermediate state? In our case we don't need to store the historical messages, we have only one definition per instance, and we use the same services and tasks for all definitions (only the order of the tasks or the variables change, the set of tasks stays the same).

And by current state you mean only running activities?

Let me explain.

Imagine this case:

<?xml version="1.0" encoding="UTF-8"?>
<definitions xmlns="http://www.omg.org/spec/BPMN/20100524/MODEL" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <process id="theProcess" isExecutable="true">

        <startEvent id="theStart"/>
        <serviceTask id="action" implementation="${environment.services.action()}"/>
        <userTask id="waitAction"/>
        <endEvent id="theEnd" />

        <sequenceFlow id="toAction" sourceRef="theStart" targetRef="action" />
        <sequenceFlow id="toWait" sourceRef="action" targetRef="waitAction" />
        <sequenceFlow id="toEnd" sourceRef="waitAction" targetRef="theEnd" />

    </process>
</definitions>

We launch the engine from a Lambda function, and the action makes a call to an external service. We have to wait for the external service to respond, but we cannot leave the Lambda running, so we define a waitAction.
The waitAction stops the engine and saves the state so it can be recovered later (when the service answers through an SQS call). When the SQS call is received, the state is retrieved from the DB and the engine is restored with this state to continue and finish.

The main problem is managing the state: it is huge, so the calls to save and retrieve it from the DB are slow and expensive in space.

In the end, we only need the reference to the sequenceFlow to continue the process, plus the variables stored up to that point. We have the same services and listeners for all the different definitions in our system.

This is why we are wondering how to save a lighter state that still lets the engine continue, without storing all of that information.

You might consider skipping the engine altogether and just using bpmn-elements Definition. It has a slimmer state and you should be able to trim it to a minimum. Though it requires some more work.

I have begun work on slimming the state. First off: the smqp broker.

As far as I can see it is reduced by about 50%.
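To check a figure like this against your own definitions, one option is to compare the serialized size of the state before and after a change. A small sketch, assuming the state is JSON-serializable; `serializedSize` and `reductionPercent` are hypothetical helper names.

```javascript
// Size in bytes of the state as it would be stored when serialized as UTF-8 JSON.
function serializedSize(state) {
  return Buffer.byteLength(JSON.stringify(state), 'utf8');
}

// Relative reduction in percent between two byte sizes.
function reductionPercent(bytesBefore, bytesAfter) {
  return (1 - bytesAfter / bytesBefore) * 100;
}
```

For example, `reductionPercent(serializedSize(oldState), serializedSize(newState))` gives the saving for one execution, where `oldState` and `newState` are states captured with the two engine versions.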

@acarrasco, any luck with the state size?

We've had this issue parked at the bottom of the backlog for a while, but we will definitely give the latest release a try to see if it gives a nice boost to our marshaling/unmarshaling times 😄

Thanks a lot for the work! 🤗

No worries, electronic backlogs tend to have a long tail.

Even slimmer! - 7a9a0c6

Removed some process- and state properties not needed for recover.

Still too large @acarrasco?

Ah! I forgot to tell you about the new setting disableTrackState. The setting name may not be the best, but the effect is that element counters are ignored when getting state.

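const { Engine } = require('bpmn-engine');

// `source` is the BPMN XML string of the definition, loaded elsewhere
const engine = new Engine({
  name: 'state without counters',
  source,
  settings: {
    // ignore element counters when getting state
    disableTrackState: true,
  },
});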

We ended up using a custom state machine that was very minimal so it could serialize and initialize fast 😅

@acarrasco can I close this issue or do we keep it open as a reminder or for sentimental reasons?

De nada.