wechaty/puppet

Refactoring the `'dirty'` event, `dirtyPayload()`, and `XXXPayloadDirty()` methods logic & spec

huan opened this issue · 27 comments

huan commented
There are only two hard things in Computer Science: cache invalidation and naming things.

-- Phil Karlton

https://martinfowler.com/bliki/TwoHardThings.html

Working PRs

The New Design

In this new design, we will use:

  1. 'dirty' event (added via wechaty/grpc#79)
  2. dirtyPayload() method (added via #114)
  3. XXXPayloadDirty() method (removed via #114)

With the logic:

  1. 'dirty' event will be emitted by the deepest puppet by calling the dirtyPayload() method:

    PuppetServiceClient(PuppetServiceServer(PuppetAbstract))).dirtyPayload() -> PuppetAbstract.emit('dirty', payload)

  2. Every Puppet listen to 'dirty' event with its onDirty() method, and invalid(remove) the cache of the payload with the specific id and type (register onDirty() callback to the dirty event will be done automatically by the PuppetAbstract)
  3. XXXPayloadDirty() should call dirtyPayload() first, then wait the 'dirty' event. After receiving the 'dirty' event, it can return because the payload should be dirtied by the listener (which should be registered by the Puppet itself)
    1. a timer should be added in case of the 'dirty' event was lost, and throw an Error. (5 seconds in the current code)

New Puppet API: onDirty()

  1. onDirty() will be addListener to the dirty event by the PuppetAbstract
  2. child puppets need to override it to clean their own payload cache/store (and do not forget to call super.onDirty()!)

Dirty a Payload Locally (in Puppet Provider)

How to dirty a payload (for puppet provider, locally):

  1. puppet.XXXPayloadDirty(id)
    1. future = await new Promise(resolve => puppet.once('dirty', resolve))
      1. check the dirty id/type match before resolve
      2. add a timer to reject the Promise if timeout
  2. puppet.dirtyPayload('XXX', id)
    1. setImmediate(() => puppet.emit('dirty', { type: 'XXX', id })) <- add this task (emit) to the end of event loop queue
  3. puppet.onDirty()
    2. clean the cache
  4. puppet.XXXPayloadDirty(id) <- future resolved
    1. await new Promise(setImmediate) <- wait current event loop task queue to be all executed

call 1, emit 2, listen 3, resolve 4

Dirty a Payload Remotely (with Puppet Service)

  1. 🖥️ Puppet Service
    1. puppetService.XXXPayloadDirty(id)
      1. future = await new Promise(resolve => puppet.once('dirty', resolve))
        1. check the dirty id/type match before resolve
        2. add a timer to reject the Promise if timeout
    2. puppetService.dirtyPayload('XXX', id)
    3. gRPC Service DirtyPayload
  2. ☁️ Puppet Server
    1. puppetServer.dirtyPayload('XXX', id)
    2. puppetServer.emit('dirty', { type: 'XXX', id })
    3. puppetServer.onDirty() listener: remove payload cache
    4. gRPC Service Event (Stream)
  3. 🖥️ Puppet Service
    1. puppetClient.emit('dirty', { type: 'XXX', id })
    2. puppetClient.onDirty() listener: remove payload cache
    3. puppetClient.XXXPayloadDirty(id) future resolved

Steps Mapping

Locally Remote Client Remote Server
1 1. i 2.i
2 1. ii 2.ii
3 3. ii 2.iii
4 3. iii 2.iv

Related issues

Breaking Change

This change will not affect the end-users.

However, the puppet provider developer needs to follow this new logic when they are using the new version of Puppet Abstract Class.

The Puppet Service will be affected too, so we recommend paying attention to the Wechaty versions to be matching on both the server and the client.

CC @wechaty/puppet @wechaty/grpc

huan commented
There are only two hard things in Computer Science: cache invalidation and naming things.

-- Phil Karlton

https://martinfowler.com/bliki/TwoHardThings.html

Start working on this Issue

huan commented

Done.

  1. puppetServer.dirtyPayload('XXX', id)
  2. puppetServer.emit('dirty', { type: 'XXX', id })
  3. puppetServer.onDirty() listener: remove payload cache
  4. gRPC Service Event (Stream)

Currently in step1,the server remove payload cache.Why make it so complex?

huan commented

Currently in step1, the server removes the payload cache. Why make it so complex?

This complexity is necessary because the Wechaty Puppet Service Client also holds a cache layer, which means that if the Server only removed the payload cache for itself, the Client will not get notified and still hold the outdated (dirty) payload and use it, causing the end user problems like the alias, topic can not be updated, etc.

The client call dirtyPayload method ,which means that the client can remove the cache.

dirtyPayload is request-reponse .If the server remove some payload cache,the server will emit dirty event to client.

huan commented

If the server removes some payload cache, the server will emit a dirty event to the client.

Yes, that's all the server needs to do: when it dirty any payload, emitting a dirty event will be enough.

Because the client can not know when they should remove a cache from the server, they must be notified by the server, with a dirty event. After the client received the dirty event, then they will be able to clean the cache, and then request the payload again from the server to get the latest data.

That's all our current design is for, thank you very much for pointing this out.

客户端请求payloadDirty的时候,就是客户端希望服务器端删除缓存,这个时候客户端请求成功之后就可以删除本地缓存了啊, 难道还需要服务器再发个event给客户端?是不是有点多此一举?

如果是服务器端自己根据逻辑判断需要删除缓存,那么服务器端会发送event给客户端的,我觉得0.x的设计是对的啊。

huan commented

Thanks for raising this discussion!

What you said is only partially true in the paimon use case because paimon is the most simple architecture in the wechaty ecosystem.

In a complicated architecture like WXWork or Pad
Local gateway:

Puppet WXWork -> Puppet Service Server -> gRPC -> Puppet Service Client -> Wechaty

This design is necessary for dealing with the dirty payloads to be correct.

puppetServer.dirtyPayload('XXX', id)
puppetServer.emit('dirty', { type: 'XXX', id })
puppetServer.onDirty() listener: remove payload cache
gRPC Service Event (Stream)

能分别介绍一下,在Grpc服务端干啥工作? 第三步和第四步分别是啥?Grpc里面没有onDirty接口啊?第四步是啥意思?这个改动我没看明白,尤其是对于Grpc服务端正确的做法。
谢谢!

huan commented

Hi @zpaimon

Thanks for asking!

This issue is written for the standard Wechaty Puppet Provider, so it has some Wechaty Puppet Abstract API implementation details, which might be confusing you because the Paimon is a pure gRPC service.

I think from Paimon's perspective, what you need to care about is only the below two gRPC methods (and safely ignore all the 第三步和第四步 & Grpc里面没有onDirty接口 & 第四步):

  1. rpc Event()
  2. rpc DirtyPayload()

You can read the detail of the above methods from the proto buffer definition at Wechaty gRPC Repo

So here's the answer to your question: "在Grpc服务端干啥工作": you need to implement the below two algorithms:

1. send EventResponse()to the rpc Event() stream

When the Paimon has any payload updated (like contact alias change, room topic change, room member change, etc.), it needs to send a new EventResponse() to the grpc Event() stream to the client, with the EventType is EVENT_TYPE_DIRTY:

message EventResponse {
  EventType type = 1;
  // TODO: Huan(202002) consider to use a PB Map?
  string payload = 2; // JSON.stringify({ ... })
}

Here's the EventResponse link and EventType link for the above definition.

2. react rpc DirtyPayload() call

This design is for helping the client to force re-sync the data from the server.

When Paimon received an rpc DirtyPayload() call, it needs to:

  1. delete the server cache of the id with the payload type first, then
  2. send an EventResponse to the rpc Event() stream with the same type and id to notify the client that the server has dirty-ed the payload.

I hope my explanation can help you start understanding this design. Please feel free to let me know if you have any more questions.

Thank you very much!

send an EventResponse to the rpc Event() stream with the same type and id to notify the client that the server has dirty-ed the payload.

I think paimon only has't the above logic and other things is ok!

Is the design compatible with 0.x API ?

If the client is 0.x and grpc server is 1.x,dead loop will happen.

Client(DirtyPayload)->Server(dirtyPayload and send Dirty event to client)->Client(DirtyPayload)........

Dead loop !

huan commented

You are welcome.

Glad to know that you made it clear that what the Paimon need to follow, and please feel free to let me know if you have other questions.

Is the design compatible with 0.x API ?

If the client is 0.x and grpc server is 1.x,dead loop will happen.

Client(DirtyPayload)->Server(dirtyPayload and send Dirty event to client)->Client(DirtyPayload)........

Dead loop !

huan commented

Dead loop !

Good catch: no, this design is not compatible with 0.x client 😢

It's a mistake because we have missed this problem when we were designing this v1 new logic.

I believe this is also related to

The solution for this problem might be:

  1. Use a workaround logic in the server, to prevent to emit the same dirty event in a period of time, as a workaround. I think it will be very easy to implement this by using a RxJS throttle operator.
  2. Make sure the user are upgrading to the Wechaty v1.x before they are using a v1.x puppet service.

Please feel free to use any other workarounds if you can find a better one.

And I think this problem should be posted in the website if the user need to upgrade to wechaty v1.x go use the new paimon.

So i think the GRPC protocol need to keep it simple logic.I suggest that the GRPC server don't send Dirty event to client when client call dirtyPayload which is request-reponse mode.

huan commented

So I think the GRPC protocol needs to keep it simple logic. I suggest that the GRPC server doesn't send the Dirty event to the client when the client call dirtyPayload which is request-reponse mode.

If the Wechaty user is using the wechaty-puppet-service direct connect to the Paimon service, then your suggestion would work.

But unfortunately, we can not follow your suggestion because what you suggested can not cover all the use cases.

When we have any middle layers between the Wechaty and the Puppet Provider, for example:

Wechaty -> Wechaty Puppet Service Client -> gRPC -> Wechaty Puppet Service Server -> Wechatyu Puppet Provider

Then we need a dirty event to be propagated through the above pipeline, to make sure all the payload cache in the path has been dropped, because the request-response model can only make sure the first and last node can be synced.

In this case, we need to emit the dirty payload and propagate it through the pipeline, so that all the layers will be updated correctly.

Wechaty -> Wechaty Puppet Service Client -> gRPC -> Wechaty Puppet Service Server -> Wechatyu Puppet Provider

In the above pipeline,gRPC can await the Wechaty Puppet Service Server to clear cache, then response(dirtyPayload response) to Wechaty Puppet Service Client that the payload cache is cleared.

huan commented

Unfortunately, we can not only await the Wechaty Puppet Service Server to clear the cache because this logic only clears the cache from one node, which is the server itself.

The Wechaty needs to make sure all the nodes in the pipeline have cleared the cache before it can continue reloading the payload again, to make sure there's no outdated cache in the pipeline so that it can get the newest data correctly.

That's the reason why a dirty is needed and should be propagated through the pipeline.

Each node in the pipeline will listen to the dirty event and do two jobs:

  1. clear the cache
  2. propagate the dirty event to the upper layer

After the dirty event has been propagated to the latest node (which is the Wechaty layer), then the Wechaty can know that all the outdated cache has been cleared.

I hope I have explained this problem clearly. Please think more about a complex architecture other than the Paimon case, and that will greatly help you to understand this dirty event design.

In other side,the design maybe trigger wechat risk control if infinite request for the fresh payload(Contact/Room)。

I think I might find the infinite loop problem you talked about in wechaty/puppet-wechat#177. I think it's a puppet issue so I post it here.

Let's take a look at this example and see how it goes.
bot.Contact.find({name: xxx}).phone(['+8612345678'])
Wechaty ContactMixin phone()

await this.wechaty.puppet.contactPhone(this.id, phoneList)
await this.wechaty.puppet.contactPayloadDirty(this.id)

Wechaty Puppet ContactMixin contactPayloadDirty()

await this.__dirtyPayloadAwait(
  PayloadType.Contact,
  id,
)

Puppet Server will emit a dirty event and resolve the above await. Everything seems alright so far, but here:
Wechaty PuppetMixin __setupPuppetEvents()

puppet.on('dirty', async ({ payloadType, payloadId }) => {
  try {
    switch (payloadType) {
      case PUPPET.types.Payload.RoomMember:
      case PUPPET.types.Payload.Contact:
        await (await this.Contact.find({ id: payloadId }))?.sync()
        break
...

Wechaty Contact sync()

await this.ready(true)

Wechaty Contact ready()

if (forceSync) {
  await this.wechaty.puppet.contactPayloadDirty(this.id)
}

Here we go, we go back to Wechaty Puppet ContactMixin contactPayloadDirty()

huan commented

Good catch!

It seems that the dead loop bug had been located.

Will investigate on it after back to my computer, and please feel free to create a PR to send your fix.

Thank you very much!

There are many ways to fix this, so we should come to some consensus first.

In my opinion, it's a bit odd for puppet-implementation to emit a dirty event when the dirty comes from the puppet-service. This dirty is completely different from the dirty event trigged by puppet-implementation. We can either remove it, or add a new event type for it. (like 'dirty-response').

If to separate these two kind of dirty event is not acceptable, we can also add a pool in onDirty handler, so that all dirty events started from __dirtyPayloadAwait will be dismissed.

Maybe we should move this discuss into a new issue since this one is closed long ago.

@huan is It means that the 'dirty' event depend on Implementation of the puppet?

I use puppet-wxwork and create a room ,when it is created,here is the error,then I add something like

      await bot.puppet.roomPayloadDirty(room.id)
      await bot.puppet.roomMemberPayloadDirty(room.id)

The error still exists,Is my calling method is not right?

14:06:14 INFO Bot createDingRoom() new room created: WechatifiedRoomImpl
14:06:19 WARN PuppetCacheMixin __dirtyPayloadAwait() timeout.
  The `dirty` event should be received but no one found.
  Learn more from https://github.com/wechaty/puppet/issues/158
  payloadType: Room(3)
  payloadId: R:10742655366665696
  error: Timeout after 5000 ms
  stack: GError: Timeout after 5000 ms
    at Function.from (file:///Users/choogoo/WeChatProjects/groupadmin/bot/node_modules/gerror/src/gerror/gerror.ts:69:17)
    at Timeout._onTimeout (file:///Users/choogoo/WeChatProjects/groupadmin/bot/node_modules/gerror/src/timeout-promise/timeout-promise.ts:26:36)
    at listOnTimeout (node:internal/timers:557:17)
    at processTimers (node:internal/timers:500:7)
14:06:22 WARN PuppetCacheMixin __dirtyPayloadAwait() timeout.
  The `dirty` event should be received but no one found.
  Learn more from https://github.com/wechaty/puppet/issues/158
  payloadType: Room(3)
  payloadId: R:10742655366665696
  error: Timeout after 5000 ms
  stack: GError: Timeout after 5000 ms
    at Function.from (file:///Users/choogoo/WeChatProjects/groupadmin/bot/node_modules/gerror/src/gerror/gerror.ts:69:17)
    at Timeout._onTimeout (file:///Users/choogoo/WeChatProjects/groupadmin/bot/node_modules/gerror/src/timeout-promise/timeout-promise.ts:26:36)
    at listOnTimeout (node:internal/timers:557:17)
    at processTimers (node:internal/timers:500:7)
14:06:27 WARN PuppetCacheMixin __dirtyPayloadAwait() timeout.
  The `dirty` event should be received but no one found.
  Learn more from https://github.com/wechaty/puppet/issues/158
  payloadType: RoomMember(4)
  payloadId: R:10742655366665696
  error: Timeout after 5000 ms
  stack: GError: Timeout after 5000 ms
    at Function.from (file:///Users/choogoo/WeChatProjects/groupadmin/bot/node_modules/gerror/src/gerror/gerror.ts:69:17)
    at Timeout._onTimeout (file:///Users/choogoo/WeChatProjects/groupadmin/bot/node_modules/gerror/src/timeout-promise/timeout-promise.ts:26:36)
    at listOnTimeout (node:internal/timers:557:17)
    at processTimers (node:internal/timers:500:7)
14:06:27 INFO Room room-7881302781913704 topic changed from room-7881302781913704 to room-7881302781913704 by 瓦力 
huan commented

Yes, it depends on the implementation.

The new Wechaty v1 will expect a dirty event in this case.

BTW: I think when using Wechaty, we should avoid calling the puppet directly as possible as we can.