Cluster mode (multi-core support)

Question

Closed this issue 3 years ago · 1 comments

PM2 has a built-in cluster mode, so I think I can focus on developing the cluster support with it in mind, instead of having to make a cluster master script.
Or maybe I can consider making a separate cluster master script anyway, but the main file lolisafe.js should still be a script that can be forked by PM2 (?).

The current state of the safe is still not suitable for clustering because much of its state is kept in memory (file identifier caching, statistics caching, album zip generation states, etc.).
There are 2 solutions that I have in mind for this.
The first is to make my own cluster master script and put all in-memory states there. Cluster workers will then communicate with the master to get/set states. But we won't be able to use PM2's cluster mode with this, as PM2 would have to run the cluster master script instead.
The second is to use an external in-memory store, such as Redis. This may eliminate the need for a cluster master script, as each worker will communicate directly with Redis. This solution may be preferable if I want to make use of PM2's built-in cluster mode.
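To make the Redis option a bit more concrete, here is a rough sketch of a shared-state helper that each worker could use instead of local variables. It assumes a connected node-redis v4-style client (promise-based `get`, `set` with an `{ NX, EX }` options object, and `del`); `createSharedState`, the `lolisafe:` key prefix, and the method names are all hypothetical and not part of the current code.

```javascript
// A sketch only. `client` is assumed to be a connected node-redis v4 client, e.g.:
//   const { createClient } = require('redis');
//   const client = createClient();
//   await client.connect();
// createSharedState and the key prefix are hypothetical names.
function createSharedState(client, prefix = 'lolisafe:') {
  return {
    // Read a JSON value shared by all workers (null if the key is missing).
    async get(key) {
      const raw = await client.get(prefix + key);
      return raw === null ? null : JSON.parse(raw);
    },

    // Store a JSON value, optionally expiring after ttlSeconds.
    async set(key, value, ttlSeconds) {
      const payload = JSON.stringify(value);
      if (ttlSeconds) {
        await client.set(prefix + key, payload, { EX: ttlSeconds });
      } else {
        await client.set(prefix + key, payload);
      }
    },

    // Atomically claim a key: SET with NX succeeds for exactly one caller.
    // The expiry keeps a crashed worker from holding the claim forever.
    async claim(key, ttlSeconds = 60) {
      const reply = await client.set(prefix + key, '1', { NX: true, EX: ttlSeconds });
      return reply === 'OK';
    },

    // Drop a key, for example once the claimed work has finished.
    async release(key) {
      await client.del(prefix + key);
    }
  };
}
```

Something like `claim('zip:' + albumId)` could then guard album zip generation across workers, while `get`/`set` could hold cached statistics. Since every worker only talks to Redis, this would work the same under PM2's cluster mode or a custom master script.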

I need to look into how to handle clamdjs scanner as well.
Like, will it be fine to create a new clamdjs scanner instance for each cluster worker, or should I create only one in a cluster master script?
I think the former should be fine, because in the end all instances will communicate with a single clamd server anyway.

I'll also need to look into race conditions when handling file names.
It's not an issue when file identifiers are cached, because I've coded it to immediately lock the name once generated (so that the same name will not be used by future processes, even if the name hasn't been used to write a file to disk yet), but it's still an issue when that feature is turned off.
The solution for this is rather simple. I just need to do the same, that is, lock the names by temporarily holding them in memory as well (though as opposed to the former, where all file names are cached, this will only temporarily cache names that are about to be used for new files).
But I'll also need to review other parts of the code for race conditions.
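The temporary locking idea could be sketched roughly as below. This is not the existing implementation: `heldNames`, `randomName`, `reserveName`, `releaseName`, and the caller-supplied `existsOnDisk` check are all hypothetical names. The point is that the check and the lock happen synchronously, before any `await`, so two uploads handled by the same process cannot end up with the same name.

```javascript
const crypto = require('crypto');

// Names that have been handed out but may not be written to disk yet.
const heldNames = new Set();

// Hypothetical generator: a random hex identifier of the given length.
function randomName(length) {
  return crypto.randomBytes(Math.ceil(length / 2)).toString('hex').slice(0, length);
}

// Reserve a name for a new file.
// `existsOnDisk` is an assumed async function that resolves to true
// when a file with that name is already stored.
async function reserveName({ length = 8, maxTries = 10, existsOnDisk }) {
  for (let i = 0; i < maxTries; i++) {
    const name = randomName(length);
    if (heldNames.has(name)) continue;
    // Lock before the first await, so no other request can take the name
    // while the disk check is still pending.
    heldNames.add(name);
    if (await existsOnDisk(name)) {
      heldNames.delete(name);
      continue;
    }
    return name;
  }
  throw new Error('Could not reserve a unique name');
}

// Release the temporary lock once the file is written (or the upload failed).
function releaseName(name) {
  heldNames.delete(name);
}
```

Note that a `Set` like this only protects a single process. With multiple cluster workers, the held names would have to live in the master or in an external store instead.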
There may be some things that I currently can't remember.

Anyway, this is a long term objective.
There is no immediate need for this yet, as safe.fiery.me is currently not demanding enough to need its load spread across all CPU cores.
This is just a note to keep myself reminded of what I had in mind for this particular objective (and hopefully receive some insights from other users of this fork).

Answer 1 · 2019-10-12T09:44:46.000Z

I think the Redis approach is the way to go in this case. It would make things cluster-agnostic, so you can either create your own master script or use PM2 or anything else really. And the support for Redis is huge; it's basically the best tool for the job.

The master website for lolisafe has pretty heavy usage and yet there's still no need to cluster because it works just fine, so as you mentioned this would be something nice to have in the long run. I'll look into it for master as well, since the rewrite removed a lot of in-memory state, leading to an easier implementation of what you want to achieve here.