sysown/proxysql

Use weight=0 Servers as Backup if all other Nodes fail?

Opened this issue · 21 comments

Instead of ignoring servers with weight=0 would it be possible to use them if all other nodes of this hostgroup are down? This would be a very easy solution for sending all write traffic to only one node at a time.
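
For illustration, this is roughly the kind of configuration the request is about (the hostgroup id and hostnames below are made up). As the thread shows, the weight=0 server currently receives no traffic at all; the request is that it be picked only when every other server in the hostgroup is down:

-- hypothetical writer hostgroup 10: db1 is the active writer, db2 the intended "backup only" node
INSERT INTO mysql_servers (hostgroup_id, hostname, port, weight)
VALUES (10, 'db1.example.com', 3306, 1000),
       (10, 'db2.example.com', 3306, 0);
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;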

Hi Thomas,
Thank you for the report.
This sounds like a reasonable feature request.
Hostgroup Manager already tries to self-tune in some scenarios (see #530 and #531 for example): using server(s) with weight=0 when no other hosts seem alive looks like another case of self-tuning.

Hi,

Was this implemented? Tried but doesn't seem to work.

proxysql> select * from runtime_mysql_servers;
+--------------+----------------+------+---------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| hostgroup_id | hostname       | port | status  | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment |
+--------------+----------------+------+---------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| 10           | 192.168.130.51 | 3306 | SHUNNED | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 10           | 192.168.130.52 | 3306 | ONLINE  | 0      | 0           | 1000            | 0                   | 0       | 0              |         |
+--------------+----------------+------+---------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
2 rows in set (0.00 sec)

Yet I'm not able to run queries...

mysql> select @@hostname;
ERROR 9001 (HY000): Max connect timeout reached while reaching hostgroup 10 after 10000ms
mysql>

Any news on this?

Reminder of our use case: we want all writes to go to only one server, and if this server becomes unavailable, writes should not fail but go to the next available server. Or maybe there's another way to do this?

Not implemented yet.

Note about this feature:

  • it should not be considered as an HA solution
  • proxysql won't have a complete view of the whole cluster, therefore a shunned node may not have really failed
  • this doesn't protect from network split
  • multiple proxies could end up with different writers
  • it is still questionable what to do with the previous writer: because ProxySQL lacks a complete view of the cluster, it cannot declare a node failed beyond any reasonable doubt, so maybe writes should be sent back to the previous writer

Because of all the above limitations, I am not sure this should be implemented at all. Instead, ProxySQL should be paired with a replication manager.

Hi René, we have a similar use case as @ThomasLohner describes. For the sake of simplicity, let's say we have an HA solution backed by a master-master cluster setup (say with nodes A and B) which we would like to hide behind a proxysql instance and be able to:

  1. split reads & writes
  2. as long as node A is available, always route client writes to node A. Only if node A is not available, fail writes over to node B
  3. load-balance all reads across nodes A & B, except those that are part of an ongoing transaction, which should be routed to the node where the transaction was initiated in the first place.

With the current implementation, we are not sure how we would be able to accomplish point 2. If within a hostgroup we could mark a host as backup, then I believe we could simply follow the suggestion in the MySQL read/write split with ProxySQL guide and use a regex to split traffic and route all writes to a hostgroup which would contain both nodes A & B, where B is marked as a backup host for that very hostgroup.
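
For context, a minimal sketch of the regex-based split mentioned above, with made-up hostgroup ids (10 for writes containing A and B, 20 for reads); what is missing today is a way to mark B as a backup-only member of hostgroup 10:

INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, apply)
VALUES (1, 1, '^SELECT.*FOR UPDATE', 10, 1),
       (2, 1, '^SELECT',             20, 1);
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;

Traffic that matches neither rule (i.e. the writes) falls through to the user's default_hostgroup, which in this sketch would be set to 10.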

Thanks!
Filip

Hi Filip.
Please note my previous comment.
Why not use a replication manager (MHA, Orchestrator, MRM) and step away from master-master replication?

Hi @renecannao
We have more or less the same scenario and we cannot step away from master-master replication, as we need to be able to write in 2 different DCs simultaneously.
Our services exist in both DCs, traffic can hit either of them at any moment, and cross-DC writes/reads are not acceptable due to latency.

I've set weight 100 on the local DC master and 1 on the second DC master. As expected, some of the traffic is directed to the cross-DC master. As the others above already mentioned, it would be a nice feature to use the weight=0 server when no other non-zero-weight server is left. You could add a parameter to control whether weight=0 servers are used or not.
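
For reference, the weighting described above boils down to something like this (hostnames are placeholders). Assuming selection proportional to weight, roughly 1 new connection in 101 still goes cross-DC, which is exactly what a usable weight=0 (or a last-resort parameter) would avoid:

UPDATE mysql_servers SET weight=100 WHERE hostgroup_id=10 AND hostname='master-local-dc';
UPDATE mysql_servers SET weight=1   WHERE hostgroup_id=10 AND hostname='master-remote-dc';
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;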

Regarding the above:

  1. it should not be considered as an HA solution

It's not I just want to prevent traffic to be sent cross-dc unless no other server is available.

  2. proxysql won't have a complete view of the whole cluster, therefore a shunned node may not have really failed

In my case I use Orchestrator for cluster discovery, but it doesn't really matter: if on a given proxysql server there is only one server left in the hostgroup and it has weight=0, I want it to be used (at least there should be a parameter to control this behavior), regardless of other proxysql servers and regardless of whether the shunned server has "really" failed or not, since proxysql is not sending traffic to it anyway.

  3. this doesn't protect from network split

Sure it doesn't, and that's not the goal of the feature we would like to have.

  4. multiple proxies could end up with different writers

That's indeed a risk that should be taken into consideration :-)

  5. it is still questionable what to do with the previous writer: because ProxySQL lacks a complete view of the cluster, it cannot declare a node failed beyond any reasonable doubt, so maybe writes should be sent back to the previous writer

Not sure what you mean here: if for any reason proxysql gets a DML operation and has no server to write to, because all of the non-zero-weight servers are 'SHUNNED', 'OFFLINE_SOFT' or 'OFFLINE_HARD', it should be able to write to the weight=0 servers. Unless I misunderstand the last point.

Hi, was this implemented? We have a case where we do not want to use the master for read traffic at all unless all slaves are shunned. Can this be achieved in any other way?

Dear Renecannao, I also think this is a useful feature. As it does not affect other functions, why not take it in? Regards.

@renecannao ,

We have a similar situation as described in previous replies, where we have a Galera cluster split between three DCs and we want ProxySQL in DC X to use only the database servers in DC X if they are available, and only send cross-DC requests to DC Y & Z if the servers in DC X fail/get shunned.

Perhaps a possible solution (not ideal, but it is something) would be to use the maximum INT weight (2147483647) for same-DC database servers, and weight 1 for the other-DC database servers. Some cross-DC queries will still be made, but I'm guessing very few. Am I correct in assuming this?
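
Back-of-envelope, assuming server selection is simply proportional to weight: with one same-DC server at weight 2147483647 and one other-DC server at weight 1, the other-DC server should see about 1 in 2147483648 new connections, i.e. practically none.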

Thank you for your input!

EDIT: it is worth noting that our Galera cluster is multi-master, and it is working great for us like this.
EDIT: tried it with weight 2147483647 in ProxySQL for the same-DC database servers and it does not appear to have the intended effect: more queries than expected get sent cross-DC. Existing connections were stopped for this test, so that is not the cause.

Dear Rene ( @renecannao ), thank you so much for keeping Proxysql alive.

I'd like to offer a small idea on this issue. Perhaps it's worth adding an option to the hostgroup configuration (let's say use_heaviest_online_server_only) which would choose only one server from a hostgroup, the one which:

  1. Has the most weight.
  2. Has status=ONLINE.

Here is my configuration right now:

  1. Right now I have 3 Percona MySQL servers in an ACTIVE-ACTIVE-ACTIVE configuration (which means that any single one of them can accept a write and it will be replicated to the other servers seamlessly).
  2. I have 2 hostgroups (one for writes and one for reads). All 3 servers participate in both hostgroups at the same time.
  3. Servers in the WRITE hostgroup have a weight assigned to them (the first one is 100000, while the others are just 1). This allows 99.999% of writes to go to a single server and helps avoid deadlocks.
  4. Servers in the READ hostgroup have a weight assigned according to their processor power and RAM (one of my servers is really new and super fast, so I give it most of the reads). A rough admin-SQL sketch of points 2-4 follows this list.
  5. When the primary WRITE server (the one with weight=100000) goes down, all the writes are spread evenly across the other 2 servers (which is what I and the issue creator want to avoid). Deadlocks start to happen, but in my case my software is prepared for this and does an automatic replay when it sees a deadlock error. Still not the best solution though.
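
For concreteness, points 2-4 above map roughly to admin SQL like this (hostnames, hostgroup ids 10/20 and the exact read weights are placeholders):

-- WRITE hostgroup 10: one heavy writer plus two weight=1 fallbacks
INSERT INTO mysql_servers (hostgroup_id, hostname, port, weight)
VALUES (10, 'pxc1', 3306, 100000),
       (10, 'pxc2', 3306, 1),
       (10, 'pxc3', 3306, 1);
-- READ hostgroup 20: weights sized to each server's hardware (example values)
INSERT INTO mysql_servers (hostgroup_id, hostname, port, weight)
VALUES (20, 'pxc1', 3306, 100),
       (20, 'pxc2', 3306, 100),
       (20, 'pxc3', 3306, 400);
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;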

Possible solution:

So, if the WRITE hostgroup had the use_heaviest_online_server_only option, I could assign different weights to the participating servers (let's say 100, 50 and 10). If the heaviest server (with weight=100) went down, 100% of the writes would go only to the second server (the one with weight=50), without a single write being sent to the third server.

Thank you for your consideration.

Hi,
we too have a scenario where this is needed. Is there a status on this issue?

@renecannao can I put a $300 bounty on this fix? My primary server, which had the most weight, went down today and the two remaining servers started to struggle with the load-balanced WRITEs, reporting lots of deadlocked queries as a result. A possible solution would be: #730 (comment)

Thank you!

I'm interested in this feature for reader hostgroups: when no replica with weight > 0 is alive in the hostgroup, traffic would instead be sent to whatever weight=0 nodes are available.

I think this would be a safe feature for read-only traffic

We'd love to see this too (weight=0 meaning don't use unless everything else is down). Our use case: we have a pool redirecting SELECT traffic over to replicas. In the event of a giant ALTER that lags the replicas, we'd like those queries to end up on the primary instead.
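
If weight=0 worked as a last-resort fallback, that pool could look something like this (hostnames, hostgroup id 20 and the 60-second lag threshold are made up): the replicas carry the weight and a max_replication_lag so they get shunned when the ALTER lags them, and the primary sits in the reader hostgroup at weight=0, to be used only once both replicas are shunned:

-- hypothetical reader hostgroup 20
INSERT INTO mysql_servers (hostgroup_id, hostname, port, weight, max_replication_lag)
VALUES (20, 'replica1', 3306, 100, 60),
       (20, 'replica2', 3306, 100, 60),
       (20, 'primary',  3306, 0,   0);
LOAD MYSQL SERVERS TO RUNTIME;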

Chiming in with another +1 for the use case of wanting a 0 weight server to be standby and only used if the other hosts in the group are down. This feature is still very much wanted.

+1, it makes a lot of sense for read/write splits where all writes go to a server A as long as it is online, and fall back to another server ONLY when server A is no longer available

+1 We also have a Galera cluster and are looking to use the 2nd and 3rd nodes as backups if node 1 fails.

+1 also have a use-case where the ability to have a weight=0 set of hosts that are only accessed when all other nodes are shunned would be useful

Created a new issue linked to this: #4536.