3scale/apisonator

Evaluate Envoy as an alternative to Twemproxy

davidor opened this issue · 6 comments

Twemproxy is no longer maintained and Envoy can act as a Redis proxy: https://www.envoyproxy.io/docs/envoy/v1.13.0/api-v2/config/filter/network/redis_proxy/v2/redis_proxy.proto

I tried to pass the test container by simply stopping Twemproxy and starting an Envoy with an equivalent config:

static_resources:
  listeners:
  - name: redis_listener
    address:
      socket_address:
        address: 0.0.0.0
        port_value: 22121
    filter_chains:
    - filters:
      - name: envoy.redis_proxy
        typed_config:
          "@type": type.googleapis.com/envoy.config.filter.network.redis_proxy.v2.RedisProxy
          stat_prefix: egress_redis
          settings:
            op_timeout: 5s
            enable_hashtagging: true
          prefix_routes:
            catch_all_route:
              cluster: redis_cluster
  clusters:
  - name: redis_cluster
    connect_timeout: 1s
    type: strict_dns # static
    lb_policy: RING_HASH
    load_assignment:
      cluster_name: redis_cluster
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: 127.0.0.1
                port_value: 7379
        - endpoint:
            address:
              socket_address:
                address: 127.0.0.1
                port_value: 7380
admin:
  access_log_path: "/dev/null"
  address:
    socket_address:
      address: 0.0.0.0
      port_value: 8001

All the tests pass except some in the BucketStorage and BucketReader classes. They fail because they use the SUNION command, which is not supported in Envoy. We could replace those SUNIONS with SMEMBERS and perform the union in Ruby. It would probably be less efficient.

It is very promising for replacing twemproxy when deploying multiple Redis instances. The very good thing I see is it appears to be actively maintained and aiming to support Redis Cluster.

Problems:

  • Support for SUNION. Is there a fundamental reason it is not implemented? Doing it in the app is bad.
  • Ketama key distribution. IIRC Twemproxy used the same scheme, but this needs verification and tests in an existing deployment (like 3scale's SaaS).
  • Any other unsupported commands?

Other than that it is ok with me, will need @3scale/operations to look at it for our deployment, but as a general Redis scalability solution it would be great.

Answers inline:

Problems:

* Support for `SUNION`. Is there a fundamental reason it is not implemented? Doing it in the app is `bad`.

To use SUNION all the keys passed to the command need to go to the same shard, otherwise, it returns incomplete results. Twemproxy decided to implement this and add a warning: https://github.com/bytedance/twemproxy/blob/master/notes/redis.md#sets (see the note below the table). Envoy on the other hand, decided that it was dangerous and not worth to add support for it envoyproxy/envoy#8887
We making sure that all the params of SUNION go to the same shard:

# We want all the buckets to go to the same Redis shard.

That command is only used to export analytics, a feature that I suspect is not used by anyone anymore, so I wouldn't say that this is bad :)

* `Ketama` key distribution. IIRC Twemproxy used the same scheme, but this needs verification and tests in an existing deployment (like 3scale's SaaS).

Yes, this is the most important point. We need to be 100% sure that all the keys are sharded to the same servers. It can be configured to use Ketama as Twemproxy, but I'm not sure whether the hashing of keys is the same.

* Any other unsupported commands?

https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/other_protocols/redis#supported-commands

I just realized that we are using "fnv1a_64" as the hash function in Twemproxy, and Envoy does not use that one : https://www.envoyproxy.io/docs/envoy/latest/api-v2/api/v2/cluster.proto#enum-cluster-ringhashlbconfig-hashfunction

So, while starting a new installation with Envoy looks possible with some work, migrating an existing one could be challenging.

@davidor we can repurpose this issue to test Envoy in a new deployment - and in particular to add support for Envoy as a proxy at the app level (ie. working around SUNION under a configurable option?), and if ok then proceed to mention it in documentation about how to scale the database.

In the longer term perhaps that hash function ends up implemented, or we migrate our existing deployment to a different hashing function (which would make for an interesting project).

We would need to also review the redis commands used directly by System for fetching Analytics, which I believe includes a sort command that also needs all the keys to be in the same shard - so I would imaging Envoy make the samem decision there and decided to not support it?

We could consider making an upstream contribution to Envoy to make allowing those commands to be configurable, and used "at your own risk" (meaning if all keys fall in the same shard then it will work, otherwise it will fail).

The relevant code in the envoy redis filter where the supported redis commands are listed is here: https://github.com/envoyproxy/envoy/blob/7136c3ade0a8366a86621a1a3a63993af5573486/source/extensions/filters/network/common/redis/supported_commands.h#L17-L88

I believe that supporting those commands it's just a matter of adding them there as "simple commands". In fact they are not simple at all and should be responsibility of the user to use them as such (meaning using them only when all the involved keys belong to the same shard). Of course then there is the matters of hiding this commands under a configuration field in the filter and the maintainers accepting this solution :).