momentohq/client-sdk-rust

SDK is making connections to both cache. and control. for GETs/SETs

Closed this issue · 5 comments

I've discovered that regardless of whether it's a control-plane or data-plane request, we make connections to both cache. and control. endpoints.

The implication of this is that if our control-plane is down our SDK cannot make a call to the data-plane, and vice-versa.
Update: I definitely saw that ControlPlane was unavailable when there were 0 MR2 nodes. It's possible that the converse is not true, since we're using ALB/NLB in the control/data plane. This needs to be verified.

Slack convo: https://gomomento.slack.com/archives/C01AC7JT370/p1644271358595819?thread_ts=1644269711.139799&cid=C01AC7JT370

Behavior should be checked in all SDKs.

Updating this issue to track progress so I can keep coming back to it:

@bruuuuuuuce - was able to generate a few tokens, each with one valid and one invalid endpoint.

A couple of quick tests in the Java SDK seem to work as intended:

  • Able to call data plane successfully when control endpoint is bad
  • Able to call control plane successfully when data endpoint is bad

The CLI seems to be struggling, as noted. I will run the same steps next for the other SDKs.

Verified the Python, .NET and JavaScript SDKs, and they seem to handle this okay.

Follow-up items:

  • Verify Rust SDK
  • Fix CLI tool
  • Add tests to all SDKs to make sure we never break this behavior - one backend going down should have no impact on another. Customers should still be able to create clients and perform operations against the available backends.
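As a rough illustration of the last item, the invariant can be written as a small test matrix. The sketch below does not use any real SDK: `FakeBackend` and `Client` are hypothetical stand-ins, and the only assumption is that construction never contacts a backend and each operation touches only its own plane.

```rust
/// Hypothetical stand-in for one backend (control plane or data plane).
struct FakeBackend {
    up: bool,
}

impl FakeBackend {
    /// Pretends to send a request; fails only if this backend is down.
    fn request(&self, op: &str) -> Result<String, String> {
        if self.up {
            Ok(format!("{} ok", op))
        } else {
            Err(format!("{} failed: backend unavailable", op))
        }
    }
}

/// Hypothetical client: `new` never contacts a backend, and each
/// operation uses only the plane it belongs to.
struct Client {
    control: FakeBackend,
    data: FakeBackend,
}

impl Client {
    fn new(control_up: bool, data_up: bool) -> Self {
        Self {
            control: FakeBackend { up: control_up },
            data: FakeBackend { up: data_up },
        }
    }

    /// Control-plane operation.
    fn create_cache(&self, name: &str) -> Result<String, String> {
        self.control.request(&format!("create_cache({})", name))
    }

    /// Data-plane operation.
    fn set(&self, key: &str, value: &str) -> Result<String, String> {
        self.data.request(&format!("set({}, {})", key, value))
    }
}
```

A real test would replace the fakes with tokens like the ones described above (one valid and one invalid endpoint) and assert the same two cases: data-plane calls succeed when only the control endpoint is bad, and control-plane calls succeed when only the data endpoint is bad.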

There definitely seems to be something wrong with the Rust SDK.

I wrote a quick test using the tokens I have been using for the other SDK tests, and it fails at client initialization, specifically at https://github.com/momentohq/client-sdk-rust/blob/main/src/simple_cache_client.rs#L90

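    #[tokio::test]
    async fn invalid_control_token_can_still_initialize_sdk() {
        // Token with an invalid control endpoint (placeholder value).
        let token = String::from("<Bad Token>");
        // The failure described above happens during this call.
        let mut client = SimpleCacheClient::new(token, 5).await.unwrap();
        // If initialization succeeded, the data-plane call should return NotFound.
        let result = client.set("cache", "hello", "world", None).await.unwrap_err();
        // `_` matches any message; a bare identifier here would bind, not compare.
        assert!(matches!(result, MomentoError::NotFound(_)));
    }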

So this seems to be something going on in Rust/Tonic itself and not really a direct bug in our implementation. There may be another way to initialize the client that I need to look into. However, the examples in the repo exhibit the same behavior: a client cannot be initialized if the server is down. One thing I still need to verify is the behavior when the server becomes unavailable after the client is set up.
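To make the difference concrete, here is a minimal stdlib-only sketch of eager versus lazy connection setup. It is not tonic's API and not the SDK's code: `EagerChannel`, `LazyChannel` and `unused_addr` are hypothetical names, and plain TCP stands in for a gRPC channel. For reference, tonic's `Endpoint` offers both `connect`, which dials immediately, and `connect_lazy`; whether switching would resolve this issue is not verified here.

```rust
use std::io;
use std::net::{SocketAddr, TcpListener, TcpStream};

/// Eager style: the constructor dials the server, so a down server
/// makes construction itself fail.
struct EagerChannel {
    _stream: TcpStream,
}

impl EagerChannel {
    fn connect(addr: SocketAddr) -> io::Result<Self> {
        Ok(Self { _stream: TcpStream::connect(addr)? })
    }
}

/// Lazy style: the constructor only records the address. The first
/// call dials, and a failed dial is attempted again on the next call.
struct LazyChannel {
    addr: SocketAddr,
    stream: Option<TcpStream>,
}

impl LazyChannel {
    fn new(addr: SocketAddr) -> Self {
        Self { addr, stream: None }
    }

    /// Stand-in for an RPC: succeeds once a connection exists.
    fn call(&mut self) -> io::Result<()> {
        if self.stream.is_none() {
            self.stream = Some(TcpStream::connect(self.addr)?);
        }
        Ok(())
    }
}

/// Returns a loopback address that nothing is listening on
/// (binds an ephemeral port, then releases it on return).
fn unused_addr() -> io::Result<SocketAddr> {
    let listener = TcpListener::bind("127.0.0.1:0")?;
    listener.local_addr()
}
```

With a down server, `EagerChannel::connect` returns an error, which mirrors the initialization failure seen above, while `LazyChannel::new` always succeeds and only the first `call` reports the failure.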

The Rust behavior is very different from the basic examples I verified in this repo, which cover basically all the Google-supported languages. I have verified that clients can be initialized with Java and Python when the server is down, and that they are also able to recover if the server becomes unavailable. I haven't tried all the languages exhaustively, but I would be surprised if the behavior were inconsistent. We have also seen evidence of this working correctly in our SDKs.

This seems like something basic that should work out of the box for Rust, but I don't see anyone else complaining about this behavior, which makes me wonder whether I am missing something.