OrleansContrib/Orleans.Clustering.Kubernetes

v7 fails with Microsoft.Orleans.Hosting.Kubernetes

rsirny opened this issue · 5 comments

When v7 is used with Microsoft.Orleans.Hosting.Kubernetes 7.0.0, it fails to interact with the ClusterVersion CRD (see logs below). The hosting library uses KubernetesClient 7.0.7, while v7 clustering uses 6.0.19. The root cause is that the KubernetesClient 7.0.7 library is the one loaded at runtime, and it changes the object returned by ListNamespacedCustomObjectAsync, GetNamespacedCustomObjectAsync, etc. from Newtonsoft.Json.Linq.JObject (assumed in 6.0.19) to System.Text.Json.JsonElement.

{"EventId":0,"LogLevel":"Warning","Category":"Orleans.Clustering.Kubernetes.KubeMembershipTable","Message":"We tried to Initialize ClusterVersion but fail. Ignoring for now...","Exception":"System.InvalidCastException: Unable to cast object of type \u0027System.Text.Json.JsonElement\u0027 to type \u0027Newtonsoft.Json.Linq.JObject\u0027.    at Orleans.Clustering.Kubernetes.KubeMembershipTable.TryInitClusterVersion()","State":{"Message":"We tried to Initialize ClusterVersion but fail. Ignoring for now...","{OriginalFormat}":"We tried to Initialize ClusterVersion but fail. Ignoring for now..."},"Scopes":[]}
{"EventId":0,"LogLevel":"Warning","Category":"Orleans.Clustering.Kubernetes.KubeMembershipTable","Message":"Failure reading all silo entries for cluster id actors-silo","Exception":"System.InvalidCastException: Unable to cast object of type \u0027System.Text.Json.JsonElement\u0027 to type \u0027Newtonsoft.Json.Linq.JObject\u0027.    at Orleans.Clustering.Kubernetes.KubeMembershipTable.GetClusterVersion()    at Orleans.Clustering.Kubernetes.KubeMembershipTable.ReadAll()","State":{"Message":"Failure reading all silo entries for cluster id actors-silo","ClusterId":"actors-silo","{OriginalFormat}":"Failure reading all silo entries for cluster id {ClusterId}"},"Scopes":[]}
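
To illustrate the cast failure in the logs above, here is a minimal sketch of reading the untyped response as a System.Text.Json.JsonElement instead of casting it to JObject. It is not the code from this repository: `ClusterVersionSketch` and `ToTyped` are hypothetical names, and the JSON payload is made up to stand in for what the custom-object calls return under KubernetesClient 7.x.

```csharp
using System;
using System.Text.Json;

// Hypothetical stand-in for the ClusterVersion custom resource;
// the real entity type in the clustering library may look different.
public sealed class ClusterVersionSketch
{
    public string ClusterId { get; set; } = "";
    public int ClusterVersion { get; set; }
}

public static class Program
{
    // Web defaults: camelCase property names, case-insensitive matching.
    private static readonly JsonSerializerOptions Options =
        new JsonSerializerOptions(JsonSerializerDefaults.Web);

    // Converts the untyped object from a custom-object call into a typed entity.
    // Checks the runtime type first, so an unexpected payload type produces a
    // descriptive error rather than a bare InvalidCastException.
    public static T ToTyped<T>(object response) where T : class
    {
        if (response is JsonElement element)
        {
            return element.Deserialize<T>(Options)
                ?? throw new InvalidOperationException("Response deserialized to null.");
        }

        throw new InvalidOperationException(
            $"Unexpected response type: {response?.GetType().FullName ?? "null"}");
    }

    public static void Main()
    {
        // Simulated response: with KubernetesClient 7.x the returned object
        // is a JsonElement (per the issue description), not a JObject.
        using var doc = JsonDocument.Parse(
            "{\"clusterId\":\"actors-silo\",\"clusterVersion\":3}");
        object response = doc.RootElement.Clone();

        var version = ToTyped<ClusterVersionSketch>(response);
        Console.WriteLine($"{version.ClusterId} v{version.ClusterVersion}");
    }
}
```

The type check is a design choice for the sketch: it keeps a future change in the client's return type from surfacing only as a cast exception deep inside membership-table code.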

I can try to fix it by upgrading KubernetesClient to a 7.x.y version and switching from Newtonsoft.Json to System.Text.Json. The officially supported Kubernetes version would go from 1.22 to 1.23.

Would that be OK with you, @galvesribeiro?

cc @SamEmber

We are also interested in a fix for this so that we can upgrade to v7.

#64

Fix here :)

I'm getting another exception now:

We tried to Initialize ClusterVersion but fail. Ignoring for now...
Operation returned an invalid status code 'UnprocessableEntity'
k8s.Autorest.HttpOperationException: Operation returned an invalid status code 'UnprocessableEntity'
   at async Task<HttpResponseMessage> k8s.Kubernetes.SendRequestRaw(string requestContent, HttpRequestMessage httpRequest, CancellationToken cancellationToken)
   at async Task<HttpOperationResponse<object>> k8s.AbstractKubernetes.k8s.ICustomObjectsOperations.CreateNamespacedCustomObjectWithHttpMessagesAsync(object body, string group, string version, string namespaceParameter, string plural, string dryRun, string fieldManager, Nullable<bool> pretty, IReadOnlyDictionary<string, IReadOnlyList<string>> customHeaders, CancellationToken cancellationToken)
   at async Task<object> k8s.CustomObjectsOperationsExtensions.CreateNamespacedCustomObjectAsync(ICustomObjectsOperations operations, object body, string group, string version, string namespaceParameter, string plural, string dryRun, string fieldManager, Nullable<bool> pretty, CancellationToken cancellationToken)
   at async Task Orleans.Clustering.Kubernetes.KubeMembershipTable.TryInitClusterVersion()

Based on the log messages, I think it is the second k8s client call in TryInitClusterVersion that fails. I added another PR to log the server response from k8s.

#65

I think this was caused by the upgrade of the k8s client, the k8s API, and System.Text.Json (STJ), so I would try it on a clean installation.

A new version was published and should be available shortly: https://www.nuget.org/packages/Orleans.Clustering.Kubernetes/7.1.1