OrleansContrib/Orleans.Providers.MongoDB

Could not find any gateway in global::Orleans.Providers.MongoDB.Membership.MongoGatewayListProvide

darthkurak opened this issue · 8 comments

Hi.
When I have only one silo entry in the Membership collection, the Orleans client (a web API) is not able to connect to it. I'm getting this error:
Could not find any gateway in global::Orleans.Providers.MongoDB.Membership.MongoGatewayListProvider

Steps to reproduce:
I launch the silo, and an entry is created:

{
"_id" : "Something@10.0.75.1:11111/298645963",
"etag" : "2bf5b79b-457a-4f42-90d6-a7563f755710",
"deploymentId" : "SomeDeployment",
"hostName" : "JDROPIANB",
"siloAddress" : "10.0.75.1:11111@298645963",
"siloName" : "Silo_5baa7",
"roleName" : "SomeSilo",
"statusText" : "Active",
"iAmAliveTime" : "2019-06-19 13:13:01.539 GMT",
"startTime" : "2019-06-19 13:12:43.690 GMT",
"proxyPort" : 30000,
"updateZone" : 0,
"faultZone" : 0,
"status" : 3,
"suspectTimes" : [],
"timestamp" : ISODate("2019-06-19T13:13:01.700Z")
}

Then I launch the Orleans client and get the error. I shut everything down and start the silo again, and a second entry is created:

{
"_id" : "Something@10.0.75.1:11111/298646657",
"Etag" : "d3ae87ac-bcc3-4cdc-97c3-f07702f94884",
"DeploymentId" : "SomeDeployment",
"HostName" : "JDROPIANB",
"SiloAddress" : "10.0.75.1:11111@298646657",
"SiloName" : "Silo_2c43b",
"RoleName" : "SomeSilo",
"StatusText" : "Active",
"IAmAliveTime" : "2019-06-19 13:24:35.206 GMT",
"StartTime" : "2019-06-19 13:24:17.778 GMT",
"ProxyPort" : 30000,
"UpdateZone" : 0,
"FaultZone" : 0,
"Status" : 3,
"SuspectTimes" : [],
"Timestamp" : ISODate("2019-06-19T13:24:35.378Z")
}

I launch the Orleans client and everything is OK: the connection is established.
This looks like a bug that was introduced in a newer version? (I believe I didn't have this issue before.)

Are there any other messages, e.g. indicating that there was a failed connection attempt to that gateway? Could you also show me how the ClusterId & ServiceId are being configured? Let's verify that the values are the same on the client and the silo. Do you have logs from the client and the silo that you can share (possibly redacted)?

Off topic: Would be great, if you could format your issue next time. Was just a wall of text.

Hi. Sorry for the bad formatting. I will do better next time.
On the client side I'm getting only this:

Orleans.Runtime.SiloUnavailableException: Could not find any gateway in global::Orleans.Providers.MongoDB.Membership.MongoGatewayListProvider. Orleans client cannot initialize.
   at Orleans.OutsideRuntimeClient.<>c__DisplayClass56_0.<<StartInternal>b__0>d.MoveNext() in D:\build\agent\_work\7\s\src\Orleans.Core\Runtime\OutsideRuntimeClient.cs:line 218
--- End of stack trace from previous location where exception was thrown ---
   at Orleans.OutsideRuntimeClient.<StartInternal>g__ExecuteWithRetries|56_3(Func`1 task, Func`2 shouldRetry) in D:\build\agent\_work\7\s\src\Orleans.Core\Runtime\OutsideRuntimeClient.cs:line 277

Silo Configuration:

builder.Configure<ClusterOptions>(options =>
                {
                    options.ClusterId = "OpenIoT";
                    options.ServiceId = "OpenIoTService";
                })
                .UseMongoDBClustering(opt =>
                {
                    opt.ConnectionString = "someConnectionString";
                    opt.DatabaseName = "backend";
                    opt.CollectionPrefix = "iot";
                })
                .UseDashboard()
                .ConfigureEndpoints(11111, 30000)
                .ConfigureApplicationParts(parts =>
                    parts.AddApplicationPart(typeof(DeviceTelemetryWriterGrain).Assembly).WithReferences())
                .ConfigureLogging(logging =>
                    logging.AddConfiguration(Configuration.GetSection("Logging")).AddConsole().AddFile("logs.txt"));

            var host = builder.Build();

Client Configuration:

var client = new ClientBuilder()
                .Configure<ClusterOptions>(options =>
                {
                    options.ClusterId = "OpenIoT";
                    options.ServiceId = "OpenIoTService";
                })
                .UseMongoDBClustering(opt =>
                {
                    opt.ConnectionString = "someConnectionString";
                    opt.DatabaseName = "backend";
                    opt.CollectionPrefix = "iot";
                })
                .ConfigureApplicationParts(parts =>
                    parts.AddApplicationPart(typeof(IDeviceMessageDispatcherGrain).Assembly))
                .Build();

That works, but only when I have at least two entries in the membership collection. (I have to run the silo twice for the client to be able to connect.)

Your configuration looks correct to me.

When you say "shutting down everything", are you doing that manually or is it shutting itself down?

It seems like a bug in the mongodb provider, but I'm not sure. Is anyone else able to repro?

Manually, by myself. Locally, by stopping the debugger; on Linux, just by running systemctl stop on the silo and web API processes.
What is more interesting:
I updated the packages to 2.3.5, and it seems that this doesn't work even if I have more than one row.
The difference I see is that 2.3.5 auto-cleans the Membership table, so I never have two rows with Active status at the same time. It looks like MongoDbProvider needs two rows with Active status.

Are you able to run the client under a debugger and step into OutsideRuntimeClient & so on, all the way through to MongoGatewayListProvider.GetGateways? I wonder what's going on internally...

It seems that I found the problem.
We are using MongoDB for our domain model.
We have a separate MongoDbContext which has its own BsonSerializer settings, including CamelCaseElementNameConvention.
That interferes with what MongoDbProvider sets for Orleans. As the MongoDB provider has only one global place for settings, our settings and convention override the ones from MongoDbProvider.
Everything works OK until, for some reason, the Orleans client starts reading the Membership table before our MongoDbContext (registered as a singleton in IoC) has been created and its settings applied. That leads to a mismatch between how the silo serializes and writes entries to the Membership table and how the client reads them (the write used camelCase, but the read did not).
However, I don't know why creating a second entry helped before (on the packages without table auto-cleanup).
Another thing is that we don't get any information about the failed deserialization. It just fails silently and gives the error described above (no gateway).
Maybe it would be worth checking the exception handling and including some information about the failed deserialization?
Anyway, thanks for the help!
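
For anyone hitting the same conflict, one possible way to avoid it is to scope the camelCase convention to the application's own types instead of registering it for every type. The following is a minimal sketch, not the provider's or the reporter's actual code. It assumes the domain classes live in a namespace called MyApp.Domain (hypothetical), and ThirdParty.MembershipLike is a made-up stand-in for a document type owned by another library. Only the MongoDB.Bson package is needed to run it. Running it once at process start, before anything touches MongoDB, may also avoid the ordering problem described above.

```csharp
using System;
using MongoDB.Bson;
using MongoDB.Bson.Serialization.Conventions;

namespace MyApp.Domain
{
    // Hypothetical domain type; stands in for the application's own model classes.
    public class Device
    {
        public string DeviceName { get; set; }
    }
}

namespace ThirdParty
{
    // Hypothetical stand-in for a document type owned by another library.
    public class MembershipLike
    {
        public string SiloName { get; set; }
    }
}

public static class Program
{
    public static void Main()
    {
        var pack = new ConventionPack { new CamelCaseElementNameConvention() };

        // The filter limits the pack to types in the (assumed) domain namespace,
        // so types from other libraries keep their default element names.
        ConventionRegistry.Register(
            "DomainCamelCase",
            pack,
            type => type.Namespace != null && type.Namespace.StartsWith("MyApp.Domain"));

        var device = new MyApp.Domain.Device { DeviceName = "d1" }.ToBsonDocument();
        var member = new ThirdParty.MembershipLike { SiloName = "s1" }.ToBsonDocument();

        // Check which element names each document ended up with.
        Console.WriteLine(device.Contains("deviceName")); // domain type: camelCase key
        Console.WriteLine(member.Contains("SiloName"));   // other type: default key
    }
}
```

Conventions are applied when a class map is first built, so the registration has to happen before the first read or write of the affected types. Whether this fully resolves the interaction with the Orleans provider depends on how the provider configures its own serialization, which is not shown in this thread.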