apereo/dotnet-cas-client

Null Exception on BaseGetAllKeys

mbourletsg opened this issue · 7 comments

Hello,

We had a problem that occurred randomly several times.
Our server is unable to recover on its own afterwards. We had to recycle each of the few times it occurred.

Do you have any clue about the cause? See the stack trace below.
Thanks for your help

Exception message
Object reference not set to an instance of an object.
Stacktrace
at System.Collections.Specialized.NameObjectCollectionBase.BaseGetAllKeys()
at System.Collections.Specialized.NameValueCollection.get_AllKeys()
at DotNetCasClient.Utils.UrlUtil.ConstructValidateUrl(String serviceTicket, Boolean gateway, Boolean renew, NameValueCollection customParameters)
at DotNetCasClient.Validation.TicketValidator.AbstractUrlTicketValidator.Validate(String ticket)
at DotNetCasClient.CasAuthentication.ProcessTicketValidation()
at DotNetCasClient.CasAuthenticationModule.OnAuthenticateRequest(Object sender, EventArgs e)
at System.Web.HttpApplication.SyncEventExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute()
at System.Web.HttpApplication.ExecuteStepImpl(IExecutionStep step)
at System.Web.HttpApplication.ExecuteStep(IExecutionStep step, Boolean& completedSynchronously)
End exception

Uh.... not that it's your fault, but that stack trace is pretty useless, lol.

I'm going to need more context and also reproduction scenarios (full details on what you did to reproduce the problem forcefully.) E.g. do you have more environments than just production? Is this problem exhibited in all environments or just one? In a non-production environment where this problem occurs can you make it so that you are the only person using it, then freshly spin up the app, and does this happen on the first attempt at log on for the sole user of the system OR does it happen seemingly randomly or randomly after a certain amount of time?

What operating system are you running this on (and which flavor)? What version of the .NET Framework does your project target? Is it compiled for x86, x64 or AnyCPU? What project type is it, e.g. ASP.NET WebForms, Web Pages, MVC (and which specific version of it are you using)? What version of our NuGet package are you using, and is it the most recent one? If you are not using the NuGet package and are instead directly referencing DLLs, you should probably not be doing that. Does this error occur seemingly at random? Does it only occur for a specific user? Which version of CAS server are you targeting? Which type and version of authentication ticket are you using (e.g. Cas, Saml, etc.)? What versions of TLS are you using on both your Windows/IIS server and the server that CAS is running on?

I'm assuming you did your sleuthing as a developer and went over all of this already and have completely exhausted all efforts, but I wasn't there with you so I'll need the infos ;)

Can you also share the casClientConfig section of your web.config file? For the 3 entries that contain your server names in the URLs, can you replace just the hostnames with "example.com" and leave the rest intact?

If you are using the latest version of our NuGet package... have you tried rolling back to each previous version till you reached one that does work? If so, what was the last working version?

Alright, lots of questions, I'll try to answer each of them precisely :)
By the way, thanks for the help!

I actually have no clue what is causing this problem. It went down last night. The app we're talking about is an internal web application with at least 150 users.

Keep in mind that the system is working flawlessly in production 24/6 with ~150 concurrent users logged in through SSO, so we are not talking about some trivial misconfiguration bug.

We have the production and preproduction environments on the same server (this is about to change in the coming months to separate them; you may have a clue here, because preproduction stayed running last night, whereas we shut it down most of the time when it is not needed). Preproduction runs in another AppPool with another database.
It happened in the production environment. I have no clue how to reproduce it in preproduction. I don't even know whether the CAS server we're using could be a cause of this problem (the CAS server might have been down, causing a null exception on our side?). We are a small team in a big corporation, so contacting the people who manage the CAS server could be tricky, but I could try.

It seems that once the problem occurs, it occurs for everybody. The only way to solve it is to recycle (and clear this dictionary from RAM, I guess).

We are using Windows Server 2012, x64.
The project targets .NET 4.5.2, Any CPU; it is an ASP.NET MVC project, v5.2.6.0.

The DotNetCasClient DLL comes from NuGet, v1.1.0.0. I forgot to check whether we were on the latest release; my bad, we will switch to the latest on Monday. (Because the problem seems to happen randomly, do you know of something like that being fixed in recent releases?)

We are using CAS20; the CAS server is reached through https, and our server is on http at the moment.
Here are the interesting parts of our web.config:

<forms name=".DotNetCasClientAuth" loginUrl="https://example.com/cas/login" cookieless="UseCookies" defaultUrl="/" path="/" />

<casClientConfig casServerLoginUrl="https://example.com/cas/login/"
                   casServerUrlPrefix="https://example.com/cas/"
                   serverName="http://example2.com"
                   redirectAfterValidation="true"
                   renew="false"
                   singleSignOut="true"
                   ticketValidatorName="Cas20"
                   proxyTicketManager="CacheProxyTicketManager"
                   serviceTicketManager="CacheServiceTicketManager"
                   gatewayStatusCookieName="CasGatewayStatus" />

Hello,
In the meantime, we had a discussion with the CAS server team. They updated their system and we discovered a problem concerning the proxy; once they corrected it, the entire system broke. On our side, we had to remove the line proxyTicketManager="CacheProxyTicketManager" and everything came back.
The first problem might have been caused by proxy mode on the CAS server not working as the CAS standard intends.
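
For reference, a sketch of what the section looks like after that change, assuming only the proxyTicketManager attribute was removed and every other attribute stayed exactly as posted above (hostnames are still the example.com placeholders):

```xml
<!-- proxyTicketManager="CacheProxyTicketManager" removed;
     all other attributes are assumed unchanged from the earlier post -->
<casClientConfig casServerLoginUrl="https://example.com/cas/login/"
                 casServerUrlPrefix="https://example.com/cas/"
                 serverName="http://example2.com"
                 redirectAfterValidation="true"
                 renew="false"
                 singleSignOut="true"
                 ticketValidatorName="Cas20"
                 serviceTicketManager="CacheServiceTicketManager"
                 gatewayStatusCookieName="CasGatewayStatus" />
```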

If I understand you correctly, you are saying you think the proxy mode on the CAS server was configured wrong? FWIW, if you specify the proxyTicketManager attribute and the proxy stuff isn't configured (at all or correctly) on the CAS server, then things will probably go wrong in the ASP.NET application. There is a chance they may never have had it configured on the CAS server.

Yes, that's right. We don't know the internal problems exactly, but from the information we have:

  • Proxy was not working correctly on the CAS server
  • As a result, our system worked with this misconfiguration when it should not have (though possibly with weird problems like the one described in my previous messages, redirection loops, etc.)
  • They updated it to correct the problem with the proxy
  • Our system then stopped working, forcing us to correct our configuration.

I guess we can close this issue, and hope not to reopen it :)

Well if you run into further problems just reopen or make a new issue. If the other team has CAS related errors there is a Gitter channel that may be of help to them where they can ask questions: https://gitter.im/apereo/cas
