appmattus/certificatetransparency

Unable to load log servers with log-list.json

HappyDr0id opened this issue · 8 comments

Hey 👋

We recently migrated from 0.3.0 to 1.1.1 and have since observed some non-fatal reports on Crashlytics. Digging in, it looks like an internal library error happens fairly randomly (roughly once every few minutes), without visibly affecting the app for the user. Here is the stack trace:

2022-10-04 17:33:08.402 30118-30253/<package_name> I/CertificateTransparency: <domain_url> Failure: Unable to load log servers with log-list.json failed to load with java.lang.InterruptedException
        at kotlinx.coroutines.BlockingCoroutine.joinBlocking(Builders.kt:84)
        at kotlinx.coroutines.BuildersKt__BuildersKt.runBlocking(Builders.kt:59)
        at kotlinx.coroutines.BuildersKt.runBlocking(Unknown Source:1)
        at kotlinx.coroutines.BuildersKt__BuildersKt.runBlocking$default(Builders.kt:38)
        at kotlinx.coroutines.BuildersKt.runBlocking$default(Unknown Source:1)
        at com.appmattus.certificatetransparency.internal.verifier.CertificateTransparencyBase.hasValidSignedCertificateTimestamp(CertificateTransparencyBase.kt:112)
        at com.appmattus.certificatetransparency.internal.verifier.CertificateTransparencyBase.verifyCertificateTransparency(CertificateTransparencyBase.kt:96)
        at com.appmattus.certificatetransparency.internal.verifier.CertificateTransparencyInterceptor.intercept(CertificateTransparencyInterceptor.kt:69)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
        at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.kt:34)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
        at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.kt:95)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
        at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.kt:83)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
        at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.kt:76)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
        <...>
2022-10-04 17:33:08.402 30118-30253/<package_name> I/CertificateTransparency:     at java.lang.Thread.run(Thread.java:1012)
2022-10-04 17:33:08.404 30118-30253/<package_name> I/okhttp.OkHttpClient: <-- HTTP FAILED: javax.net.ssl.SSLPeerUnverifiedException: Certificate transparency failed

We didn't see this report at all with the previous 0.3.0 version, and the failure happens quite randomly, in between successful "SCT trusted logs" results. Our domains have the required certificates and Certificate Transparency works as expected, so, especially given the log-list.json loading failure, this looks more like an internal library issue than a genuine certificate failure.

Does this ring a bell? Thanks in advance for your help, and for your work on this CT library 🙏

Nothing in particular comes to mind. An InterruptedException implies that the coroutine block has been cancelled in some way; could it be a timeout? I might need to add some more tests around timeouts to see if I can replicate this more reliably.
The fix may be as simple as the code detecting this exception and suppressing it (if appropriate).

One thing to look at is caching the log list data (to disk, for example), which could reduce the number of network hits.
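For reference, wiring a disk cache into the interceptor might look like the sketch below. This assumes the `certificatetransparency-android` artifact's `AndroidDiskCache` and the interceptor DSL's disk-cache hook; the exact package and property names should be checked against the library's README before use.

```kotlin
import android.content.Context
import com.appmattus.certificatetransparency.cache.AndroidDiskCache
import com.appmattus.certificatetransparency.certificateTransparencyInterceptor
import okhttp3.OkHttpClient

// Sketch: persist the downloaded log list to disk so it survives process
// restarts and is not re-fetched over the network on every cold start.
fun buildClient(context: Context): OkHttpClient {
    val ctInterceptor = certificateTransparencyInterceptor {
        // Assumed DSL property; verify the name against the library docs
        diskCache = AndroidDiskCache(context)
    }
    return OkHttpClient.Builder()
        .addNetworkInterceptor(ctInterceptor)
        .build()
}
```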

On reflection I think this is likely solved by #39, and given no other reports of a similar issue, I'm going to close this one.

Thanks for your answers and your work on this library 🙏 I'll check with the latest release whether this is still being reported in our production environment 🙂

ikeed commented

The fix may be as simple as the code detecting this exception and suppressing (if appropriate)

Unfortunately, in the case of the InterruptedException the library throws an SSLPeerUnverifiedException that is indistinguishable from a genuine validation failure. In our app I implemented a somewhat kludgy workaround that looks at the VerificationResult from the last logger callback to decide whether to suppress the error. It would be great if there were an easier way to distinguish these cases.
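A sketch of that kind of workaround, assuming the library's `CTLogger` callback and `VerificationResult` types; the specific failure subtype checked here (`LogServersFailed`) is an assumption and should be verified against the library source:

```kotlin
import com.appmattus.certificatetransparency.CTLogger
import com.appmattus.certificatetransparency.VerificationResult
import java.util.concurrent.atomic.AtomicReference

// Remember the most recent verification result so that, when an
// SSLPeerUnverifiedException surfaces, the caller can inspect why CT failed.
class LastResultLogger : CTLogger {
    val lastResult = AtomicReference<VerificationResult?>(null)

    override fun log(host: String, result: VerificationResult) {
        lastResult.set(result)
    }
}

// If the last result was a log-list load failure rather than an actual SCT
// validation failure, treat the thrown error as transient and suppress it.
fun shouldSuppress(logger: LastResultLogger): Boolean =
    when (logger.lastResult.get()) {
        is VerificationResult.Failure.LogServersFailed -> true // assumed subtype name
        else -> false
    }
```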

I can still reproduce this issue on 2.1.2 by spamming a bunch of requests at once.
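For anyone trying to replicate, a minimal repro sketch along those lines (assuming an OkHttp client already configured with the CT interceptor) is simply firing many requests concurrently:

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking
import okhttp3.OkHttpClient
import okhttp3.Request

// Repro sketch: spam concurrent requests through a CT-enabled client so the
// shared log-list load happens under contention; watch the failures for
// SSLPeerUnverifiedException caused by the InterruptedException.
fun spam(client: OkHttpClient, url: String) = runBlocking {
    repeat(100) {
        launch(Dispatchers.IO) {
            runCatching {
                client.newCall(Request.Builder().url(url).build()).execute().close()
            }.onFailure { println(it) }
        }
    }
}
```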

Have reopened for further investigation. @ikeed, from what you're saying this definitely sounds testable/reproducible.

ikeed commented

Thanks very much Matt! If you need any more info from me, just ask.

ikeed commented

Incidentally, if the InterruptedException is too complex to fix, embedding some kind of "reason" field in the SSLPeerUnverifiedException would suit my needs; I'm not sure if that's the direction you want to go. We get a VerificationResult in the log callback, and it would be useful to have something similar in the SSLPeerUnverifiedException itself so clients can make smarter decisions when an error is thrown, and in particular distinguish a concurrency issue from an actually dodgy cert.
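As a purely hypothetical illustration of that suggestion (all names here are invented, not part of the library), the exception could carry the reason directly:

```kotlin
import javax.net.ssl.SSLPeerUnverifiedException

// Hypothetical sketch: an SSLPeerUnverifiedException subclass carrying the
// verification outcome, so callers can tell a transient log-list load failure
// from a genuinely untrusted certificate.
class CertificateTransparencyException(
    message: String,
    val reason: Reason
) : SSLPeerUnverifiedException(message) {
    enum class Reason { LOG_LIST_LOAD_FAILED, SCT_VALIDATION_FAILED }
}

fun handle(e: Exception) {
    if (e is CertificateTransparencyException &&
        e.reason == CertificateTransparencyException.Reason.LOG_LIST_LOAD_FAILED
    ) {
        // Transient infrastructure problem: retry or report as non-fatal
    } else {
        throw e // genuine validation failure (or unrelated error): propagate
    }
}
```

Because the subclass still extends SSLPeerUnverifiedException, existing catch blocks keep working unchanged.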