snowflakedb/snowflake-jdbc

SNOW-999335: Spark snowflake read results in certificate issue

Closed this issue · 13 comments

Please answer these questions before submitting your issue.
In order to accurately debug the issue this information is required. Thanks!

  1. What version of JDBC driver are you using?
    3.14.2

  2. What operating system and processor architecture are you using?
    amazon linux

  3. What version of Java are you using?
    11

  4. What did you do?
    Reading from Snowflake with the Snowflake JDBC driver in Spark results in an S3 SSL certificate issue. Our Spark job appears to be accessing a Snowflake customer staging bucket and failing:

Job aborted due to stage failure: Task 0 in stage 3.0 failed 4 times, most recent failure: Lost task 0.3 in stage 3.0 (TID 1296) (10.0.24.229 executor 5): net.snowflake.client.jdbc.SnowflakeSQLLoggedException: JDBC driver internal error: Max retry reached for the download of #chunk0 (Total chunks: 17) retry=7, error=net.snowflake.client.jdbc.SnowflakeSQLLoggedException: JDBC driver encountered communication error. Message: Certificate for <sfc-va-ds1-customer-stage.s3.amazonaws.com> doesn't match any of the subject alternative names: [*.s3.amazonaws.com, s3.amazonaws.com].
    at net.snowflake.client.jdbc.RestRequest.execute(RestRequest.java:237)
    at net.snowflake.client.jdbc.DefaultResultStreamProvider.getResultChunk(DefaultResultStreamProvider.java:122)
    at net.snowflake.client.jdbc.DefaultResultStreamProvider.getInputStream(DefaultResultStreamProvider.java:39)
    at net.snowflake.client.jdbc.SnowflakeChunkDownloader$2.call(SnowflakeChunkDownloader.java:975)
    at net.snowflake.client.jdbc.SnowflakeChunkDownloader$2.call(SnowflakeChunkDownloader.java:889)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: javax.net.ssl.SSLPeerUnverifiedException: Certificate for <sfc-va-ds1-customer-stage.s3.amazonaws.com> doesn't match any of the subject alternative names: [*.s3.amazonaws.com, s3.amazonaws.com]
    at net.snowflake.client.jdbc.internal.apache.http.conn.ssl.SSLConnectionSocketFactory.verifyHostname(SSLConnectionSocketFactory.java:507)
    at net.snowflake.client.jdbc.internal.apache.http.conn.ssl.SSLConnectionSocketFactory.createLayeredSocket(SSLConnectionSocketFactory.java:437)
    at net.snowflake.client.jdbc.internal.apache.http.conn.ssl.SSLConnectionSocketFactory.connectSocket(SSLConnectionSocketFactory.java:384)
    at net.snowflake.client.jdbc.internal.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:142)
    at net.snowflake.client.jdbc.internal.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:376)
    at net.snowflake.client.jdbc.internal.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:393)
    at net.snowflake.client.jdbc.internal.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:236)
    at net.snowflake.client.jdbc.internal.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
    at net.snowflake.client.jdbc.internal.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
    at net.snowflake.client.jdbc.internal.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
    at net.snowflake.client.jdbc.internal.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
    at net.snowflake.client.jdbc.internal.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
    at net.snowflake.client.jdbc.internal.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:108)
    at net.snowflake.client.jdbc.RestRequest.execute(RestRequest.java:222)
    ... 8 more
.
    at net.snowflake.client.jdbc.SnowflakeChunkDownloader.getNextChunkToConsume(SnowflakeChunkDownloader.java:601)
    at net.snowflake.client.core.SFArrowResultSet.fetchNextRowUnsorted(SFArrowResultSet.java:232)
    at net.snowflake.client.core.SFArrowResultSet.fetchNextRow(SFArrowResultSet.java:209)
    at net.snowflake.client.core.SFArrowResultSet.next(SFArrowResultSet.java:344)
    at net.snowflake.client.jdbc.SnowflakeResultSetV1.next(SnowflakeResultSetV1.java:92)
    at net.snowflake.spark.snowflake.io.ResultIterator.hasNext(SnowflakeResultSetRDD.scala:152)
    at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760)
    at org.apache.spark.sql.execution.columnar.DefaultCachedBatchSerializer$$anon$1.hasNext(InMemoryRelation.scala:118)
    at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
    at org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:223)
    at org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:302)
    at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1508)
    at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1435)
    at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1499)
    at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1322)
    at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:376)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:327)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:136)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)

I reverted to 3.12.12 and that fixed the issue for me, but I wanted to flag this moving forward. Let me know if this belongs in the spark-snowflake project instead.

@dyang108 thanks for reporting this. Do you know if you have a proxy server in your environment, and whether you expect your S3 connection to go through it or bypass it?
The changes between 3.12.12 and 3.14.2 are quite extensive, and there have been significant changes in proxy configuration in particular.
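For reference, the Snowflake JDBC driver exposes explicit proxy connection parameters (useProxy, proxyHost, proxyPort, nonProxyHosts). A minimal sketch of how a proxy could be configured so that the S3 stage endpoints bypass it; the host, port, and credential values here are placeholders, not anything from this report:

```java
import java.util.Properties;

public class ProxyProps {
    // Sketch: connection properties for routing JDBC traffic through a
    // proxy while letting S3 stage hosts bypass it. Values are placeholders.
    public static Properties build() {
        Properties props = new Properties();
        props.put("user", "MY_USER");             // placeholder
        props.put("password", "MY_PASSWORD");     // placeholder
        props.put("useProxy", "true");
        props.put("proxyHost", "proxy.example.com");
        props.put("proxyPort", "8080");
        // Hosts that should bypass the proxy, e.g. the S3 stage endpoints:
        props.put("nonProxyHosts", "*.s3.amazonaws.com");
        return props;
    }

    public static void main(String[] args) {
        // DriverManager.getConnection("jdbc:snowflake://...", build());
        System.out.println(build().getProperty("useProxy")); // prints "true"
    }
}
```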

Can you open a support case and provide the Spark logs after adding the following JVM argument, please?
-Djavax.net.debug=ssl,handshake

You would need to add that to the Spark driver's extra Java options, for instance:
--conf spark.driver.extraJavaOptions='-Djavax.net.debug=ssl,handshake'
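Note that the lost task in the stack trace above was on an executor, so the same debug option likely needs to be set for the executors as well; a sketch (remaining spark-submit arguments omitted):

```shell
# Enable JSSE debug logging on both the driver and the executors.
# The output is very verbose, so enable it only while reproducing the issue.
spark-submit \
  --conf spark.driver.extraJavaOptions='-Djavax.net.debug=ssl,handshake' \
  --conf spark.executor.extraJavaOptions='-Djavax.net.debug=ssl,handshake' \
  ...
```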

@dyang108 do you still need help with this?

I downgraded to 3.12.12 as a workaround. I believe this remains an issue on 3.14.2.

I'm not sure if we have a proxy configured; if we do, this would be the first I've heard of it.

@dyang108 It's definitely interesting that your issue goes away after you downgrade to 3.12.12, but I'm not sure why that would be the case, and that's a pretty old version. Based on the stack trace, the problem was raised by the Apache HTTP client code when verifying the hostname, and that library changed from v4.5.5 to v4.5.14 between those two JDBC driver versions. However, I don't see any difference in the implementation of that method between the two branches:

Caused by: javax.net.ssl.SSLPeerUnverifiedException: Certificate for <sfc-va-ds1-customer-stage.s3.amazonaws.com> doesn't match any of the subject alternative names: [*.s3.amazonaws.com, s3.amazonaws.com]
    at net.snowflake.client.jdbc.internal.apache.http.conn.ssl.SSLConnectionSocketFactory.verifyHostname(SSLConnectionSocketFactory.java:507)
    at net.snowflake.client.jdbc.internal.apache.http.conn.ssl.SSLConnectionSocketFactory.createLayeredSocket(SSLConnectionSocketFactory.java:437)
    at net.snowflake.client.jdbc.internal.apache.http.conn.ssl.SSLConnectionSocketFactory.connectSocket(SSLConnectionSocketFactory.java:384)
    at net.snowflake.client.jdbc.internal.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:142)
    at net.snowflake.client.jdbc.internal.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:376)

The challenge here is that this isn't something I'm able to reproduce, so it's hard to say what might be going on. While the JDBC driver version change clearly caused a problem in your case, there's no way for us to debug it without additional information.

A good place to start is to review the output produced by the following JVM argument when reproducing the issue:
-Djavax.net.debug=ssl,handshake

Will you be able to open a support case to share that information with us? Otherwise, I'm not entirely sure how else we can look into this.

I'm going to close this issue for now. If you're able to provide additional information to help us debug the issue then please feel free to open this again.

Reopening this issue since we received more information about the problem from a different user who experienced the same issue.

I've encountered this issue today with Kafka Connector versions 2.2.1 and 2.1.2, which use JDBC driver versions 3.24.5 and 3.13.30, respectively.
This might be an interaction with the JVM. I'm running Kafka Connect in a container using a Red Hat UBI OpenJDK image:

> java -version
openjdk version "21.0.2" 2024-01-16 LTS
OpenJDK Runtime Environment (Red_Hat-21.0.2.0.13-1) (build 21.0.2+13-LTS)
OpenJDK 64-Bit Server VM (Red_Hat-21.0.2.0.13-1) (build 21.0.2+13-LTS, mixed mode)

Maybe this can help you track down the issue.

@dyang108 @richard-axual we debugged this extensively with a user experiencing the same issue, and based on our findings it had to do with the Apache HTTP client's usage of the public-suffix-list.txt file. PR #1690 addresses that issue and was included in version 3.15.1, which the user confirmed resolved their problem.
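To illustrate the mechanism: under plain RFC 2818 matching, the wildcard SAN `*.s3.amazonaws.com` does match the bucket host, but Apache HttpClient additionally refuses wildcard matches whose base domain appears in the public suffix list, and the list includes `s3.amazonaws.com`. A simplified, self-contained sketch of those two checks (this is not the driver's or HttpClient's actual code, only a mirror of the relevant logic):

```java
import java.util.Set;

public class WildcardSanCheck {

    // RFC 2818-style match: the "*" covers exactly one DNS label.
    static boolean wildcardMatches(String host, String pattern) {
        if (!pattern.startsWith("*.")) {
            return host.equalsIgnoreCase(pattern);
        }
        String suffix = pattern.substring(2); // e.g. "s3.amazonaws.com"
        if (!host.toLowerCase().endsWith("." + suffix.toLowerCase())) {
            return false;
        }
        String label = host.substring(0, host.length() - suffix.length() - 1);
        return !label.isEmpty() && !label.contains("."); // single label only
    }

    // HttpClient-style extra check: refuse wildcards whose base domain is
    // a public suffix (public-suffix-list.txt lists "s3.amazonaws.com").
    static boolean verify(String host, String pattern, Set<String> publicSuffixes) {
        if (pattern.startsWith("*.") && publicSuffixes.contains(pattern.substring(2))) {
            return false;
        }
        return wildcardMatches(host, pattern);
    }

    public static void main(String[] args) {
        String host = "sfc-va-ds1-customer-stage.s3.amazonaws.com";
        String san = "*.s3.amazonaws.com";
        // Plain RFC 2818 matching succeeds:
        System.out.println(wildcardMatches(host, san)); // true
        // With "s3.amazonaws.com" treated as a public suffix it fails,
        // producing the SSLPeerUnverifiedException seen above:
        System.out.println(verify(host, san, Set.of("s3.amazonaws.com"))); // false
    }
}
```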
Can you try testing this with JDBC driver version 3.15.1 and let us know whether it addresses your problem?

@sfc-gh-wfateem Thanks for the update and explanation.
I've only encountered this issue with the Snowflake Kafka Connector, which embeds the JDBC driver.
So unfortunately I cannot verify this until the Kafka connector updates the dependency as well.

@richard-axual was your issue consistent or was it a one-time problem you experienced with the Snowflake Kafka Connector? The issue we're discussing here is a consistent problem once the JDBC driver version is upgraded.

@sfc-gh-wfateem The error with the updated version of the connector is consistent, but since it occurs through a different intermediate project, I can't guarantee it's the same problem.

Thanks @richard-axual
We spent quite a bit of time trying to figure this out, especially since it wasn't reproducible on our end. The best solution we have, which we believe should address the issue reported here, is in #1690.
If you have a dev environment where you can consistently reproduce the issue, I suggest rebuilding the Kafka connector with the newer JDBC driver version to test whether it actually addresses the issue.
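A rough sketch of such a rebuild (untested; the exact pom.xml edit and build flags may differ in the connector repository):

```shell
# Rebuild the Kafka connector against JDBC 3.15.1 from source.
git clone https://github.com/snowflakedb/snowflake-kafka-connector.git
cd snowflake-kafka-connector
# Edit pom.xml to set the snowflake-jdbc dependency version to 3.15.1, then:
mvn clean package -DskipTests
```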

I'm going to close this issue on the assumption that #1690 addresses it. If not, please feel free to reopen it.