Add support for S3 storage
awsazuser opened this issue · 4 comments
Does Cobrix support S3 file systems?
I am getting a "java.lang.IllegalArgumentException: Wrong FS" error when loading the copybook and data file from an AWS S3 bucket.
Code:
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("Spark-Cobol").getOrCreate()

// Read a mainframe file with both the copybook and the data file on S3
val df = spark.read
  .format("za.co.absa.cobrix.spark.cobol.source")
  .option("copybooks", "s3://xxxx/tesfile.cbl")
  .load("s3://xxxx/sourcedata/DATAFILE0100")

df.printSchema()
df.show()
Error:
java.lang.IllegalArgumentException: Wrong FS: s3://xxxx/tesfile.cbl, expected: hdfs://ip-xxx-xx-xx-85.ec2.internal:8020
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:653)
at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:194)
at org.apache.hadoop.hdfs.DistributedFileSystem.access$000(DistributedFileSystem.java:106)
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1305)
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1317)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1430)
at za.co.absa.cobrix.spark.cobol.source.parameters.CobolParametersValidator$.za$co$absa$cobrix$spark$cobol$source$parameters$CobolParametersValidator$$validatePath$1(CobolParametersValidator.scala:71)
at za.co.absa.cobrix.spark.cobol.source.parameters.CobolParametersValidator$$anonfun$validateOrThrow$2.apply(CobolParametersValidator.scala:94)
at za.co.absa.cobrix.spark.cobol.source.parameters.CobolParametersValidator$$anonfun$validateOrThrow$2.apply(CobolParametersValidator.scala:93)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at za.co.absa.cobrix.spark.cobol.source.parameters.CobolParametersValidator$.validateOrThrow(CobolParametersValidator.scala:93)
at za.co.absa.cobrix.spark.cobol.source.DefaultSource.createRelation(DefaultSource.scala:52)
at za.co.absa.cobrix.spark.cobol.source.DefaultSource.createRelation(DefaultSource.scala:48)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:307)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:156)
... 160 elided
Unfortunately, S3 is not supported right now. But we might add S3 support in the future.
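For context, the "expected: hdfs://..." part of the message is what Hadoop reports when a path with one scheme is checked against a file system object created for another scheme, here the cluster default (HDFS). The sketch below shows the general Hadoop pattern that avoids the mismatch by resolving the file system from the path itself. The helper name pathExists is hypothetical and is not taken from the Cobrix code base; it also assumes that a file system implementation for the path's scheme is available on the classpath.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical helper for illustration only, not part of Cobrix.
def pathExists(pathStr: String, conf: Configuration): Boolean = {
  val path = new Path(pathStr)
  // FileSystem.get(conf) would return the default file system (fs.defaultFS).
  // Calling exists() on it with a path of a different scheme fails with "Wrong FS".
  // Resolving the file system from the path itself avoids that mismatch.
  val fs: FileSystem = path.getFileSystem(conf)
  fs.exists(path)
}
```

Inside a Spark job this could be called as pathExists("s3://xxxx/tesfile.cbl", spark.sparkContext.hadoopConfiguration), so that the cluster's own storage settings are picked up.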
S3 storage should be supported in spark-cobol version 2.2.0.
Please, let me know if it works for you.
Does Cobrix support the gs:// file system?
I'm getting the same error:
Caused by: java.lang.IllegalArgumentException: Wrong FS: gs://
From the filesystem support perspective, spark-cobol is the same as any other Spark data source. If you can use gs:// to read CSV or Parquet, then it should be possible to read mainframe files as well.
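One way to check this is to read a small file from the same bucket with a built-in Spark source first, and only then try the spark-cobol read. The following is a minimal sketch: the bucket name and file paths are placeholders, and it assumes the cluster already has a gs:// connector configured. The spark-cobol call uses the same format string and option name as the snippet at the top of this thread.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("gs-check").getOrCreate()

// Step 1: probe the bucket with a built-in source (placeholder path).
// If this fails, the problem is in the gs:// connector setup, not in spark-cobol.
val probe = spark.read.text("gs://your-bucket/some-file.txt")
probe.show(5, truncate = false)

// Step 2: read the mainframe file from the same bucket (placeholder paths).
val df = spark.read
  .format("za.co.absa.cobrix.spark.cobol.source")
  .option("copybooks", "gs://your-bucket/copybook.cbl")
  .load("gs://your-bucket/data/DATAFILE")

df.printSchema()
```

If step 1 works and step 2 still reports "Wrong FS", it is worth checking which spark-cobol version is on the classpath.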