PIC S9(10)V USAGE COMP-3 is converted to long instead of Decimal(10,0)
pinakigit opened this issue · 7 comments
I have a copybook where a field is defined as PIC S9(10)V USAGE COMP-3. When we read the file with Cobrix, this field is created as Long instead of Decimal(10,0). Is this expected behavior because of a Spark property, or can it be created as Decimal(10,0)?
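For reference, a minimal copybook containing such a field might look like this (record and field names are hypothetical):
01  TRANSACTION-REC.
    05  ACCOUNT-BALANCE    PIC S9(10)V USAGE COMP-3.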
Yes, Cobrix prefers short, int and long for numbers that fit these types and have no scale. You can convert them to decimal using a cast: .withColumn("my_field", col("my_field").cast("decimal(10,0)")).
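A minimal sketch of that cast, assuming df is the dataframe returned by Cobrix and my_field is the COMP-3 column:
import org.apache.spark.sql.functions.col

// Cast the integral column to the decimal type from the copybook definition
val dfCast = df.withColumn("my_field", col("my_field").cast("decimal(10,0)"))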
Sure, we can do that, but then we would have to maintain a mapping of which fields to convert for each file, hence the question.
One way to automate this is to use extended metadata. It adds more info from the copybook for each field.
.option("metadata", "extended")You can view which metadata fields are available like this:
df.schema.fields.foreach{ field =>
val metadata = field.metadata
println(s"Field: ${field.name}, metadata: ${metadata.json} ")
}
// Returns: {"originalName":"MY_FIELD","precision":4,"usage":"COMP-3","signed":true,"offset":17,"byte_size":2,"sign_separate":false,"pic":"S9(4)","level":2}

The precision is the one we need.
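For a single field, the precision can also be read directly from the schema (a minimal sketch, assuming a column named MY_FIELD as in the metadata above):
val precision = df.schema("MY_FIELD").metadata.getLong("precision")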
You can apply the casting logic across the whole schema like this:
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{DecimalType, IntegerType, LongType, ShortType}

// Cast each integral column that has copybook precision metadata to decimal(precision, 0)
val columns = df.schema.fields.map { field =>
val metadata = field.metadata
if (metadata.contains("precision")) {
val precision = metadata.getLong("precision").toInt
if (field.dataType == LongType || field.dataType == IntegerType || field.dataType == ShortType) {
println(s"Cast ${field.name} to decimal($precision,0)")
col(field.name).cast(DecimalType(precision, 0)).as(field.name)
} else {
col(field.name)
}
} else {
col(field.name)
}
}
val dfNew = df.select(columns: _*)
dfNew.printSchema()

An experimental method SparkUtils.covertIntegralToDecimal() has been added in 2.7.1. You can use it to post-process the dataframe returned by Cobrix and map the schema the way you want:
import za.co.absa.cobrix.spark.cobol.utils.SparkUtils

val df = spark.read.format("cobol")
.option("metadata", "extended")
.load("/path/to/file")
val df2 = SparkUtils.covertIntegralToDecimal(df)

This is something we are going to implement as a new feature.
Something like:
.option("strict_integral_precision", "true")(not final)
The changes are available in the master branch. @pinakigit, please test if you can.
The behavior is turned on when using this option:
.option("strict_integral_precision", "true")
The feature is available in 2.7.3: https://github.com/AbsaOSS/cobrix?tab=readme-ov-file#273-released-17-jule-2024
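Putting it together, a minimal sketch of a read using the new option in 2.7.3 (the copybook and data paths are placeholders):
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

val df = spark.read
  .format("cobol")
  .option("copybook", "/path/to/copybook.cpy")
  .option("strict_integral_precision", "true")
  .load("/path/to/file")

// Fields such as PIC S9(10)V COMP-3 should now appear as decimal(10,0) instead of long
df.printSchema()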