AbsaOSS/cobrix

Add the ability to implement custom field value to record length mapper


Background

When a file has a record length field, but none of the existing mechanisms works:

  • Fixed record length does not work because record lengths are variable within the file.
  • A record length field expression is not rich enough to express the relation between field values and record lengths.
  • A record length mapping is not sufficient because the mapping is too complex,

and

  • A custom record extractor is too hard to implement,

then a custom record length mapper can work.

You can use custom logic to map values from the specified field to the record length value.

Feature

Add the ability to implement custom field value to record length mapper.

Example [Optional]

.option("record_length_field", "my_length_field")
.option("record_length_custom_mapper", "com.example.MyMapper")

Proposed Solution [Optional]

To be added...

Also, currently there is a check that record_length_field must be an Integral type. For copybooks that define the field as PIC S9(04) COMP., this fails the validation, since the field is mapped to BigDecimal?

Hi @af6140, PIC S9(04) COMP. usually maps to an integer, which is a valid integral type. What are all the options you pass to spark-cobol?

@yruslan, my record_length_field is parsed as: Integral(S9(4),4,Some(Left),false,None,Some(COMP-4),Some(EBCDIC),Some(S9(04))) (the lengthAST parameter in the code below).

But in the following code, the extracted value does not match Int or Long:

final private def getRecordLengthFromField(lengthAST: Primitive, binaryDataStart: Array[Byte]): Int = {
    val length = if (isLengthMapEmpty) {
      ctx.copybook.extractPrimitiveField(lengthAST, binaryDataStart, readerProperties.startOffset) match {
        case i: Int    => i
        case l: Long   => l.toInt
        case s: String => s.toInt
        case null      => throw new IllegalStateException(s"Null encountered as a record length field (offset: $byteIndex, raw value: ${getBytesAsHexString(binaryDataStart)}).")
        case _         => throw new IllegalStateException(s"Record length value of the field ${lengthAST.name} must be an integral type.")
      }
    } else {
      ctx.copybook.extractPrimitiveField(lengthAST, binaryDataStart, readerProperties.startOffset) match {
        case i: Int    => getRecordLengthFromMapping(i.toString)
        case l: Long   => getRecordLengthFromMapping(l.toString)
        case s: String => getRecordLengthFromMapping(s)
        case null      => defaultRecordLength.getOrElse(throw new IllegalStateException(s"Null encountered as a record length field (offset: $byteIndex, raw value: ${getBytesAsHexString(binaryDataStart)})."))
        case _         => throw new IllegalStateException(s"Record length value of the field ${lengthAST.name} must be an integral type.")
      }
    }
    length + recordLengthAdjustment
  }
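
For reference, a minimal sketch of how such a conversion could tolerate decimal-typed length values, which is what the field decodes to when strict_integral_precision is enabled. This is only an illustration with a hypothetical helper name; the actual fix is in #763 and may differ.

// Sketch (not the actual fix in #763): convert any integral-like extracted value
// to Int, including BigDecimal values produced when strict_integral_precision = true.
// Throws if the value has a fractional part or does not fit into an Int.
private def toRecordLength(fieldName: String, value: Any): Int = value match {
  case i: Int                   => i
  case l: Long                  => l.toInt
  case d: java.math.BigDecimal  => d.intValueExact()
  case d: scala.math.BigDecimal => d.toIntExact
  case s: String                => s.trim.toInt
  case null                     => throw new IllegalStateException(s"Null encountered as a record length field '$fieldName'.")
  case _                        => throw new IllegalStateException(s"Record length value of the field $fieldName must be an integral type.")
}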

Looks like after setting strict_integral_precision to false, it works.
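
For reference, the workaround amounts to something like the following. The paths and copybook are placeholders, and the remaining options depend on the file layout.

// Workaround sketch: read with strict integral precision disabled so the
// length field decodes as Int/Long instead of BigDecimal.
val df = spark.read
  .format("cobol")
  .option("copybook", "/path/to/copybook.cpy")
  .option("record_length_field", "my_length_field")
  .option("strict_integral_precision", "false")
  .load("/path/to/data")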


Ah, good spot! Will create a bug report to fix this. It should work with strict_integral_precision=true.

Will be fixed soon: #763

Thanks for reporting!

Fixed in #763.
It has been merged to master.