How to approach multi codepage datasets?

BenceBenedek opened this issue 2 years ago · 3 comments

Background [Optional]

Hi, I'm currently working on a use case where we have:

- a fairly complex copybook (some 1700 lines)
- several record types (variable length)
- several fields that contain free text
- several different code pages (chosen by country code)

One example would be this record:

        10  FILLER                    REDEFINES   ...-DATAPART.
       *** ========= ... ... TEXT==========================
            15 ...-REC....
                20 ...-...-TIMESTAMP PIC X(26).
                20 ...-...-CLTX-TEXT PIC X(4026).

I may be overlooking something, but to get readable data for all countries I currently have to parse the COBOL file once for every code page in use (the code page has to be defined in the Cobrix configuration), filter each result by country code, write out the DataFrame, and finally merge all the DataFrames.

Ideally, only the specific fields should be decoded with specific codepages, and this should be done by one parse action.
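
To illustrate the idea outside of Cobrix, here is a minimal plain-Python sketch of picking the code page per record in a single pass. The country-to-code-page mapping, the record layout, and the function name are all hypothetical; `cp037` and `cp500` are used only because they are EBCDIC codecs that ship with Python.

```python
# Hypothetical mapping from country code to EBCDIC code page (illustration only).
COUNTRY_CODE_PAGES = {"US": "cp037", "BE": "cp500"}


def decode_text(country, raw):
    """Decode a free-text field with the code page chosen by country code."""
    return raw.decode(COUNTRY_CODE_PAGES[country])


# Two made-up records: (country code, raw bytes of the free-text field).
records = [
    ("US", "HELLO!".encode("cp037")),
    ("BE", "HELLO!".encode("cp500")),
]

# One pass over the data, choosing the code page per record.
decoded = [decode_text(country, raw) for country, raw in records]

# Decoding the second record with the wrong code page garbles the punctuation,
# which is why a single global code page is not enough for mixed data.
wrong = records[1][1].decode("cp037")
```

The letters survive either way because they share the same byte values in both code pages; only the characters that differ between code pages come out wrong, which is typical for free-text fields.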

Question

Is there a way to apply business logic and, based on it, use the correct code page during parsing?

Many thanks for your help.

Answer 1 · 2023-01-25T11:13:41.000Z

This is a very good question. This is not supported at the moment, but shouldn't be very hard to add.

Answer 2 · 2023-02-15T07:49:31.000Z

This is how it is supported in the current master, and will be in 2.6.4:

        .option("field_code_page:cp037", "FIELD-1,FIELD_2")
        .option("field_code_page:cp870", " FIELD-3 ")

You can specify a code page, and the list of fields that have that encoding.
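
For a long list of fields it may be convenient to generate these options from a mapping. This is a small sketch assuming PySpark-style usage; `build_field_code_page_options` and the `reader` variable are hypothetical, and only the `field_code_page:<code page>` option format is taken from the snippet above.

```python
def build_field_code_page_options(pages):
    """Turn {code page: [field names]} into 'field_code_page:<cp>' options."""
    return {
        f"field_code_page:{code_page}": ",".join(fields)
        for code_page, fields in pages.items()
    }


# Field names taken from the example above.
options = build_field_code_page_options(
    {"cp037": ["FIELD-1", "FIELD_2"], "cp870": ["FIELD-3"]}
)

# Assumed usage with an existing Spark DataFrameReader named `reader`:
# for key, value in options.items():
#     reader = reader.option(key, value)
```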

Answer 3 · 2023-02-21T07:08:18.000Z

Thank you @yruslan, I will test it out.