Byte length
Loganhex2021 opened this issue · 5 comments
Background
We are using the Cobrix library to read EBCDIC files in Databricks. There is a validation requirement to check the byte length of each record in the file.
Question
Is there an option to generate the byte length of each record while reading an EBCDIC file?
@yruslan - Could you please let me know if you have any idea how to calculate the byte length of each record in an EBCDIC file?
Do you need the record size for each record, or the file size for each record?
You can get the file name for each record using either
.option("with_input_file_name_col", "input_file_name")
or
df.withColumn("input_file_name", input_file_name())
depending on the type of file (variable-length vs. fixed-length).
You can then use a filesystem API (the Hadoop client, etc.) to get the file size for each file.
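For illustration, here is a minimal sketch of that approach, assuming a variable-length read with the `with_input_file_name_col` option from the comment above. The copybook path, the data path, and the way the size lookup is joined back are placeholders, not a prescribed recipe:

```scala
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

// Read the EBCDIC data, attaching the source file name to every record
// ("with_input_file_name_col" is the Cobrix option mentioned above;
// the copybook and data paths are placeholders).
val df = spark.read
  .format("cobol")
  .option("copybook", "/path/to/copybook.cpy")
  .option("with_input_file_name_col", "input_file_name")
  .load("/path/to/data")

// Build a (file name -> file size) lookup with the Hadoop FileSystem API.
// Note: these path strings must match the fully qualified names Spark
// produces, so some normalization may be needed in practice.
val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
val fileSizes = fs.listStatus(new Path("/path/to/data"))
  .map(status => (status.getPath.toString, status.getLen))
  .toSeq
val sizesDf = spark.createDataFrame(fileSizes).toDF("input_file_name", "file_size_bytes")

// Join so that every record carries the size of the file it came from.
val withSizes = df.join(sizesDf, Seq("input_file_name"), "left")
```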
Thanks @yruslan, I need the record size for each record.
@yruslan, could you please help here?
Hi, sorry for the late reply. Currently, this is not supported. I've added this to feature requests.
We can make
.option("generate_record_id", "true")
generate the record length as well.
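For context, a minimal sketch of what this option does today, assuming a Cobrix version where `generate_record_id` adds `File_Id` and `Record_Id` columns to the output (paths are placeholders); per the comment above, a record byte-length column could presumably be generated alongside them:

```scala
// Sketch: generate_record_id adds File_Id and Record_Id columns to each
// record; the record byte length discussed above is not yet among them.
// Copybook and data paths are placeholders.
val df = spark.read
  .format("cobol")
  .option("copybook", "/path/to/copybook.cpy")
  .option("generate_record_id", "true")
  .load("/path/to/data")

df.printSchema() // expect File_Id and Record_Id alongside the copybook fields
```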