Byte length
Loganhex2021 opened this issue · 5 comments
Background
We are using the Cobrix library to read EBCDIC files in Databricks. There is a validation requirement to check the byte length of each record in the file.
Question
Is there an option to generate the byte length of each record while reading an EBCDIC file?
@yruslan - Could you please let me know if you have any idea how to calculate the byte length of each record in an EBCDIC file?
Do you need the record size for each record, or the file size for each record?
You can get the file name for each record using either
.option("with_input_file_name_col", "input_file_name")
or
df.withColumn("input_file_name", input_file_name())
depending on the type of file (variable-length vs. fixed-length).
You can then use a filesystem API (the Hadoop client, etc.) to get the file size for each file.
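For illustration, here is a minimal sketch of that approach, assuming a variable-length read with the `with_input_file_name_col` option from the comment above. The copybook path, the data path, and the way the size lookup is joined back are placeholders, not a prescribed recipe:

```scala
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

// Read the EBCDIC data, attaching the source file name to every record
// ("with_input_file_name_col" is the Cobrix option mentioned above;
// the copybook and data paths are placeholders).
val df = spark.read
  .format("cobol")
  .option("copybook", "/path/to/copybook.cpy")
  .option("with_input_file_name_col", "input_file_name")
  .load("/path/to/data")

// Build a (file name -> file size) lookup with the Hadoop FileSystem API.
// Note: these path strings must match the fully qualified names Spark
// produces, so some normalization may be needed in practice.
val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
val fileSizes = fs.listStatus(new Path("/path/to/data"))
  .map(status => (status.getPath.toString, status.getLen))
  .toSeq
val sizesDf = spark.createDataFrame(fileSizes).toDF("input_file_name", "file_size_bytes")

// Join so that every record carries the size of the file it came from.
val withSizes = df.join(sizesDf, Seq("input_file_name"), "left")
```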
Thanks @yruslan, I need the record size for each record.
@yruslan, could you please help here?
Hi, sorry for the late reply. Currently, this is not supported. I've added this to feature requests.
We can make
.option("generate_record_id", "true")
generate the record length as well.
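For context, a minimal sketch of what this option does today, assuming a Cobrix version where `generate_record_id` adds `File_Id` and `Record_Id` columns to the output (paths are placeholders); per the comment above, a record byte-length column could presumably be generated alongside them:

```scala
// Sketch: generate_record_id adds File_Id and Record_Id columns to each
// record; the record byte length discussed above is not yet among them.
// Copybook and data paths are placeholders.
val df = spark.read
  .format("cobol")
  .option("copybook", "/path/to/copybook.cpy")
  .option("generate_record_id", "true")
  .load("/path/to/data")

df.printSchema() // expect File_Id and Record_Id alongside the copybook fields
```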