Empty line at the end causes cobrix to create 1 more record
MaksymFedorchuk opened this issue · 4 comments
For example, we have a file like this, with an empty line at the end:
fdhfhdsfsdff
dfdhjfwdsdd
dsfddkkfkgk
If I read it with the following options:
spark.read
.format("cobol")
.option("is_record_sequence", "true")
.option("is_text", "true")
.option("encoding", "ascii")
.option("copybook", path_to_copybook)
.load(path_to_file)
I get 4 records instead of 3. Is that a bug, or can it be fixed with some option?
Can I ask you to attach the test file?
I just want to check if the empty line contains no characters or at least 1 character.
When reading text files, Cobrix filters out empty lines. But since Windows uses CR LF line endings, and Linux/macOS use just LF, it is possible that a single character ends up in the last record.
I'll check the file and determine if it is a bug or a feature. It's more likely to be a bug, though.
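For anyone wanting to run the same check locally before attaching a file, here is a minimal sketch in plain Python. It does not use Cobrix or Spark. The function name `trailing_blank_lines` is hypothetical, and treating whitespace-only lines as "blank" is an assumption of this sketch, not a description of how Cobrix decides a line is empty.

```python
from typing import List


def trailing_blank_lines(data: bytes) -> List[bytes]:
    """Return the raw bytes of every blank line after the last non-blank line.

    Lines are split on LF only, so a CR left over from a CR LF ending
    stays visible in the result as b"\r".
    """
    lines = data.split(b"\n")
    # A file that ends with LF produces one final empty piece that is
    # not a real line; drop it.
    if lines and lines[-1] == b"":
        lines.pop()

    trailing: List[bytes] = []
    for line in reversed(lines):
        # Assumption: a line holding only whitespace (including CR) is blank.
        if line.strip() != b"":
            break
        trailing.append(line)
    trailing.reverse()
    return trailing


# Usage (the path is a placeholder):
# with open("path/to/file.txt", "rb") as f:
#     print(trailing_blank_lines(f.read()))
```

An LF-terminated file with one empty line at the end gives `[b'']`, while the same file with CR LF endings gives `[b'\r']`, which is the "at least 1 character" case.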
I've noticed something interesting. Try removing option("is_record_sequence", "true") and please let me know if it works as expected.
Bug confirmed. It happens when is_text = true and is_record_sequence = true.
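Until a fix is available, one possible stopgap is to drop the trailing empty lines from the file before loading it. The sketch below is an assumption-laden workaround in plain Python, not a Cobrix feature. The helper `strip_trailing_blank_lines` is hypothetical, it reads the whole file into memory, and it only removes trailing CR and LF characters, so lines made of spaces are left alone. Whether it avoids the extra record has not been verified in this thread.

```python
def strip_trailing_blank_lines(data: bytes) -> bytes:
    """Remove empty lines at the end of the data, keeping one line terminator.

    Only trailing CR and LF bytes are removed; interior blank lines and
    lines containing spaces are left untouched.
    """
    stripped = data.rstrip(b"\r\n")
    if stripped == b"":
        # Nothing but line terminators (or nothing at all) in the input.
        return b""
    # Keep the line-ending style the file already uses.
    terminator = b"\r\n" if b"\r\n" in data else b"\n"
    return stripped + terminator


# Usage (paths are placeholders):
# with open("path/to/file.txt", "rb") as src:
#     cleaned = strip_trailing_blank_lines(src.read())
# with open("path/to/file_cleaned.txt", "wb") as dst:
#     dst.write(cleaned)
```

The cleaned copy can then be passed to `.load(...)` in place of the original file.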