Multiple paths for source files
MaksymFedorchuk opened this issue · 2 comments
I need to read files from multiple folders, but so far I haven't found an option in Cobrix to do this. Is there a way to read multiple folders without creating multiple RDDs or Datasets? If not, please treat this as an enhancement request.
Example:
source_folders = ["example1/folder_1/","example2/folder_2/"]
spark.read.format("cobol").load(source_folders)
Parquet and other popular formats support multiple source paths.
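For context, the fallback this request hopes to avoid is one read per folder followed by a union. A minimal sketch, assuming every folder shares the same schema; `read_folders` and `read_one` are hypothetical names, and `read_one` stands for whatever single-path read the job already performs:

```python
from functools import reduce


def read_folders(read_one, folders):
    """Read each folder separately and union the results into one DataFrame.

    read_one is a caller-supplied function (hypothetical) mapping a single
    path to a DataFrame, for example:
        lambda p: spark.read.format("cobol").load(p)
    plus whatever options the job already passes.
    """
    frames = [read_one(folder) for folder in folders]
    if not frames:
        raise ValueError("at least one folder is required")
    # DataFrame.union matches columns by position, so all folders
    # must produce the same schema.
    return reduce(lambda left, right: left.union(right), frames)
```

This still creates one DataFrame per folder under the hood, which is exactly the overhead a multi-path `.load(...)` would remove.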
That's a good idea! We'll look into it and implement it.
While looking into that, I've noticed that in order to support multiple paths in .load(...), the data source provider needs to be rewritten in terms of FileFormat instead of RelationProvider. So it might take some time to implement.
If the data files are in subdirectories of the same root folder, you can use "/path/*", and the data source will descend one level into each subfolder.
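The wildcard approach could look roughly like this in PySpark. This is a sketch, not a confirmed recipe: `subfolder_glob` and `read_subfolders` are hypothetical helpers, `spark` is assumed to be an existing SparkSession with Cobrix available, and any other options the read needs are omitted:

```python
def subfolder_glob(root):
    """Build a pattern that matches the direct subfolders of root."""
    # Drop any trailing slash first so the result never contains "//".
    return root.rstrip("/") + "/*"


def read_subfolders(spark, root):
    """Load data files found one level below root.

    Assumes `spark` is an existing SparkSession; add the options your
    job normally passes to the "cobol" reader before calling load.
    """
    return spark.read.format("cobol").load(subfolder_glob(root))
```

Note that this only helps when the folders share a parent; the two folders in the original example ("example1/folder_1/" and "example2/folder_2/") do not.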
If rewriting from RelationProvider to FileFormat is too hard, we'll add a Cobrix extension option, for instance .option("paths", "/comma,/separated,/paths"), as a workaround for the time being.
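If the extension option is added as described, calling code might look like the sketch below. The "paths" option is only the name proposed in this thread, so treat it as an assumption until it ships; `join_paths` and `read_with_paths_option` are hypothetical helpers, and `spark` is assumed to be an existing SparkSession:

```python
def join_paths(paths):
    """Join folder paths into the comma-separated form proposed above."""
    cleaned = [p.strip() for p in paths if p.strip()]
    if not cleaned:
        raise ValueError("at least one path is required")
    for p in cleaned:
        # A comma inside a path would be indistinguishable from a separator.
        if "," in p:
            raise ValueError("path must not contain a comma: " + p)
    return ",".join(cleaned)


def read_with_paths_option(spark, paths):
    """Read several folders through the proposed "paths" option.

    Assumption: the option name and format follow the proposal in this
    thread and may differ in the released implementation.
    """
    return (
        spark.read.format("cobol")
        .option("paths", join_paths(paths))
        .load()
    )
```

With the folders from the original example, `join_paths(["example1/folder_1/", "example2/folder_2/"])` yields `"example1/folder_1/,example2/folder_2/"`.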