CSV::from_file does not work with remote URLs
shadowhand opened this issue · 1 comment
shadowhand commented
I want to be able to extract documents stored on S3, so I have registered the stream wrapper:
$client = $sdk->createS3();
$client->registerStreamWrapper();
And then I attempt to load the file:
(new Flow())
    ->extract(CSV::from_file(
        file_name: 's3://my-bucket-name/test.csv',
        header_offset: 0,
    ))
    // ...
    ->run();
But the following exception is thrown:
[Flow\ETL\Exception\InvalidArgumentException]
File s3://my-bucket-name/test.csv not found.
I think the check for file existence should be modified to allow remote URLs:
if (! (str_contains($file_name, needle: '://') || is_file($file_name))) { /* throw exception */ }
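A slightly stricter variant of that check, as a minimal sketch: instead of accepting any string containing `://`, it only accepts schemes that are currently registered as PHP stream wrappers (which would include `s3` after `registerStreamWrapper()` has been called). The function name `is_readable_source` is hypothetical and not part of Flow.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper, not part of Flow's API.
 *
 * Returns true when $file_name either points at an existing local file
 * or uses a scheme that is registered as a PHP stream wrapper.
 * It does not check that the remote object itself exists.
 */
function is_readable_source(string $file_name): bool
{
    $pos = strpos($file_name, '://');

    if ($pos !== false) {
        // Everything before "://" is treated as the wrapper name, e.g. "s3".
        $scheme = substr($file_name, 0, $pos);

        return in_array($scheme, stream_get_wrappers(), true);
    }

    // No scheme: fall back to the plain local file check.
    return is_file($file_name);
}
```

With this, an unknown scheme is still rejected early, while existence of the remote object is left to the later `fopen()` call.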
Or add another method to handle resources, e.g.:
$extractor = CSV::from_resource(
    resource: fopen('s3://my-bucket-name/test.csv', 'r'),
    header_offset: 0,
);
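To illustrate what a resource-based reader could do internally, here is a rough sketch using only PHP's built-in `fgetcsv()`. It is not Flow's implementation, and `rows_from_resource` is a hypothetical name. It assumes the first line of the stream is the header row, which corresponds to `header_offset: 0` in the examples above.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper, not part of Flow's API.
 *
 * Reads CSV rows from an already opened stream and yields each row as an
 * associative array keyed by the header (first) row.
 *
 * @param resource $stream any readable stream, e.g. fopen('s3://...', 'r')
 */
function rows_from_resource($stream): \Generator
{
    $header = fgetcsv($stream, null, ',', '"', '\\');

    // Empty stream, or a blank first line: nothing to yield.
    if ($header === false || $header === [null]) {
        return;
    }

    while (($row = fgetcsv($stream, null, ',', '"', '\\')) !== false) {
        // fgetcsv() returns [null] for a blank line; skip those.
        if ($row === [null]) {
            continue;
        }

        if (count($row) !== count($header)) {
            throw new \RuntimeException('Row column count does not match the header.');
        }

        yield array_combine($header, $row);
    }
}
```

Because the function only needs a readable stream, the same code would work for a local file, `php://memory`, or any registered wrapper such as `s3://`.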
norberttech commented
Hey @shadowhand, I literally started looking into this this morning, but let me move this issue to https://github.com/flow-php/etl and convert it into a discussion, since it's a longer topic that affects all adapters.