flow-php/etl

CSV::from_file does not work with remote URLs

shadowhand opened this issue · 1 comment

I want to be able to extract documents stored on S3, so I have registered the stream wrapper:

$client = $sdk->createS3();
$client->registerStreamWrapper();

And then I attempt to load the file:

(new Flow())
    ->extract(CSV::from_file(
        file_name: 's3://my-bucket-name/test.csv',
        header_offset: 0,
    ))
    // ...
    ->run();

But the following exception is thrown:

  [Flow\ETL\Exception\InvalidArgumentException]
  File s3://my-bucket-name/test.csv not found.

I think the check for file existence should be modified to allow remote URLs:

if (! (str_contains($file_name, needle: '://') || is_file($file_name))) { /* throw exception */ }

OR add another method to handle resources, eg:

$extractor = CSV::from_resource(
    resource: fopen('s3://my-bucket-name/test.csv', 'r'),
    header_offset: 0,
);
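
A minimal sketch of how the first option could be tightened, assuming the goal is to accept only schemes that actually have a registered stream wrapper rather than any string containing '://'. The helper name is_supported_source is hypothetical and not part of flow-php/etl; it relies only on the built-in parse_url(), stream_get_wrappers() and is_file() functions.

```php
<?php

/**
 * Hypothetical helper (not part of flow-php/etl): decides whether a path
 * should pass the "file exists" guard before extraction starts.
 */
function is_supported_source(string $file_name): bool
{
    // parse_url() returns null when there is no scheme and false on malformed input.
    $scheme = parse_url($file_name, PHP_URL_SCHEME);

    // Accept the path when its scheme has a registered stream wrapper and
    // leave the real existence check to the moment the stream is opened.
    if (is_string($scheme) && in_array(strtolower($scheme), stream_get_wrappers(), true)) {
        return true;
    }

    // Otherwise fall back to the plain local-file check.
    return is_file($file_name);
}
```

With this guard, 's3://my-bucket-name/test.csv' would be accepted only after registerStreamWrapper() has been called, and a mistyped scheme would still be rejected early. The trade-off is that a missing remote object is reported when the stream is opened, not by the guard.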

Hey @shadowhand, I literally started looking into this this morning, but let me move this issue to https://github.com/flow-php/etl and convert it into a discussion, since it's a longer topic that affects all adapters.