flow-php/etl

CSV's having "" removed

xswirelab opened this issue · 6 comments

Hi there,

I'm using .csv files with enclosed headers and values, like "Header","Header2", etc.,
but for some reason all the "" get stripped off. Any ideas why this happens?

Hey, could you provide a simple example of a CSV file and the result you expect after reading it, so I can look into this?

Example of inputs:

"Company name","Contact owner","First Name","Last Name","Email","Phone Number"
"Some Inc","Dave Young","Dave","Yung","integer@example.com","(312) 768-3103"
"Company name","Contact owner","First Name","Last Name","Email","Phone Number"
"Some Inc","Dave Young",",",",","integer@example.com","(312) 768-3103"

Example of outputs:

Company name,Contact owner,First Name,Last Name,Email,Phone Number
Some Inc,Dave Young,,,,,integer@example.com,(312) 768-3103
Company name,Contact owner,First Name,Last Name,Email,Phone Number
Some Inc,Dave Young,,,,,integer@example.com,,

This happens all the time, even with the most minimal flows.

(new Flow())
    ->read(CSV::from(Stream::local_file($input)))
    ->rows(Transform::array_unpack('row'))
    ->write(CSV::to(Stream::local_file($output)))
    ->run();

This is pretty much how CSV works.

When Flow loads a CSV file into memory, it uses the PHP function fgetcsv, and it is that function that removes the ". When the output is written to a regular file, fputcsv adds the enclosure (" by default) only to values that need it, such as those containing a space, the delimiter, or the enclosure character itself.
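The same round trip can be reproduced without Flow. The snippet below is a minimal sketch in plain PHP that uses in-memory streams in place of real files; the sample values are illustrative only.

```php
<?php

// Put one fully quoted CSV line into an in-memory stream.
$in = fopen('php://memory', 'w+');
fwrite($in, "\"Some Inc\",\"Dave\",\"integer@example.com\"\n");
rewind($in);

// fgetcsv() returns the bare values; the enclosures are already gone here.
$row = fgetcsv($in, 0, ',', '"', '\\');
fclose($in);

// fputcsv() adds the enclosure only to fields that need it (here: the one with a space).
$out = fopen('php://memory', 'w+');
fputcsv($out, $row, ',', '"', '\\');
rewind($out);
$line = stream_get_contents($out);
fclose($out);

echo $line; // "Some Inc",Dave,integer@example.com
```

Both files describe the same data; only the optional enclosures differ.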

Input

"Company name","Contact owner","First Name","Last Name","Email","Phone Number"
"Some Inc","Dave Young","Dave","Yung","integer@example.com","(312) 768-3103"

Code

<?php

use Flow\ETL\DSL\CSV;
use Flow\ETL\DSL\Stream;
use Flow\ETL\DSL\Transform;
use Flow\ETL\Flow;

require __DIR__ . '/../vendor/autoload.php';

(new Flow())
    ->read(CSV::from(Stream::local_file(__DIR__ . '/issue289.csv')))
    ->rows(Transform::array_unpack('row'))
    ->drop("row")
    ->write(CSV::to(Stream::local_file(__DIR__ . '/issue289_new.csv'), true, false, ',', "'"))
    ->run();

Output

'Company name','Contact owner','First Name','Last Name',Email,'Phone Number'
'Some Inc','Dave Young',Dave,Yung,integer@example.com,'(312) 768-3103'

For better contrast, I changed the enclosure from " to '. As you can see, single-word values/columns are not surrounded by ', but this is expected behavior driven by fputcsv.
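fputcsv has no option to enclose every field, so if that is really required, the line has to be built by hand. Below is a rough sketch of one way to do it in plain PHP; csv_line_all_enclosed is a hypothetical helper, not part of Flow or PHP, and it assumes that doubling the enclosure character is the desired escaping.

```php
<?php

/**
 * Hypothetical helper (not part of Flow or PHP): encloses every field,
 * doubling any enclosure character found inside a value.
 */
function csv_line_all_enclosed(array $fields, string $separator = ',', string $enclosure = '"'): string
{
    $quoted = array_map(
        static fn ($value): string => $enclosure
            . str_replace($enclosure, $enclosure . $enclosure, (string) $value)
            . $enclosure,
        $fields
    );

    return implode($separator, $quoted) . "\n";
}

echo csv_line_all_enclosed(['Some Inc', 'Dave', 'integer@example.com']);
// "Some Inc","Dave","integer@example.com"
```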

Could you elaborate on why you want to keep the enclosure around single-word string values?

Also, if you just want to move a file, line by line, from one place to another, you can use the Text adapter that was just released. It reads the file line by line and then writes it line by line, and by default it does not change or remove anything.
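To illustrate the idea of a line-by-line copy, here is a minimal sketch in plain PHP. It is a stand-in for the concept only and does not show the Text adapter's API; copy_lines and the temporary file names are made up for the example.

```php
<?php

// Hypothetical helper: copies a file line by line without parsing it.
function copy_lines(string $source, string $destination): int
{
    $in = fopen($source, 'r');
    $out = fopen($destination, 'w');
    $lines = 0;

    // fgets() returns each line untouched, including enclosures and the line ending.
    while (($line = fgets($in)) !== false) {
        fwrite($out, $line);
        $lines++;
    }

    fclose($in);
    fclose($out);

    return $lines;
}

// Usage with a throwaway file in the system temp directory.
$source = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($source, "\"Some Inc\",\"Dave\"\n");
$destination = $source . '.copy';

$copied = copy_lines($source, $destination);

// The copy is byte-for-byte identical, quotes included.
var_dump(file_get_contents($destination) === file_get_contents($source));
```

Because nothing is parsed as CSV, the enclosures never get a chance to be dropped.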

I'm closing this issue for now. If my answer did not explain the reported behavior and you are looking for something else, please let me know.