tilo/smarter_csv

Able to read only 1 row from file

theRealNG opened this issue · 22 comments

Hi,

I have written the following code:

SmarterCSV.process('test.csv', { headers_in_file: true}) do |arr| puts arr end

The following output is printed:
=> {:someid=>"39981", :somename=>"FoodWorks Inc", :somenumber=>"71821", :somedate=>"07/01/2022 3:14"}

But it prints only the first row after the headers. I'm not sure what I am doing wrong here.

I'm using Ruby 2.7.0 and smarter_csv version is 1.7.0

This seems to be a problem with v1.7.0. I tried v1.6.1 and it works fine.

tilo commented

@theRealNG Can you provide a sample file or a test?
Have you tried different row_sep settings?

Example file content:

a,b,c,d
1,2,3,4
5,6,7,8

Running
SmarterCSV.process(file)

should return
[{:a=>1, :b=>2, :c=>3, :d=>4}, {:a=>5, :b=>6, :c=>7, :d=>8}]
Instead it returns
{:a=>"1", :b=>"2", :c=>"3", :d=>"4"}

tilo commented

Cannot reproduce. This sounds like you have either unusual row_sep characters in your CSV file, or other special characters.

> data = SmarterCSV.process('/tmp/test.csv')
 => [{:a=>1, :b=>2, :c=>3, :d=>4}, {:a=>5, :b=>6, :c=>7, :d=>8}]

@mo-rubikal @theRealNG Have you tried to look at your CSV file with hexdump -C filename or od -X ?
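If hexdump or od is not at hand, roughly the same check can be done from Ruby. The following is only a sketch using the standard library; the helper names `line_ending_counts` and `utf8_bom?` are made up for illustration and are not part of smarter_csv.

```ruby
# Count the line-ending styles found in a file's raw bytes.
# Reads the whole file into memory, so this is meant for small sample files.
def line_ending_counts(path)
  data = File.binread(path)
  crlf = data.scan("\r\n").size
  {
    crlf: crlf,                    # Windows-style "\r\n"
    lf: data.count("\n") - crlf,   # "\n" that is not part of a "\r\n"
    cr: data.count("\r") - crlf    # "\r" that is not part of a "\r\n"
  }
end

# Check whether the file starts with a UTF-8 byte-order mark (EF BB BF).
def utf8_bom?(path)
  File.binread(path, 3) == "\xEF\xBB\xBF".b
end
```

A file with mixed counts, or with a BOM, would be one candidate explanation for unexpected row_sep behaviour.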

@tilo the version that has the problem is 1.7.0. When I downgraded to the previous version, it worked. At the beginning I thought it was a separator or file format issue, but I was able to reproduce it with the simple file from my example above.

tilo commented

@mo-rubikal yes, I understand that rolling-back worked for you.

I need a specific CSV file to reproduce the issue. When I cut+pasted your sample, I was not able to reproduce it, as shown in the snippet above.

Could you either share an exact file, or add a test that shows what is broken?

Running with verbose: true could also shed some light on the issue.

tilo commented
$ hexdump -C /tmp/test.csv
00000000  61 2c 62 2c 63 2c 64 0a  31 2c 32 2c 33 2c 34 0a  |a,b,c,d.1,2,3,4.|
00000010  35 2c 36 2c 37 2c 38 0a                           |5,6,7,8.|
00000018

even without the last 0A it reads both rows correctly 🤔

I can confirm the bug.

It's the same for me; downgrading to 1.6 fixed it.

Here's the file that I've used:

booksellers2.csv

tilo commented

very bizarre ... which Ruby version are you using?

> md5sum ~/Downloads/booksellers2.csv
98089c12c4487ca7cadf2fd3f92d477d  /Users/tilo/Downloads/booksellers2.csv
> wc -l  ~/Downloads/booksellers2.csv
     279 /Users/tilo/Downloads/booksellers2.csv

so one header and 278 rows of data

Version 1.6.1

> RUBY_VERSION
 => "2.7.5"
> require 'smarter_csv'
 => true
> SmarterCSV::VERSION
 => "1.6.1"
> data_1_6_1 = SmarterCSV.process('/Users/tilo/Downloads/booksellers2.csv')
 => [{:id=>117}, {:id=>7}, {:id=>8}, {:id=>290, :origi=>"Oui", :ma=>"Oui", :kd=>"Ouii"}, {:id=>61, :origi=>"Oui", :ma=>"Oui", :kd=>...
> data_1_6_1.first
 => {:id=>117}
> data_1_6_1.last
 => {:id=>49, :origi=>"Oui", :ma=>"Oui", :kd=>"Non"}
> data_1_6_1.size
 => 278
> File.open('/tmp/data_1_6_1', 'w') { |f| f.puts data_1_6_1.inspect }

Version 1.7.0

> RUBY_VERSION
 => "2.7.5"
> require 'smarter_csv'
 => true
> data_1_7_0 = SmarterCSV.process('/Users/tilo/Downloads/booksellers2.csv')
 => [{:id=>117}, {:id=>7}, {:id=>8}, {:id=>290, :origi=>"Oui", :ma=>"Oui", :kd=>"Ouii"}, {:id=>61, :origi=>"Oui", :ma=>"Oui", :kd=>...
> data_1_7_0.first
 => {:id=>117}
> data_1_7_0.last
 => {:id=>49, :origi=>"Oui", :ma=>"Oui", :kd=>"Non"}
> data_1_7_0.size
 => 278
> File.open('/tmp/data_1_7_0', 'w') { |f| f.puts data_1_7_0.inspect }
 => nil
> SmarterCSV.has_acceleration?
 => true

and the output is identical for me:

$ md5sum  /tmp/data_1_6_1 /tmp/data_1_7_0
0e9bce174967c97c092b99bb6cb9b9fc  /tmp/data_1_6_1
0e9bce174967c97c092b99bb6cb9b9fc  /tmp/data_1_7_0

@alextakitani what Ruby version are you using, and what does uname -a say?

I get the same results for Ruby 3.0.0

tilo commented

@alextakitani @mo-rubikal @theRealNG I still cannot reproduce this.
What OS are you guys using? Can you send me the uname -a output, the Ruby version, and maybe another sample file?

hut8 commented

I am getting the same results as #197 (comment).
Ruby 3.1.2.
smarter_csv 1.6.1 works great
1.7 only reads the first row and returns that as a hash, instead of an array of records
Linux computername 5.10.0-15-amd64 #1 SMP Debian 5.10.120-1 (2022-06-09) x86_64 GNU/Linux
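The symptom described here (a single Hash coming back instead of an Array of row hashes) can be caught early with a small guard around the result. This is just a sketch; `assert_all_rows!` is a hypothetical helper, not something smarter_csv provides.

```ruby
# Raise unless the parsed result is an Array with the expected number of rows.
# Returns the result unchanged when everything looks right.
def assert_all_rows!(result, expected_rows)
  unless result.is_a?(Array)
    raise TypeError, "expected an Array of row hashes, got #{result.class}"
  end
  unless result.size == expected_rows
    raise "expected #{expected_rows} rows, got #{result.size}"
  end
  result
end

# Usage (assumes the smarter_csv gem is installed):
#   data = assert_all_rows!(SmarterCSV.process('/tmp/test.csv'), 2)
```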

I'm having the same problem here.
Changing to v 1.6.1 also fixed it for me.

Ruby 3.0.3

9mm commented

I'm getting the same thing... a csv library that parses a single row. Super useful... 🤔

ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [x86_64-darwin19]

Rails 7.0.3.1

I'm using OSX 10.15.7 (19H1922)

      SmarterCSV.process(file, {chunk_size: 1_000, headers_in_file: true, remove_empty_values: true}) do |chunk|
        chunk.each do |row|
          ...
        end
      end

The sample CSV doesn't matter; it doesn't work with any comma-delimited CSV with headers.

9mm commented

Darwin BigWeiner.local 19.6.0 Darwin Kernel Version 19.6.0: Mon Apr 18 21:50:40 PDT 2022; root:xnu-6153.141.62~1/RELEASE_X86_64 x86_6

I am having the same problem.

ruby 3.0.3
rails 7.03.
mac os: Big Sur 11.6.

This config fixed the issue for me:

remove_empty_values: false,

I am having the same problem.

ruby 2.7.3
rails 6.0.3.2
20.04.1-Ubuntu

Same issue here.
ruby 2.7.1p83 (2020-03-31 revision a0c7c23c9c) [x86_64-linux]
Adding remove_empty_values: false fixes it.
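For anyone stuck on 1.7.0, the workaround reported above can be layered on top of existing options. A minimal sketch: only the `remove_empty_values: false` option comes from this thread, while the constant and helper names are hypothetical.

```ruby
# Option reported in this thread to avoid the single-row result on 1.7.0.
WORKAROUND_OPTIONS = { remove_empty_values: false }.freeze

# Merge the workaround into whatever options the caller already passes.
# The workaround wins if the caller set remove_empty_values themselves.
def with_workaround(options = {})
  options.merge(WORKAROUND_OPTIONS)
end

# Usage (assumes the smarter_csv gem is installed):
#   SmarterCSV.process(file, with_workaround(chunk_size: 1_000, headers_in_file: true)) do |chunk|
#     chunk.each { |row| puts row }
#   end
```

Note that with this option turned off, empty fields are presumably kept in the row hashes rather than dropped, so downstream code may need to handle them.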

tilo commented

Bugfix release 1.7.1 was just published

Please re-evaluate and update this issue on whether it fixes the problem or not.

tilo commented

TL;DR: the issue only showed up when smarter_csv was used in a Rails project.

the issue is fixed in 1.7.1
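To check whether a given install is the affected release, something like the following could be used. It is a sketch based only on what was reported in this thread (1.6.1 works, 1.7.0 is affected, 1.7.1 has the fix); the constant and method names are made up for illustration.

```ruby
require 'rubygems'

# The release reported as broken in this thread.
AFFECTED_RELEASE = Gem::Requirement.new('= 1.7.0')

# True if the given version string is the affected release.
def affected_version?(version)
  AFFECTED_RELEASE.satisfied_by?(Gem::Version.new(version))
end

# Usage (assumes the smarter_csv gem is loaded):
#   warn 'please upgrade smarter_csv to 1.7.1' if affected_version?(SmarterCSV::VERSION)
```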

@tilo thanks for the fix. Next time, if possible, please don't remove a gem version because of a bug. I would have to remove all my work if I had to remove buggy versions 😂

tilo commented

LOL - good point @matiasalbarello 😂

In this instance, the 1.7.0 version was broken for everybody who is using it from Rails, and I just wanted to make sure people don't run into this issue anymore, hopefully saving them time & frustration