pjotrp/bioruby-table

I'm confused about --with-rownames

Closed this issue · 18 comments

It's a bit of a stream of consciousness, but here goes.

I have this table, and I want all the rows where the first column (bid) is 2.

$ cat /tmp/table
#"bid"  "cid"   "length"
1   a   4658
1   b   12060
2   c   5858
2   d   5626
3   e   18451
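For reference, the result being asked for can be sketched in plain Ruby, independent of bio-table. This is only an illustration: the sample rows are copied from the table above, the tab delimiter is an assumption, and `rows_with_first_field` is a made-up helper name.

```ruby
# Sample data from /tmp/table, assumed to be tab-delimited.
LINES = [
  %(#"bid"\t"cid"\t"length"),
  "1\ta\t4658",
  "1\tb\t12060",
  "2\tc\t5858",
  "2\td\t5626",
  "3\te\t18451"
].freeze

# Hypothetical helper: skip the header line, split each remaining line on
# tabs, and keep the rows whose first field equals `wanted` as an integer.
def rows_with_first_field(lines, wanted)
  _header, *rows = lines
  rows.map { |line| line.chomp.split("\t") }
      .select { |fields| fields[0].to_i == wanted }
end

rows_with_first_field(LINES, 2).each { |fields| puts fields.join("\t") }
# prints:
# 2	c	5858
# 2	d	5626
```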

OK, since I'm failing at awk, let's try bio-table.

$ bio-table  --num-filter "values[0]==2" /tmp/table 
bio-table 0.0.5 Copyright (C) 2012 Pjotr Prins <pjotr.prins@thebird.nl>

 INFO bio-table: Array: [{:show_help=>false, :write_header=>true, :skip=>0, :num_filter=>"values[0]==2"}, ["/tmp/table"]]
#"bid"  "cid"   "length"

Hmm, no rows. Oh it says in the readme that

The filter ignores the header row, and the row names. If you need either, use the switches --with-header and --with-rownames.

No wonder. Trying this again (although row and names are 2 words, so why isn't it --with-row-names? Eh.)

$ bio-table --with-rownames --num-filter "values[0]==2" /tmp/table 
bio-table 0.0.5 Copyright (C) 2012 Pjotr Prins <pjotr.prins@thebird.nl>

 INFO bio-table: Array: [{:show_help=>false, :write_header=>true, :skip=>0, :with_rownames=>true, :num_filter=>"values[0]==2"}, ["/tmp/table"]]
#"bid"  "cid"   "length"
["\"cid\"", "\"length\""]
["1", "a", "4658"]
/home/ben/.rvm/gems/ruby-1.9.3-p125/gems/bio-table-0.0.5/lib/bio-table/validator.rb:20:in `throw': uncaught throw "Number of fields diverge in line 1 (size 3, expected 2)" (ArgumentError)
...

Hmm, that ain't right. Seems like a bug, should do the honourable thing and report it. I really do dislike awk.

bio-table wants all records to be the same size, incl. the header. Just add a field in the header row.
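The two debug lines printed before the error show the header as two fields (`["cid", "length"]`) and the first row as three (`["1", "a", "4658"]`), which is exactly the mismatch the message reports. A minimal sketch of that kind of field-count check follows. `check_field_count` is a hypothetical stand-in written for this comment, not bio-table's actual validator, and the idea that the row name is dropped from the header but not from the row is only a reading of the output above.

```ruby
# Hypothetical field-count check; NOT the code in bio-table's validator.rb.
def check_field_count(fields, expected, line_no)
  return true if fields.size == expected

  raise ArgumentError,
        "Number of fields diverge in line #{line_no} " \
        "(size #{fields.size}, expected #{expected})"
end

header = %w[bid cid length]
row    = %w[1 a 4658]

# What the debug output suggests: the header has lost its first (row name)
# column, while the row is still compared with all three fields.
stripped_header = header.drop(1) # ["cid", "length"]
begin
  check_field_count(row, stripped_header.size, 1)
rescue ArgumentError => e
  puts e.message
end

# Treating both the same way (row name dropped from each) passes the check.
check_field_count(row.drop(1), stripped_header.size, 1)
```

If that reading is right, adding a field to the header would not help, because the header already has three fields before the row name is removed.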

#"bid"  "cid"   "length"

doesn't count as header size 3?

Or maybe it is the remark # symbol which confuses Ruby's csv reader. This should work:

"bid" "cid" "length"
1 a 4658
1 b 12060
2 c 5858
2 d 5626
3 e 18451

Make it either tab-delimited for both header and rows, or CSV. Rownames: ah, bit of R there in the naming.

I'll keep it as bug.

No luck with this:

"bid"   "cid"   "length"
1   a   4658
1   b   12060
2   c   5858
2   d   5626
3   e   18451

Also, quotes make no difference:

$ cat /tmp/table3
bid cid length
1   a   4658
1   b   12060
2   c   5858
2   d   5626
3   e   18451

> I'll keep it as bug.

ta

You sure it is tab-delimited? Works for me. In vi

:%s/ \+/\t/g

That's gobbledygook to an emacs person like myself, but anyway it was tabs, and it also doesn't work with commas:

$ cat /tmp/table4
bid,cid,length
1,a,4658
1,b,12060
2,c,5858
2,d,5626
3,e,18451
$ bio-table --in-format csv --with-rownames --num-filter "values[0]==2" /tmp/table4
bio-table 0.0.5 Copyright (C) 2012 Pjotr Prins <pjotr.prins@thebird.nl>

 INFO bio-table: Array: [{:show_help=>false, :write_header=>true, :skip=>0, :in_format=>:csv, :with_rownames=>true, :num_filter=>"values[0]==2"}, ["/tmp/table4"]]
bid cid length
["cid", "length"]
["1", "a", "4658"]
/home/ben/.rvm/gems/ruby-1.9.3-p125/gems/bio-table-0.0.5/lib/bio-table/validator.rb:20:in `throw': uncaught throw "Number of fields diverge in line 1 (size 3, expected 2)" (ArgumentError)

Same on current master too 71e7286

I used to be an emacs guy.

With CSV it looks like it misses the top-left corner too, if you don't use quotes. Hey, this is version 0.0.5! But usable.

> Hey, this is version 0.0.5! But usable.

I'm a pot, so ain't callin' you anything.

I have added a configurable line splitter (a string or regex can be passed in):

./bin/bio-table test/data/input/table_split_on.txt --in-format split --split-on ','

bio-table 0.0.6-rc1 Copyright (C) 2012 Pjotr Prins <pjotr.prins@thebird.nl>

INFO bio-table: Array: [{:show_help=>false, :write_header=>true, :skip=>0, :in_format=>:split, :split_on=>","}, ["test/data/input/table_split_on.txt"]]
bid cid length
1 a 4658
1 b 12060
2 c 5858
2 d 5626
3 e 18451
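To illustrate what "a string or regex" means for a splitter: Ruby's `String#split` accepts either, so one code path can cover comma, tab and whitespace-separated input. The helper below is a hypothetical sketch of that idea, not the code behind `--split-on`. Passing a limit of `-1` is a choice made for this sketch so that trailing empty fields are kept and the field count stays stable.

```ruby
# Hypothetical helper illustrating a configurable splitter; NOT bio-table code.
# `split_on` may be a String or a Regexp. The -1 limit keeps trailing empty
# fields, which plain split would silently drop.
def split_line(line, split_on)
  line.chomp.split(split_on, -1)
end

p split_line("2,c,5858\n", ",")      # => ["2", "c", "5858"]
p split_line("2  c   5858\n", /\s+/) # => ["2", "c", "5858"]
p split_line("2,c,\n", ",")          # => ["2", "c", ""]
```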

Sounds good - I was wondering whether formats other than tab and CSV, e.g. space-separated, would be supported soon as well. So you've saved yourself another pesky GitHub issue. Thanks.

Released in 0.0.6

wwood: can you confirm this works for you?

Sorry, no. Using latest master or 0.0.6

ben@uyen:20120901:~/bioinfo/NLStradamis$ cat /tmp/table4
bid,cid,length
1,a,4658
1,b,12060
2,c,5858
2,d,5626
3,e,18451

ben@uyen:20120901:~/bioinfo/NLStradamis$ ~/git/bioruby-table/bin/bio-table --in-format csv --with-rownames --num-filter "values[0]==2" /tmp/table4
bio-table 0.0.7-rc2 Copyright (C) 2012 Pjotr Prins <pjotr.prins@thebird.nl>

 INFO bio-table: Array: [{:show_help=>false, :write_header=>true, :skip=>0, :in_format=>:csv, :with_rownames=>true, :num_filter=>"values[0]==2"}, ["/tmp/table4"]]
 INFO bio-table: Array: ["bid", "cid", "length"]
bid cid length
["cid", "length"]
["1", "a", "4658"]
/home/ben/git/bioruby-table/lib/bio-table/validator.rb:20:in `throw': uncaught throw "Number of fields diverge in line 1 (size 3, expected 2)" (ArgumentError)
    from /home/ben/git/bioruby-table/lib/bio-table/validator.rb:20:in `valid_row?'
    from /home/ben/git/bioruby-table/lib/bio-table/table_apply.rb:56:in `parse_row'
    from /home/ben/git/bioruby-table/lib/bio-table/tableload.rb:34:in `block (2 levels) in emit'
    from /home/ben/git/bioruby-table/lib/bio-table/tableload.rb:18:in `each'
    from /home/ben/git/bioruby-table/lib/bio-table/tableload.rb:18:in `each_with_index'
    from /home/ben/git/bioruby-table/lib/bio-table/tableload.rb:18:in `block in emit'
    from /home/ben/git/bioruby-table/bin/bio-table:251:in `each'
    from /home/ben/git/bioruby-table/bin/bio-table:251:in `each'
    from /home/ben/git/bioruby-table/bin/bio-table:251:in `block in <main>'
    from /home/ben/git/bioruby-table/bin/bio-table:244:in `each'
    from /home/ben/git/bioruby-table/bin/bio-table:244:in `<main>'

Same deal with the tabs.

try again with --in-format split --split-on ','

testing... yes, it is a bug.

Applied fix on master

works on current master for tabs, csv, --in-format split --split-on ','. Cool.