I'm confused about --with-rownames
It's a bit of a stream of consciousness, but here goes.
I have this table, and I want all the rows where the first column (bid) is 2.
$ cat /tmp/table
#"bid" "cid" "length"
1 a 4658
1 b 12060
2 c 5858
2 d 5626
3 e 18451
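For reference, the filter being asked for can be sketched in a few lines of plain Ruby, without bio-table. This is only an illustration: it assumes the table is whitespace-separated with a single header line, and `rows_where_first_column` is a hypothetical helper name, not part of bio-table.

```ruby
# Hypothetical helper, not part of bio-table: keep the header line and
# every data row whose first field equals the wanted number.
def rows_where_first_column(lines, wanted)
  lines.select.with_index do |line, index|
    next true if index.zero?        # always keep the header row
    fields = line.split             # split on any run of whitespace
    fields.first.to_f == wanted     # compare the first field numerically
  end
end

# The table from /tmp/table, inlined so the sketch is self-contained.
table = <<~TABLE.lines.map(&:chomp)
  #"bid" "cid" "length"
  1 a 4658
  1 b 12060
  2 c 5858
  2 d 5626
  3 e 18451
TABLE

puts rows_where_first_column(table, 2)
```

The numeric comparison via `to_f` is a design choice for this sketch; a string comparison (`fields.first == "2"`) would work equally well for this table.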
OK, since I'm failing at awk, let's try bio-table.
$ bio-table --num-filter "values[0]==2" /tmp/table
bio-table 0.0.5 Copyright (C) 2012 Pjotr Prins <pjotr.prins@thebird.nl>
INFO bio-table: Array: [{:show_help=>false, :write_header=>true, :skip=>0, :num_filter=>"values[0]==2"}, ["/tmp/table"]]
#"bid" "cid" "length"
Hmm, no rows. Oh it says in the readme that
The filter ignores the header row, and the row names. If you need either, use the switches --with-header and --with-rownames.
No wonder. Trying this again (although row and names are 2 words, so why isn't it --with-row-names? Eh.)
$ bio-table --with-rownames --num-filter "values[0]==2" /tmp/table
bio-table 0.0.5 Copyright (C) 2012 Pjotr Prins <pjotr.prins@thebird.nl>
INFO bio-table: Array: [{:show_help=>false, :write_header=>true, :skip=>0, :with_rownames=>true, :num_filter=>"values[0]==2"}, ["/tmp/table"]]
#"bid" "cid" "length"
["\"cid\"", "\"length\""]
["1", "a", "4658"]
/home/ben/.rvm/gems/ruby-1.9.3-p125/gems/bio-table-0.0.5/lib/bio-table/validator.rb:20:in `throw': uncaught throw "Number of fields diverge in line 1 (size 3, expected 2)" (ArgumentError)
...
Hmm, that ain't right. Seems like a bug, should do the honourable thing and report it. I really do dislike awk.
bio-table wants all records to be the same size, incl. the header. Just add a field in the header row.
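The debug lines in the trace above show what is being compared: the header is printed with 2 fields (`"cid"`, `"length"`) while the first data row has 3. A minimal sketch of that kind of same-size check, based only on what the output shows; `check_field_count` is a hypothetical name and this is not bio-table's actual validator code:

```ruby
# Hypothetical check, not bio-table's validator: a row is valid only if
# it has as many fields as the header led us to expect.
def check_field_count(fields, expected, line_number)
  return true if fields.size == expected
  raise ArgumentError,
        "Number of fields diverge in line #{line_number} " \
        "(size #{fields.size}, expected #{expected})"
end

header = ['"cid"', '"length"']   # header as printed in the debug output
row    = %w[1 a 4658]            # first data row as printed

begin
  check_field_count(row, header.size, 1)
rescue ArgumentError => e
  puts e.message
end
```

Under these assumptions the mismatch comes from the header having one field fewer than the rows, not from the rows themselves.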
#"bid" "cid" "length"
doesn't count as header size 3?
Or maybe it is the remark # symbol which confuses Ruby's csv reader. This should work:
"bid" "cid" "length"
1 a 4658
1 b 12060
2 c 5858
2 d 5626
3 e 18451
Make it either tab-delimited for both header and rows, or CSV. As for "rownames": ah, there's a bit of R in the naming.
I'll keep it as bug.
No luck with this:
"bid" "cid" "length"
1 a 4658
1 b 12060
2 c 5858
2 d 5626
3 e 18451
Also, quotes make no difference:
$ cat /tmp/table3
bid cid length
1 a 4658
1 b 12060
2 c 5858
2 d 5626
3 e 18451
I'll keep it as bug.
ta
You sure it is tab-delimited? Works for me. In vi
:%s/ \+/^I/g
That's gobbledygook to an emacs person like myself, but anyway, it was tabs, and it also doesn't work with commas:
$ cat /tmp/table4
bid,cid,length
1,a,4658
1,b,12060
2,c,5858
2,d,5626
3,e,18451
$ bio-table --in-format csv --with-rownames --num-filter "values[0]==2" /tmp/table4
bio-table 0.0.5 Copyright (C) 2012 Pjotr Prins <pjotr.prins@thebird.nl>
INFO bio-table: Array: [{:show_help=>false, :write_header=>true, :skip=>0, :in_format=>:csv, :with_rownames=>true, :num_filter=>"values[0]==2"}, ["/tmp/table4"]]
bid cid length
["cid", "length"]
["1", "a", "4658"]
/home/ben/.rvm/gems/ruby-1.9.3-p125/gems/bio-table-0.0.5/lib/bio-table/validator.rb:20:in `throw': uncaught throw "Number of fields diverge in line 1 (size 3, expected 2)" (ArgumentError)
Same on current master too 71e7286
I used to be an emacs guy.
With CSV it looks like it misses the top-left corner too, if you don't use quotes. Hey, this is version 0.0.5! But usable.
Hey, this is version 0.0.5! But usable.
I'm a pot, so ain't callin' you anything.
I have added a configurable line splitter (a string or regex can be passed in):
./bin/bio-table test/data/input/table_split_on.txt --in-format split --split-on ','
bio-table 0.0.6-rc1 Copyright (C) 2012 Pjotr Prins <pjotr.prins@thebird.nl>
INFO bio-table: Array: [{:show_help=>false, :write_header=>true, :skip=>0, :in_format=>:split, :split_on=>","}, ["test/data/input/table_split_on.txt"]]
bid cid length
1 a 4658
1 b 12060
2 c 5858
2 d 5626
3 e 18451
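In plain Ruby, a splitter that accepts either a string or a regex follows naturally from `String#split`, which takes both. The following is only a sketch of the idea, not the code that was added to bio-table; `split_line` is a hypothetical name.

```ruby
# Hypothetical helper, not bio-table's implementation: split one line on
# a configurable separator. String#split accepts a String or a Regexp.
def split_line(line, split_on)
  line.chomp.split(split_on, -1)   # limit -1 keeps trailing empty fields
end

p split_line("2,c,5858\n", ",")      # comma-separated
p split_line("2 c   5858\n", / +/)   # one or more spaces
p split_line("2\tc\t5858\n", "\t")   # tab-separated
```

Passing `-1` as the limit is a choice made here so that a row ending in an empty field keeps the same field count as the other rows.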
Sounds good - I was wondering whether non-tab, non-CSV formats (e.g. space-separated) would be supported soon as well. So you've saved yourself another pesky GitHub issue. Thanks.
Released in 0.0.6
wwood: can you confirm this works for you?
Sorry, no. Using latest master or 0.0.6
ben@uyen:20120901:~/bioinfo/NLStradamis$ cat /tmp/table4
bid,cid,length
1,a,4658
1,b,12060
2,c,5858
2,d,5626
3,e,18451
ben@uyen:20120901:~/bioinfo/NLStradamis$ ~/git/bioruby-table/bin/bio-table --in-format csv --with-rownames --num-filter "values[0]==2" /tmp/table4
bio-table 0.0.7-rc2 Copyright (C) 2012 Pjotr Prins <pjotr.prins@thebird.nl>
INFO bio-table: Array: [{:show_help=>false, :write_header=>true, :skip=>0, :in_format=>:csv, :with_rownames=>true, :num_filter=>"values[0]==2"}, ["/tmp/table4"]]
INFO bio-table: Array: ["bid", "cid", "length"]
bid cid length
["cid", "length"]
["1", "a", "4658"]
/home/ben/git/bioruby-table/lib/bio-table/validator.rb:20:in `throw': uncaught throw "Number of fields diverge in line 1 (size 3, expected 2)" (ArgumentError)
from /home/ben/git/bioruby-table/lib/bio-table/validator.rb:20:in `valid_row?'
from /home/ben/git/bioruby-table/lib/bio-table/table_apply.rb:56:in `parse_row'
from /home/ben/git/bioruby-table/lib/bio-table/tableload.rb:34:in `block (2 levels) in emit'
from /home/ben/git/bioruby-table/lib/bio-table/tableload.rb:18:in `each'
from /home/ben/git/bioruby-table/lib/bio-table/tableload.rb:18:in `each_with_index'
from /home/ben/git/bioruby-table/lib/bio-table/tableload.rb:18:in `block in emit'
from /home/ben/git/bioruby-table/bin/bio-table:251:in `each'
from /home/ben/git/bioruby-table/bin/bio-table:251:in `each'
from /home/ben/git/bioruby-table/bin/bio-table:251:in `block in <main>'
from /home/ben/git/bioruby-table/bin/bio-table:244:in `each'
from /home/ben/git/bioruby-table/bin/bio-table:244:in `<main>'
Same deal with the tabs.
try again with --in-format split --split-on ','
testing... yes, it is a bug.
Applied fix on master
Works on current master for tabs, csv, and --in-format split --split-on ','. Cool.