Bug in importing .proteinortho file
tseemann opened this issue · 2 comments
tseemann commented
It is true we need to skip over the first 3 columns to get the main matrix.
However, `d3.keys` seems to be sorting them or changing the input order. It's possible we've gotten lucky for years because our strain names happened to sort ahead of `Alg.-Conn.`, but a numerical strain like `12228` doesn't. See screenshot below.
    parse_proteinortho = (tsv) ->
        strains = []
        values = []
        genes = []
        i = 0
        for row in tsv
            i += 1
            if i==1
                # DEBUG console.log row
                # DEBUG console.log d3.keys(row)
                strains = d3.keys(row)[3..] # skip first 3 junk columns
                                .map((s) -> {name: s})
                # DEBUG console.log strains.map((s) -> s.name)
            genes.push( {name:"cluster#{i}", desc:""} )
            values.push( strains.map( (s) -> if row[s.name]=='*' then null else row[s.name]) )
        new GeneMatrix( strains, genes, d3.transpose(values) )
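A likely explanation for the reordering, offered as a sketch rather than a confirmed diagnosis: JavaScript engines list integer-like property names (such as `12228`) first, in ascending numeric order, and only then the remaining string keys in insertion order. As far as I know, `d3.keys` is a plain `for...in` loop, so it inherits that order. The strain names `strainA` and `strainB` below are made up for illustration.

```javascript
// Hypothetical header row, built in file order like a d3-parsed row object.
// "strainA" and "strainB" are invented names; "12228" is the numeric strain.
const row = {};
for (const col of ["# Species", "Genes", "Alg.-Conn.", "strainA", "12228", "strainB"]) {
  row[col] = "*";
}

// Enumerate keys with for...in (assumed to match what d3.keys does).
const keys = [];
for (const k in row) keys.push(k);
console.log(keys);
// ["12228", "# Species", "Genes", "Alg.-Conn.", "strainA", "strainB"]

// Skipping the first 3 keys by position now drops the numeric strain
// and keeps the junk column "Alg.-Conn." as if it were a strain.
const strains = keys.slice(3);
console.log(strains);
// ["Alg.-Conn.", "strainA", "strainB"]
```

If this is what is happening, any purely positional skip over the keys of a row object is fragile whenever a strain name looks like an integer.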
tseemann commented
The three columns to ignore are named `# Species`, `Genes`, and `Alg.-Conn.`.
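Given those names, one possible approach is to drop the junk columns by name instead of by position. This is only a sketch and not necessarily how the fix was made; `JUNK_COLUMNS`, `strainColumns`, and the example row (including the strain names `strainA` and `strainB`) are hypothetical.

```javascript
// Names of the three leading non-strain columns, as listed in this thread.
const JUNK_COLUMNS = new Set(["# Species", "Genes", "Alg.-Conn."]);

// Hypothetical helper: keep every column whose name is not a junk column,
// so the result no longer depends on where those columns are enumerated.
function strainColumns(row) {
  return Object.keys(row).filter((name) => !JUNK_COLUMNS.has(name));
}

// Made-up example row containing a numeric strain name.
const exampleRow = {
  "# Species": "3", "Genes": "3", "Alg.-Conn.": "1",
  "strainA": "g1", "12228": "g2", "strainB": "*",
};
console.log(strainColumns(exampleRow));
// ["12228", "strainA", "strainB"]
```

Note that the numeric strain still comes out first rather than in file order. If column order matters, it would have to be taken from the header line itself. Recent d3 releases expose it as a `columns` property on the parsed array, but which d3 version this project uses is not stated in the thread.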
drpowell commented
fixed