drpowell/FriPan

Bug in importing .proteinortho file

tseemann opened this issue · 2 comments

It is true that we need to skip the first 3 columns to get the main matrix.

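However, `d3.keys` does not preserve the input column order. We have probably been lucky for years because our strain names were ordinary strings, which keep their file order after `Alg.-Conn.`. A purely numeric strain name such as 12228 does not: JavaScript objects enumerate integer-like keys first, so that strain is moved ahead of the three junk columns and `[3..]` then drops it instead of `Alg.-Conn.`. See screenshot below.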
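A minimal TypeScript sketch of the ordering problem. It assumes that `d3.keys(row)` returns keys in the same order as `Object.keys(row)` (d3 v3 builds the list with a `for...in` loop). The strain names `A` and `12228` are illustrative, not taken from a real file.

```typescript
// Build a row object the way a TSV parser typically does: one property per
// header column, assigned in file order. "A" and "12228" are example strains.
const header: string[] = ["# Species", "Genes", "Alg.-Conn.", "A", "12228"];
const row: Record<string, string> = {};
for (const column of header) {
  row[column] = "*";
}

// Object.keys lists integer-like keys first, in ascending numeric order,
// then the remaining string keys in insertion order.
const keyOrder: string[] = Object.keys(row);
console.log(keyOrder); // ["12228", "# Species", "Genes", "Alg.-Conn.", "A"]

// Skipping by position therefore drops the numeric strain and keeps a junk column.
const strainsByPosition: string[] = keyOrder.slice(3);
console.log(strainsByPosition); // ["Alg.-Conn.", "A"]
```

Any string that is a canonical non-negative integer (below 2^32 - 1) gets this treatment, so strain names like `12228` or `42` are affected while `12228a` is not.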

```coffeescript
parse_proteinortho = (tsv) ->
    strains = []
    values = []
    genes = []
    i=0
    for row in tsv
        i += 1
        if i==1
            # DEBUG console.log row
            # DEBUG console.log d3.keys(row)
            strains = d3.keys(row)[3..] # skip first 3 junk columns
                        .map((s) -> {name: s})
            # DEBUG console.log strains.map((s) -> s.name)
        genes.push( {name:"cluster#{i}", desc:""} )
        values.push( strains.map( (s) -> if row[s.name]=='*' then null else row[s.name]) )

    new GeneMatrix( strains, genes, d3.transpose(values) )
```

[Screenshot]

The three columns to ignore are named `# Species`, `Genes` and `Alg.-Conn.`.
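One way to avoid depending on object key order is to take the column names straight from the header line and skip the junk columns by name rather than by position. The sketch below is a hypothetical TypeScript helper (`parseProteinortho`, `JUNK_COLUMNS` and `ProteinorthoTable` are illustrative names, not FriPan's code). It assumes a tab-separated file whose first line is the header and where `*` marks a missing gene.

```typescript
// Junk columns identified by name, so their position in the file no longer matters.
const JUNK_COLUMNS = new Set<string>(["# Species", "Genes", "Alg.-Conn."]);

interface ProteinorthoTable {
  strains: string[];
  // values[c][s]: gene id for cluster c in strain s, or null when the cell is "*"
  values: (string | null)[][];
}

function parseProteinortho(text: string): ProteinorthoTable {
  const lines = text.split(/\r?\n/).filter((line) => line.length > 0);
  // Column names come from the header line itself, so they stay in file order.
  const header = lines[0].split("\t");
  const strainIdx: number[] = [];
  header.forEach((name, idx) => {
    if (!JUNK_COLUMNS.has(name)) strainIdx.push(idx);
  });
  const strains = strainIdx.map((idx) => header[idx]);
  // Each remaining line is one cluster; pick the strain cells by index.
  const values = lines.slice(1).map((line) => {
    const cells = line.split("\t");
    return strainIdx.map((idx) => (cells[idx] === "*" ? null : cells[idx]));
  });
  return { strains, values };
}

// Usage with a made-up two-cluster file containing a numeric strain name.
const exampleTsv =
  "# Species\tGenes\tAlg.-Conn.\tA\t12228\n" +
  "2\t2\t1\tgeneA1\tgeneB1\n" +
  "1\t1\t0\t*\tgeneB2\n";
const table = parseProteinortho(exampleTsv);
console.log(table.strains); // ["A", "12228"]
console.log(table.values); // [["geneA1", "geneB1"], [null, "geneB2"]]
```

If the project is on d3 v4 or later, the array returned by `d3.tsvParse` also carries a `columns` property listing the header in file order, which could replace `d3.keys(row)` in the original function without hand-parsing the file.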

Fixed.