Bug in importing .proteinortho file

Question

tseemann opened this issue 8 years ago · 2 comments

It is true we need to skip over the first 3 columns to get the main matrix.

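However, d3.keys seems to be changing the input order. The likely cause is that JavaScript objects enumerate integer-like keys (such as "12228") first, in ascending numeric order, ahead of all other keys, which keep their insertion order. It's possible we've gotten lucky for years because our strain names were never purely numeric, so they stayed after Alg.-Conn. A numerical strain name like 12228 is moved to the front instead, and the [3..] slice then drops 12228 and keeps Alg.-Conn. as if it were a strain. See screenshot below.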

```coffeescript
parse_proteinortho = (tsv) ->
    strains = []
    values = []
    genes = []
    i = 0
    for row in tsv
        i += 1
        if i == 1
            # DEBUG console.log row
            # DEBUG console.log d3.keys(row)
            strains = d3.keys(row)[3..] # skip first 3 junk columns
                        .map((s) -> {name: s})
            # DEBUG console.log strains.map((s) -> s.name)
        genes.push( {name:"cluster#{i}", desc:""} )
        values.push( strains.map( (s) -> if row[s.name]=='*' then null else row[s.name]) )

    new GeneMatrix( strains, genes, d3.transpose(values) )
```
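A minimal sketch that reproduces the reordering without d3, assuming d3.keys returns a plain row object's keys in the same order as Object.keys. The row below is a hypothetical example; strainA and strainB are made-up strain names, and the column values are placeholders.

```coffeescript
# Hypothetical parsed row: keys are written in the file's column order.
row =
  '# Species': '2'
  'Genes': '2'
  'Alg.-Conn.': '1'
  'strainA': 'geneA1'
  '12228': 'gene12228'
  'strainB': '*'

# Integer-like keys such as "12228" are enumerated first (ascending numeric
# order); all other keys follow in insertion order.
keys = Object.keys(row)
console.log keys
# keys is ['12228', '# Species', 'Genes', 'Alg.-Conn.', 'strainA', 'strainB']

# Skipping "the first 3 columns" by position now drops strain 12228
# and keeps the Alg.-Conn. column as if it were a strain.
console.log keys[3..]
# keys[3..] is ['Alg.-Conn.', 'strainA', 'strainB']
```

Any strain name made only of digits triggers this, regardless of where its column sits in the file.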

drpowell commented 8 years ago

fixed

Answer 1 · 2017-03-23T04:23:44.000Z

The three columns to ignore are named "# Species", "Genes", and "Alg.-Conn.".
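One way to use those names, sketched under assumptions: the raw TSV text is available, so the header line can be read directly instead of relying on an object's key order, and the junk columns are dropped by name rather than by position. The helper parseStrainColumns and the sample text are hypothetical; this is not necessarily how the project applied the fix.

```coffeescript
# Names of the three leading non-strain columns in a .proteinortho file.
JUNK_COLUMNS = ['# Species', 'Genes', 'Alg.-Conn.']

# Hypothetical helper: return strain names (in file order) and one value
# array per cluster row, skipping the junk columns by name.
parseStrainColumns = (text) ->
  lines = (line for line in text.split(/\r?\n/) when line.length > 0)
  header = lines[0].split('\t')
  # Indices of the non-junk columns, in the order they appear in the header.
  keep = (i for name, i in header when name not in JUNK_COLUMNS)
  strains = (header[i] for i in keep)
  values = for line in lines[1..]
    cells = line.split('\t')
    # '*' means "no gene in this strain", mapped to null as in the original parser.
    ((if cells[i] is '*' then null else cells[i]) for i in keep)
  return {strains, values}

# Hypothetical two-line input: a header plus one cluster.
text = [
  ['# Species', 'Genes', 'Alg.-Conn.', 'strainA', '12228', 'strainB'].join('\t')
  ['2', '2', '1', 'geneA1', 'gene12228', '*'].join('\t')
].join('\n')

result = parseStrainColumns(text)
console.log result.strains  # ['strainA', '12228', 'strainB']
console.log result.values   # [['geneA1', 'gene12228', null]]
```

Filtering d3.keys(row) by these names would also give a correct matrix, since strains and values would share the same order, but the strains would no longer appear in file order; reading the header keeps that order as well.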