evogytis/baltic

Export back to NextStrain JSON


I am working on a project where I am trying to add some additional information to the leaves of a Nextstrain JSON tree. Your package is amazing for parsing that format, but I am having a hard time working out how to export a baltic.baltic.tree object back to JSON. Is there a function or method for that?

Hi Mike,

This is something that @sidneymbell and I were talking about a few days ago. There's currently nothing in place within the repo itself, but I've written this snippet to do it semi-manually:

from datetime import datetime as dt
import json
import baltic as bt

nexus_tree=bt.loadNexus('/mnt/c/Users/evogytis/Downloads/aln_03_0816.mcc.tree')
nexus_tree.treeStats()

def convertToJSON(node,index,most_recent_tip):
    json_node={'name': None, 
               'node_attrs': {'num_date': {'value': node.absoluteTime}}, 
               'branch_attrs': {}
              }
    if 'height_95%_HPD' in node.traits: ## height 95% HPD available, compute from most recent tip date
        lower,upper=node.traits['height_95%_HPD']
        time_range = [most_recent_tip-upper, most_recent_tip-lower]
        json_node['node_attrs']['num_date']['confidence']=time_range
    if node.branchType=='node': ## node
        json_node['children']=[] ## has children
        json_node['name']='NODE_%07d'%(index) ## different name
        for child in node.children: ## iterate over children
            if child.branchType=='node': index+=1 ## increment index if child is node too
            index,json_child=convertToJSON(child,index,most_recent_tip) ## get the json-formatted child
            json_node['children'].append(json_child) ## attach resulting json-formatted children to current json node
    else:
        fields=node.name.split('|') ## tip names are '|'-delimited: strain at index 0, country at 2, location at 3
        json_node['node_attrs']['country']={'value': fields[2], 'confidence': {fields[2]: 1.0}}
        json_node['node_attrs']['location']={'value': fields[3], 'confidence': {fields[3]: 1.0}}
        json_node['name']=fields[0] ## leaf, name is simple
    return index,json_node

def toNextstrainJSON(tree,output):
    _,json_tree=convertToJSON(tree.root,0,tree.mostRecent)
    output_json={'version': 'v2', 
                 'meta': {'updated': dt.now().strftime('%Y-%m-%d'), ## today's date
                          'colorings': [{'key': 'country', 'title': 'Country', 'type': 'categorical'}, 
                                        {'key': 'location', 'title': 'Location', 'type': 'categorical'}], 
                          'panels': ['tree'], 
                          'display_defaults': {'color_by': 'country', 
                                               'distance_measure': 'num_date', 
                                               'geo_resolution': 'country', 
                                               'map_triplicate': True}, ## boolean, written out as JSON true
                          'filters': ['country','location']
                         }, 
                 'tree': json_tree}
    with open(output,'w') as out_file: ## context manager closes the file even if dumping fails
        json.dump(output_json,out_file,indent=1)

out='/mnt/c/Users/evogytis/Downloads/aln_03_816.json'
toNextstrainJSON(nexus_tree,out)

Hopefully it's clear enough to adapt to your own case, but if not, let me know. I intend to include some way of exporting Auspice JSON files in the future, but it's not a priority at the moment.
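The height-to-date conversion inside convertToJSON can be checked in isolation. The following is only a minimal sketch and not part of baltic: the helper name hpd_to_date_range is hypothetical, the numbers are made up for illustration, and it assumes heights are measured in years backwards from the most recent tip, as in the snippet above.

```python
def hpd_to_date_range(height_hpd, most_recent_tip):
    """Convert a (lower, upper) height interval into an [earliest, latest] date interval.

    Hypothetical helper, not a baltic function. Assumes heights count years
    backwards from the most recent tip, so the larger height is the older date.
    """
    lower, upper = height_hpd
    return [most_recent_tip - upper, most_recent_tip - lower]


# Illustrative numbers only: a node 1.5 to 2.5 years older than a tip dated 2020.5
date_range = hpd_to_date_range((1.5, 2.5), 2020.5)
print(date_range)
```

Note that the interval flips: the upper height bound gives the earlier date, which is why the snippet subtracts upper first.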

@evogytis Amazing! Thanks very much for the code snippet and the prompt reply.

To give a bit more background: we are trying to visualize mutations for each sample on the tree. We have the back end for the visualization worked out, but need the total number of mutations as a key:value pair on each leaf to drive it. I have the tree traversal worked out with the Newick-formatted tree and muts_nt.json. I am trying to find an easy way to take the last step: adding that information back to the Auspice JSON. So, I am going from Nextstrain JSON, modifying each leaf, and then back to Nextstrain. I was having a terrible time parsing the JSON format until I hit upon baltic. This is an amazing set of tools!

So, I think my case might be easier than even the snippet you provided. I should be able to work from your example. I will close for now, and if I hit an issue I will re-open the ticket.
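For readers who only need the per-leaf annotation step, here is a rough sketch that works directly on the parsed JSON dictionary, without baltic. It is an illustration under assumptions rather than the approach used in this thread: it assumes nucleotide mutations are stored per branch under branch_attrs['mutations']['nuc'], it counts mutation events along the root-to-leaf path (so reversions are counted twice), and the function name add_total_mutations and the attribute name total_mutations are hypothetical.

```python
import json


def add_total_mutations(node, inherited=0):
    """Annotate every leaf with the number of nucleotide mutations on its root-to-leaf path.

    Assumes each node may carry branch_attrs['mutations']['nuc'] as a list of
    mutation strings. 'total_mutations' is a hypothetical attribute name.
    """
    muts = node.get('branch_attrs', {}).get('mutations', {}).get('nuc', [])
    total = inherited + len(muts)
    children = node.get('children', [])
    if not children:  # leaf: store the running count as a node attribute
        node.setdefault('node_attrs', {})['total_mutations'] = {'value': total}
    for child in children:  # internal node: pass the running count down
        add_total_mutations(child, total)


# Toy tree standing in for the 'tree' entry of a real Auspice JSON file
toy_tree = {
    'name': 'NODE_0000000',
    'branch_attrs': {'mutations': {}},
    'children': [
        {'name': 'sampleA', 'node_attrs': {},
         'branch_attrs': {'mutations': {'nuc': ['A10T', 'C25G']}}},
        {'name': 'NODE_0000001',
         'branch_attrs': {'mutations': {'nuc': ['G5A']}},
         'children': [
             {'name': 'sampleB', 'node_attrs': {},
              'branch_attrs': {'mutations': {'nuc': []}}},
         ]},
    ],
}

add_total_mutations(toy_tree)
print(json.dumps(toy_tree, indent=1))
```

With a real file, the same function could be applied to the 'tree' entry after json.load and the whole dictionary written back with json.dump.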

In case anyone comes across this issue and needs a Nextstrain-to-Nextstrain round trip, here is the code snippet I modified from the one above.

import baltic as bt
import json

def convertToJSON(node,index,most_recent_tip):
    json_node={'name': node.name, ## keep the original node name
               'node_attrs': node.traits['node_attrs'] ## reuse the attributes parsed from the input JSON
              }
    if 'branch_attrs' in node.traits: ## copy branch_attrs only when the node carries them
        json_node['branch_attrs']=node.traits['branch_attrs']

    if node.branchType=='node': ## node
        json_node['children']=[] ## has children
        for child in node.children: ## iterate over children
            if child.branchType=='node': index+=1 ## increment index if child is node too
            index,json_child=convertToJSON(child,index,most_recent_tip) ## get the json-formatted child
            json_node['children'].append(json_child) ## attach resulting json-formatted children to current json node
    else:
        pass ## leaf: make a change to a specific leaf node here (e.g. add a key to json_node['node_attrs'])

    return index,json_node

def toNextstrainJSON(tree,meta,output):
    _,json_tree=convertToJSON(tree.root,0,tree.mostRecent)
    output_json={'meta': meta, ## metadata passed through unchanged from bt.loadJSON
                 'tree': json_tree,
                 'version': 'v2'}
    with open(output,'w') as out_file: ## context manager closes the file even if dumping fails
        json.dump(output_json,out_file,indent=1)


nextstrainPath='<PATH>/ncov_example.json'
myTree, myMeta = bt.loadJSON(nextstrainPath)

out='auspice_modified_output.json'
toNextstrainJSON(myTree,myMeta,out)
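
A new leaf attribute usually also needs an entry in the 'colorings' list of the meta block before it can be selected as a colouring. The sketch below is only an illustration: register_coloring is a hypothetical helper, total_mutations is a hypothetical attribute name, and the entry shape simply mirrors the {'key', 'title', 'type'} dictionaries used earlier in this thread.

```python
def register_coloring(meta, key, title, coloring_type='continuous'):
    """Add a coloring entry to an Auspice 'meta' dict unless the key is already present.

    Hypothetical helper; entries follow the {'key', 'title', 'type'} shape
    used for 'country' and 'location' earlier in this thread.
    """
    colorings = meta.setdefault('colorings', [])
    if not any(c.get('key') == key for c in colorings):
        colorings.append({'key': key, 'title': title, 'type': coloring_type})
    return meta


# Toy meta block standing in for the one returned alongside the tree
meta = {'colorings': [{'key': 'country', 'title': 'Country', 'type': 'categorical'}]}
register_coloring(meta, 'total_mutations', 'Total mutations')
register_coloring(meta, 'total_mutations', 'Total mutations')  # second call changes nothing
print(meta['colorings'])
```

The updated meta dictionary can then be passed to toNextstrainJSON in place of the unmodified one.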