- Original code by Chuan
- Original ensemble-lookup.csv
Running the code provided by Chuan gives exactly the same results as reported in his Google Doc: 59,814 matching predictions and 41,141 different ones.
In the original code, adata.var.hugo_symbol is used as the gene_name. This property is not available in datasets from other sources such as CellXGene and GTEx. Instead, we use a lookup table to translate Ensembl identifiers to gene names. See normalize_var_names for the logic.
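
The lookup-based translation can be sketched roughly as follows. This is only an illustration and not the actual normalize_var_names implementation: the column names ensembl_id and gene_name, the helper names load_lookup and translate_var_names, the version-suffix stripping, and the fallback of keeping unknown IDs unchanged are all assumptions.

```python
import csv
import io

# Hypothetical lookup contents; the real ensemble-lookup.csv may use
# different column names and contains many more rows.
LOOKUP_CSV = """ensembl_id,gene_name
ENSG00000141510,TP53
ENSG00000012048,BRCA1
"""


def load_lookup(csv_text):
    """Build a dict mapping Ensembl ID -> gene name from CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["ensembl_id"]: row["gene_name"] for row in reader}


def translate_var_names(var_names, lookup):
    """Translate Ensembl IDs to gene names.

    Version suffixes such as ".17" are stripped before the lookup.
    IDs without a lookup entry are kept unchanged (an assumption; the
    real code may handle them differently).
    """
    translated = []
    for name in var_names:
        base_id = name.split(".")[0]  # "ENSG00000141510.17" -> "ENSG00000141510"
        translated.append(lookup.get(base_id, name))
    return translated


lookup = load_lookup(LOOKUP_CSV)
gene_names = translate_var_names(
    ["ENSG00000141510.17", "ENSG00000012048", "ENSG_UNKNOWN"], lookup
)
print(gene_names)
```

In a real run, the translated names would be written back to the dataset's variable annotations in place of the missing hugo_symbol column.
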
After applying this change to the original code in run.py and rerunning the tool, the results match the output of hra-workflows-runner exactly.
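
A comparison of two prediction sets, like the ones described above, might be counted along these lines. The cell IDs, labels, and the helper name count_agreement are made up for illustration; the actual comparison may read its inputs from files and use a different key.

```python
def count_agreement(preds_a, preds_b):
    """Count matching and differing labels for cells present in both runs.

    Both arguments are dicts mapping a cell ID to a predicted label.
    Cells that appear in only one run are ignored (an assumption).
    """
    shared = preds_a.keys() & preds_b.keys()
    matching = sum(1 for cell_id in shared if preds_a[cell_id] == preds_b[cell_id])
    return matching, len(shared) - matching


# Toy predictions standing in for the outputs of two runs.
run_a = {"cell1": "T cell", "cell2": "B cell", "cell3": "NK cell"}
run_b = {"cell1": "T cell", "cell2": "Plasma cell", "cell3": "NK cell"}

matching, different = count_agreement(run_a, run_b)
print(matching, different)
```

Two runs agree exactly when the second count is zero and both runs cover the same set of cells.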