monarch-initiative/koza

Missing biolink_model_pydantic when calling transform_source

caufieldjh opened this issue · 8 comments

I noticed while testing some ingests for KG-IDG today that, after upgrading Koza to 0.1.5, every test broke with ModuleNotFoundError: No module named 'biolink_model_pydantic' upon importing transform_source from koza.cli_runner (example stack trace below). This seemed strange, as I thought biolink_model_pydantic was a Koza dependency, but then I saw it was changed to a dev dependency in #50.
Is biolink_model_pydantic still needed to run the cli_runner functions?

Stack trace:

ImportError while importing test module '/home/runner/work/kg-idg/kg-idg/tests/test_query.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/importlib/__init__.py:127: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/test_query.py:10: in <module>
    from kg_idg.query import parse_query_yaml, result_dict_to_tsv
kg_idg/__init__.py:2: in <module>
    from .transform import transform
kg_idg/transform.py:7: in <module>
    from kg_idg.transform_utils.drug_central.drug_central import DrugCentralTransform
kg_idg/transform_utils/drug_central/__init__.py:1: in <module>
    from .drug_central import DrugCentralTransform
kg_idg/transform_utils/drug_central/drug_central.py:9: in <module>
    from koza.cli_runner import transform_source #type: ignore
/opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/koza/cli_runner.py:11: in <module>
    from koza.app import KozaApp
/opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/koza/app.py:11: in <module>
    from koza.io.writer.jsonl_writer import JSONLWriter
/opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/koza/io/writer/jsonl_writer.py:6: in <module>
    from biolink_model_pydantic.model import Association, Entity, NamedThing
E   ModuleNotFoundError: No module named 'biolink_model_pydantic'

This is tricky because we don't want to tightly couple any specific model to Koza, but the KGX JSONL writer needs to know whether a class is a node or an edge. We could solve this by adding a private attribute to Association and NamedThing, e.g. _graph_unit = {'edge', 'node'}, but then this would become a requirement of any model that needs to write nodes and edges files (which maybe makes sense). @kevinschaper curious about your thoughts.
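A minimal sketch of the private-attribute idea, to make the trade-off concrete. The two classes here are stand-ins defined locally, not the biolink_model_pydantic classes, and graph_unit is a hypothetical helper rather than anything in Koza:

```python
from typing import Any


class NamedThing:
    """Stand-in node class (hypothetical, not the biolink_model_pydantic class)."""
    _graph_unit = "node"

    def __init__(self, id: str):
        self.id = id


class Association:
    """Stand-in edge class (hypothetical, not the biolink_model_pydantic class)."""
    _graph_unit = "edge"

    def __init__(self, subject: str, predicate: str, object: str):
        self.subject = subject
        self.predicate = predicate
        self.object = object


def graph_unit(entity: Any) -> str:
    """Return 'node' or 'edge' by reading the marker, without importing any model."""
    unit = getattr(entity, "_graph_unit", None)
    if unit not in ("node", "edge"):
        # This is the new requirement placed on every model that wants to be written
        raise ValueError(f"{type(entity).__name__} does not declare _graph_unit")
    return unit
```

The writer would then only depend on the marker, at the cost that every model has to declare it.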

What if we get rid of write and instead have write_nodes and write_edges?
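As a rough illustration of what that split could look like: the class name SplitJSONLWriter and its constructor are assumptions made up for this sketch, not Koza's actual writer interface.

```python
import json
from typing import Any, Dict, Iterable, TextIO


class SplitJSONLWriter:
    """Hypothetical writer: the caller decides what is a node and what is an edge."""

    def __init__(self, nodes_out: TextIO, edges_out: TextIO):
        self.nodes_out = nodes_out
        self.edges_out = edges_out

    def write_nodes(self, nodes: Iterable[Dict[str, Any]]) -> None:
        # One JSON object per line, no type checking needed
        for node in nodes:
            self.nodes_out.write(json.dumps(node) + "\n")

    def write_edges(self, edges: Iterable[Dict[str, Any]]) -> None:
        for edge in edges:
            self.edges_out.write(json.dumps(edge) + "\n")
```

With this shape the writer never has to inspect the class of what it receives, but the responsibility for sorting entities moves to the calling code.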

That could work as well. How would we enforce or check that a node is a node in the Biolink model, and the same for edges? It's tricky because the concept of nodes and edges is not represented explicitly in the model.

For Biolink, we could definitely confirm that the subject/predicate/object properties exist to know it's an association, which I guess means that we could just use that as the check for whether to write an edge or a node.

@glass-ships this seems like a good time to fix this one, since we were just in the writer code. Rather than using isinstance (and therefore inheritance) to figure out whether we're writing a node or an edge, what do you think about just checking for a predicate attribute (or maybe subject and object?) to know if it's an edge? If it doesn't have the necessary attributes to be an edge, then it must be a node.
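Something like the following is what I have in mind. It is only a sketch: is_edge and EDGE_ATTRS are hypothetical names, and treating None as "not set" is an assumption about how optional fields show up on model instances.

```python
from types import SimpleNamespace
from typing import Any

# Assumption: these three attributes are enough to identify an edge
EDGE_ATTRS = ("subject", "predicate", "object")


def is_edge(entity: Any) -> bool:
    """Duck-typed check: an entity is an edge if all edge attributes are set."""
    return all(getattr(entity, attr, None) is not None for attr in EDGE_ATTRS)


# Plain namespaces stand in for model instances here
gene = SimpleNamespace(id="HGNC:1", name="example gene")
assoc = SimpleNamespace(subject="HGNC:1", predicate="biolink:related_to", object="HGNC:2")

units = ["edge" if is_edge(e) else "node" for e in (gene, assoc)]
```

No import of any model package is needed, so the writer would work with any class that exposes these attributes.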

That sounds like a reasonable check! Nodes shouldn't have a relation or predicate property, right?
We could definitely take another look at our writer class and tie this into looking at our "core properties".

Oh, that's a great idea! We don't even need to hardcode which properties to check; we can use the core properties of an edge to know if an entity is an edge.
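A rough sketch of that, with the core properties passed in rather than hardcoded in the check. The set below is a placeholder assumption and split_entities is a hypothetical helper; the real writer would take the list from wherever it already defines its core edge properties.

```python
from typing import Any, FrozenSet, Iterable, List, Tuple

# Placeholder assumption: the real list would come from the writer's own definition
CORE_EDGE_PROPERTIES = frozenset({"subject", "predicate", "object"})


def _properties(entity: Any) -> dict:
    """Return an entity's properties, whether it is a dict or a plain object."""
    return entity if isinstance(entity, dict) else vars(entity)


def split_entities(
    entities: Iterable[Any],
    core_edge_properties: FrozenSet[str] = CORE_EDGE_PROPERTIES,
) -> Tuple[List[Any], List[Any]]:
    """Partition entities into (nodes, edges) using only the core edge properties."""
    nodes: List[Any] = []
    edges: List[Any] = []
    for entity in entities:
        # Only properties that are actually set count as present
        present = {k for k, v in _properties(entity).items() if v is not None}
        if core_edge_properties <= present:
            edges.append(entity)
        else:
            nodes.append(entity)
    return nodes, edges
```

Anything that carries every core edge property is written as an edge, and everything else falls through to the nodes file.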

  • Replace biolink-pydantic dependency for type checking edges and nodes