Decouple most methods from LinkML dataclasses, split sssom into multiple packages
matentzn opened this issue · 6 comments
Right now, we have the odd situation of needing pandas and linkml for parsing a dataframe.
There is no inherent need for that;
- No methods other than `convert` (which relies on LinkML for the translations into JSON, YAML, etc.) and `validate` (which relies on LinkML during validation) really need LinkML data classes (pydantic or otherwise). We could consider getting rid of the "LinkML part" for these other methods.
- LinkML `convert` and `validate` do not, technically speaking, require pandas. It may be worth exploring more efficient means of parsing data frames into dataclasses (at least as a proposal we can discuss here in the issue).
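To make the second point concrete, here is a minimal sketch of reading SSSOM/TSV rows straight into plain dataclasses with only the standard library, without pandas. The `Mapping` class and `parse_mappings` function are hypothetical names used for illustration, not part of sssom-py, and the class only covers a handful of SSSOM slots. It assumes the usual layout where metadata lines are prefixed with `#`.

```python
import csv
import io
from dataclasses import dataclass, fields
from typing import Iterable, Iterator, Optional


@dataclass
class Mapping:
    """Hypothetical, minimal stand-in for a generated mapping class."""

    subject_id: str
    predicate_id: str
    object_id: str
    mapping_justification: str
    confidence: Optional[float] = None


def parse_mappings(lines: Iterable[str]) -> Iterator[Mapping]:
    """Yield Mapping objects from SSSOM/TSV lines, skipping '#' metadata lines."""
    data_lines = (line for line in lines if not line.startswith("#"))
    known = {f.name for f in fields(Mapping)}
    for row in csv.DictReader(data_lines, delimiter="\t"):
        # Keep only columns the dataclass knows about and drop empty cells.
        values = {
            key: value
            for key, value in row.items()
            if key in known and value not in (None, "")
        }
        if "confidence" in values:
            values["confidence"] = float(values["confidence"])
        yield Mapping(**values)


example = io.StringIO(
    "#curie_map:\n"
    "#  HP: http://purl.obolibrary.org/obo/HP_\n"
    "subject_id\tpredicate_id\tobject_id\tmapping_justification\tconfidence\n"
    "HP:0000001\tskos:exactMatch\tMP:0000001\tsemapv:LexicalMatching\t0.9\n"
)
mappings = list(parse_mappings(example))
```

This sketch ignores the metadata block entirely and does no validation; it is only meant to show that the row-to-object step itself does not need a data frame.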
This separation could also be grounds for splitting the project into `sssom-transform` (validate and convert, dependent on LinkML), `sssom-ext` (anything not in the other categories, including query stuff) and `sssom-developer` (everything needed for sssom file maintenance).
We should still release the whole sssom with everything in it though (as is), wrapping the above.
It's not great to separate packages by heavyweight dependencies, but I have been getting complaints about the huge number of dependencies of the sssom toolkit; this could help reduce it for some users.
No, please please don't split the project up. All of these other modern projects with 1 million subpackages are totally impossible to navigate. We just have to better organize what we have, and hide imports of stuff we don't want to always be around.
> We just have to better organize what we have
How do we avoid massive dependencies when they are not needed? Making them optional and telling users they need to install them if they want to use this and that functionality?
See #467
The rest of the dependencies seem pretty reasonable. Most people have pandas/numpy and this isn't a big ask to get it around. I think having RDFlib is also pretty standard. Then there are some low-level things like `validators`, `pyyaml`, `click`, and `deprecation` which are reasonable. `curies` is a core component for anyone in the semantic world.
One question is whether it's possible to make LinkML an optional dependency, since it's the cause of most of the heaviness.
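For reference, a common pattern for this is to import the heavy package lazily, inside the functions that need it, and raise a helpful error when it is missing. The sketch below is only an illustration: `optional_import` and `load_linkml_runtime` are hypothetical helpers, and the `sssom[linkml]` extra is an assumed name that would have to be declared in the package metadata.

```python
import importlib
from types import ModuleType


def optional_import(module_name: str, extra: str) -> ModuleType:
    """Import a module on demand, pointing users at the (assumed) extra if absent."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"'{module_name}' is required for this feature. "
            f"Install it with: pip install sssom[{extra}]"
        ) from exc


def load_linkml_runtime() -> ModuleType:
    """Hypothetical entry point: only code paths that call this need LinkML."""
    return optional_import("linkml_runtime", extra="linkml")
```

With this layout, importing the package itself never touches LinkML; only `convert`- or `validate`-style code paths that call the loader would pay the import cost or see the installation hint.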
Alright, I will go with your recommendation for now and see what to do about LinkML separately. For now, I want to try updating sssompy to pydantic data classes and see how much that breaks.
@matentzn can linkml generate Pydantic classes that don't have all of the baggage of the yaml system?
I think so, but I am far out of my depth here.