LSSTDESC/rail_base

Inconsistent import paths for sibling classes across rail* repos

Opened this issue · 9 comments

aimalz commented

Related to #19, there are inconsistencies among import locations for analogous functionality between the rail repositories. For example, rail_pzflow defines the FlowHandle in src/rail/tools/flow_handle.py but the other DataHandle subclasses in rail_base are in src/rail/core/data.py. This issue is for establishing consistent locations for analogous subclasses of the same parent class that will be present in the standalone repos, either the way we already do for src/rail/creation/engines and src/rail/estimation/algos (and the same for evaluation once #14 is completed) or another way that's compatible with the namespace setup.

aimalz commented
  • rail_astro_tools/src/rail/tools/utilPhotometry.py (whose contents I'd expect to find in src/rail/core/photometry_utils.py)
  • rail_astro_tools/src/rail/creation/degradation (which for consistency with src/rail/creation/engines, src/rail/estimation/algos, and src/rail/evaluation/metrics should probably be called rail_astro_tools/src/rail/creation/degraders)
  • rail_bpz/src/rail/bpz/utils.py (with actual content that may or may not belong in rail_bpz/src/rail/estimation/algos/_bpz_utils.py, although the self-import I don't understand is a hint that it's related to namespace magic and might not be subject to this issue at all)
  • rail_fsps/src/rail/added_examples/sed_gen_fsps-demo.ipynb (which I think can straightforwardly be moved to rail/examples/creation_examples/sed_gen_fsps-demo.ipynb as per LSSTDESC/rail#48)
  • rail_fsps/src/rail/creation/sed_generation/generator.py (which is an analogue to Creator so would should be at the same level as rail_base/src/rail/creation/engine.py and suggests that the contents of rail_base/src/rail/creation/engine.py should be split into separate modules; also, it would be more consistent if the current generator.py were called sed_generator.py)
  • rail_fsps/src/rail/creation/sed_generator.py (which belongs in src/rail/engines; also, it would be more consistent if the current sed_generator.py were called fsps_gen.py or something like that)
  • rail_pzflow/src/rail/tools/flow_handle.py (I'm not totally clear on why rail_base's ModelDict/ModelHandle aren't sufficient for this, but if we anticipate standalones to have to define their own DataHandles, we should group them in rail_base/src/rail/core/handles, starting with the current DataHandle subclasses in rail_base/src/rail/core/data.py.)

Ok, it looks like the this has all been address in other PRs.

aimalz commented

To recap, it looks like the status is as follows (and please correct me if I'm wrong here):

  • rail_astro_tools/src/rail/tools/utilPhotometry.py --> rail_astro_tools/src/rail/core/photometry_utils.py is completed
  • rail_fsps sounds like there are no objections to the proposed changes. In fact, @torluca already took a stab at most of them, and the remaining ones might be blocked by some FSPS-specific data issues IIRC. I'll follow up and make an issue if any actionable consistency improvements remain.

The remaining items are related to extensibility of the code to more underlying algos so their necessity largely depends on how likely we think it is that these will come up again as the railiverse expands.

  • rail_pzflow/src/rail/tools/flow_handle.py -- sorry for not being clearer in my original description! I wasn't suggesting moving content from the standalones into rail_base, because that would indeed make no sense. The proposal is to create a src/rail/core/handles directory in rail_base and the other repos that define custom datahandles, to move the current datahandles defined in rail_base/src/rail/core/data.py to separate modules in that new path in the rail_base repo, and to move custom datahandles from other repos into that path within their own repo. The motivation for this change is that new algorithms in their standalone repos are fairly likely to need custom datahandles, so it should be straightforward to find and access them under the namespace setup, particularly since datahandle types are very visible to users when troubleshooting pipelines.
  • rail_bpz/src/rail/bpz/utils.py -- thanks for the confirmation of how it works! I think this comes down to whether we expect other algorithms to also need special structure for handling paths, or if it's just BPZ, and, if it is going to come up again, how user-facing it is. My sense is that if there wasn't a way to eliminate this path issue for BPZ, there probably won't be a way to do it for other algos that need specialized input files in hardcoded directories (for relevance, I think this is related to though perhaps not exactly the same as what's come up with FSPS recently), so we'd benefit from setting ourselves up for generalizability sooner rather than later. The fact that its presence in demos is stated in #21 as a reason why the change could be disruptive is a strong indication that it's sufficiently user-facing to merit consideration (and if the "disruption" is isolated to the demos, I'm happy to take care of it). A src/rail/core/path_utils directory would be the most general approach (as in the above proposal for a common import path to custom datahandles) if we have to worry about users being able to figure out how to access this kind of thing for multiple standalone algos.

I'm sorry, but it looks to me like both of these changes would be highly disruptive and I'm not seeing that we would gain much. Perhaps someone else can weigh in.

To wit, 1) I don't see a problem with having a number of rail_xxx/src/rail/xxx/utils.py files that set specific things for specific packages.
2) moving where the data handles live would break literally everything. It just doesn't seem worth it to me.

My two cents:
-I haven't looked at the fps structure very much, so I can't say much on that one.

-Do we really see lots of custom data handles being a big issue? I did not expect a huge number of those, so the flow_handle thing seems a bit like overkill to me, particularly from a cost/benefit analysis standpoint.

-For the bpz_utils, I think this is fairly unique to BPZ, it was done this way because the existing code handled file I/O with paths relative to an environment variable base path, and doing it this way was minimally disruptive to that existing code. So, I think it will not be a common occurrence in the future, and we can just let this one instance sit as-is and not worry about it too much.

I am moving this out of the v1 tracker since we have decided not to make this change before v1