kumc-bmi/naaccr-tumor-data

document git submodule for WerthPADOH/naaccr

Opened this issue · 4 comments

dckc commented

We're using XML files from naaccr-xml and likewise from other places. These are design-time constant artifacts, like Java resources.

In python2, I'd use resource_string relative to the current module:

pkg_resources.resource_string(__name__, 'naaccr_xml/src/main/resources/naaccr-dictionary-180.xml')

In python3, pkg_resources is replaced by importlib.resources. But the migration docs show neither the __name__ idiom for in-package references nor separators (slashes) in the resource path. The recommended idiom is:

contents = importlib_resources.read_binary('my.package', 'resource.dat')

My current approach is to create a new sub-package for each directory where resources reside. This works ok, but I haven't committed the results as it clutters the top level directory with stuff like:

  • naaccr_xml_res/
  • naaccr_xml_samples/
  • naaccr_xml_xsd/
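The sub-package approach above can be sketched roughly as follows. This is a minimal illustration, not the code in this repository: the package name `naaccr_xml_res_demo` and the stand-in XML content are hypothetical, and it uses `importlib.resources.files`, which assumes Python 3.9 or later rather than the `importlib_resources.read_binary` call quoted above.

```python
import importlib
import importlib.resources
import sys
import tempfile
from pathlib import Path


def make_demo_package(root: Path) -> None:
    """Create a throwaway resource sub-package (hypothetical stand-in contents)."""
    pkg = root / "naaccr_xml_res_demo"
    pkg.mkdir()
    # An __init__.py makes this a regular package rather than a namespace package.
    (pkg / "__init__.py").write_text("")
    (pkg / "naaccr-dictionary-180.xml").write_text("<NaaccrDictionary/>")


def read_resource(package: str, name: str) -> bytes:
    """Read one resource file that lives directly inside `package`."""
    return importlib.resources.files(package).joinpath(name).read_bytes()


# Build the demo package in a temp dir and make it importable.
tmp = tempfile.mkdtemp()
make_demo_package(Path(tmp))
sys.path.insert(0, tmp)
importlib.invalidate_caches()

data = read_resource("naaccr_xml_res_demo", "naaccr-dictionary-180.xml")
print(data)
```

Each resource directory needs its own `__init__.py` for this to work, which is where the top-level clutter comes from.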

The limitations seem to come from the issue "Unable to retrieve resources from a namespace package" (#68). I don't really understand what a namespace package is, though.

cc @contactlp

dckc commented

> ... I haven't committed the results as it clutters the top level directory ...

I went ahead and committed the static assets for use in a jenkins job.
7248b23

dckc commented

This approach is a little ugly, but it's working.

dckc commented

reconsidering this in the JVM context

dckc commented

we're getting naaccr-xml dependencies via gradle and the R stuff via a git submodule.

The git submodule should be more clearly documented.