sandialabs/InterSpec

DHS: namespace in output N42s has no definition.

jpbrodsky opened this issue · 3 comments

Interspec v 1.0.10 rc2

I used the InterSpec "Export" function to produce an N42 (2012) file. This file looks like this (with some sections removed):

<?xml version="1.0" encoding="utf-8"?>
<RadInstrumentData n42DocUUID="92fc70f2-0f2d-4ac3-8018-2cf53b5b7b3c" xmlns="http://physics.nist.gov/N42/2011/N42" n42DocDateTime="2022-06-08T21:09:54Z">
  <Remark>Source of intrinsic activity:Cesium137</Remark>
  <RadInstrumentDataCreatorName>InterSpec</RadInstrumentDataCreatorName>
  ...
  <RadMeasurement id="Sample1">
    ...
  </RadMeasurement>
  ...
  <DHS:InterSpec version="1">
    <DisplayedSampleNumbers>2</DisplayedSampleNumbers>
    ...
  </DHS:InterSpec>
</RadInstrumentData>

The "DHS" namespace prefix in <DHS:InterSpec> is not declared anywhere in the document, as the XML namespaces standard requires (i.e. with an xmlns:DHS="..." attribute). This makes the file unparseable by the Python package lxml due to the non-compliance. As lxml is based on libxml2, presumably that package will also have trouble parsing these files.

lxml error message:
Namespace prefix DHS on Interspec is not defined, line xx, column yy

I suggest this issue be corrected by defining the DHS namespace in the output xml. While different parsers may be more or less tolerant of this issue, my understanding is that lxml is correct here in objecting to the use of an undefined namespace (even if I wish it might "loosen up a little" and parse the file regardless of this issue).
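
For illustration, here is a minimal sketch of the difference a declaration makes. It uses Python's standard-library xml.etree.ElementTree rather than lxml so that it runs without extra packages; that parser is also namespace-aware and rejects an undeclared prefix. The URI bound to DHS below is a made-up placeholder (an assumption), not a value InterSpec defines.

```python
import xml.etree.ElementTree as ET

N42_NS = "http://physics.nist.gov/N42/2011/N42"
DHS_NS = "urn:example:dhs-placeholder"  # hypothetical URI, for illustration only

# Cut-down document using the DHS prefix without declaring it
undeclared = (
    f'<RadInstrumentData xmlns="{N42_NS}">'
    '<DHS:InterSpec version="1"/>'
    '</RadInstrumentData>'
)

# Same document with an xmlns:DHS declaration added on the root element
declared = undeclared.replace(
    f'xmlns="{N42_NS}"', f'xmlns="{N42_NS}" xmlns:DHS="{DHS_NS}"', 1
)


def parses(xml_text):
    """Return True if the namespace-aware parser accepts the text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


undeclared_ok = parses(undeclared)  # False: the DHS prefix is unbound
declared_ok = parses(declared)      # True: the prefix is now bound to a URI
```

Only the declaration on the root element changes between the two strings; the element content is identical.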

Thanks for reporting this Jason.

This has been on my TODO list for quite a while. However, I hadn't been aware of it affecting anyone (e.g., all the other spectroscopy programs seem to read the files without issues), so it has been low priority. I'll bump it up in priority, but I can't promise a date for when it will be done.

A few things worth noting are:

  • All information within the <DHS:InterSpec> element is InterSpec-specific, so it can likely be safely removed using something like sed or a regex (but sorry, I know this is a pain!). For example, the following seems to work:
import re
from lxml import etree

# Read the raw N42 text
with open("temp.n42", "r", encoding="utf-8") as n42file:
    n42_data = n42file.read()

# Drop the <DHS:InterSpec>...</DHS:InterSpec> block; DOTALL lets '.' span newlines
clean_n42_data = re.sub(r'<DHS:InterSpec\b.*?</DHS:InterSpec>', '', n42_data, flags=re.DOTALL)

# lxml wants bytes when the text carries an encoding declaration
root = etree.XML(clean_n42_data.encode('utf-8'))
  • The SpecUtils library has Python bindings (but you have to compile it from source, and you probably already have your code set up, so it may be too late to switch to using this to parse files).
  • In addition, even excluding the DHS namespace part, I wouldn't be surprised if there were one or two other small deviations from the N42 standard here or there (but at least the XML should be valid).
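
If the InterSpec-specific data should be kept rather than stripped, another option is to add the missing declaration to the root element before parsing. The following is only a sketch under assumptions: the helper name declare_dhs_prefix and the placeholder URI are invented for this example, and it uses the standard-library xml.etree.ElementTree so it runs without lxml.

```python
import re
import xml.etree.ElementTree as ET

PLACEHOLDER_DHS_URI = "urn:example:dhs-placeholder"  # hypothetical URI, not defined by InterSpec


def declare_dhs_prefix(xml_text, uri=PLACEHOLDER_DHS_URI):
    """Bind the DHS prefix on the root element if the text does not already do so."""
    if "xmlns:DHS=" in xml_text:
        return xml_text
    # Patch only the first <RadInstrumentData start tag
    return re.sub(
        r"<RadInstrumentData\b",
        f'<RadInstrumentData xmlns:DHS="{uri}"',
        xml_text,
        count=1,
    )


# Cut-down stand-in for an exported file
sample = (
    '<?xml version="1.0" encoding="utf-8"?>'
    '<RadInstrumentData xmlns="http://physics.nist.gov/N42/2011/N42">'
    '<DHS:InterSpec version="1">'
    '<DisplayedSampleNumbers>2</DisplayedSampleNumbers>'
    '</DHS:InterSpec>'
    '</RadInstrumentData>'
)

root = ET.fromstring(declare_dhs_prefix(sample).encode("utf-8"))
interspec = root[0]                 # the <DHS:InterSpec> element, now parseable
displayed = interspec[0].text       # text of <DisplayedSampleNumbers>
```

Because the patch touches only the root start tag, the rest of the file is passed to the parser unchanged, and the text is left alone if a declaration is already present.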

thanks again,
-will

Thanks, Will!

At this point, we're not planning on modifying our software to specifically support the output of InterSpec, but being able to read it alongside other N42s would be a nice bonus. SpecUtils may be a good solution for that, but for the reasons you mention it's a relatively large job to solve a somewhat small problem.

Best,
Jason

Am6er commented

Or just do something like this until the fix arrives.

// Pre-declare the DHS namespace prefix for InterSpec compatibility
XmlDocument xmldoc = new XmlDocument();
XmlReaderSettings settings = new XmlReaderSettings { NameTable = new NameTable() };
XmlNamespaceManager xmlns = new XmlNamespaceManager(settings.NameTable);
xmlns.AddNamespace("DHS", "http://www.w3.org/2001/XMLSchema-instance");
XmlParserContext context = new XmlParserContext(null, xmlns, "", XmlSpace.Default);
// Read the file with a parser context that already knows the DHS prefix
RadInstrumentData radInstrumentData = new RadInstrumentData();
using (XmlReader reader = XmlReader.Create(filename, settings, context))