kujaku11/mt_metadata

Exceptions when reading tf in emtfxml format

Closed this issue · 3 comments

04_processing_review_failing_rows.csv
Two errors have appeared since the last wide-scale test in early June 2023.

  1. An error occurs when the "attachment" field of the XML has the value None.

This error originates in emtfxml.py around line 308, during the iteration over element_keys. When element == "attachment", the call attr.read_dict(root_dict) fails.

This occurs on line 32 of attachment.py in the line
helpers._read_element(self, input_dict, "attachment")

The input_dict does have a key "attachment", but its value is None.
I suggested a modification to read_dict() in attachment.py so that it returns early if input_dict["attachment"] is None.
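
A minimal sketch of the suggested guard follows. The Attachment class below is a simplified stand-in, not the real class from attachment.py, and its _read_element method and values attribute are hypothetical placeholders for helpers._read_element and the real metadata fields.

```python
class Attachment:
    """Simplified stand-in for the class in attachment.py (hypothetical)."""

    def __init__(self):
        # Hypothetical storage; the real class keeps typed metadata attributes.
        self.values = {}

    def _read_element(self, input_dict, key):
        # Placeholder for helpers._read_element: copy the sub-dict entries.
        # This is the step that fails when input_dict[key] is None.
        for name, value in input_dict[key].items():
            self.values[name] = value

    def read_dict(self, input_dict):
        # Suggested guard: nothing to read if the key is absent or None.
        if input_dict.get("attachment") is None:
            return
        self._read_element(input_dict, "attachment")


attachment = Attachment()
attachment.read_dict({"attachment": None})  # returns without raising
```

With the guard in place, a None value leaves the object at its defaults instead of raising AttributeError.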

Traceback (most recent call last):
File "/home/kkappler/software/irismt/mt_metadata/mt_metadata/transfer_functions/io/emtfxml/metadata/helpers.py", line 80, in _read_element
cls.from_dict(element_dict)
File "/home/kkappler/software/irismt/mt_metadata/mt_metadata/base/metadata.py", line 631, in from_dict
meta_dict = helpers.flatten_dict(meta_dict[class_name])
File "/home/kkappler/software/irismt/mt_metadata/mt_metadata/base/helpers.py", line 310, in flatten_dict
for key, value in meta_dict.items():
AttributeError: 'NoneType' object has no attribute 'items'

A zipped copy of the XML is here:
8P_CAR04_RRCAZ12.xml.gz

  2. One of the SPUD XML files, which used to read in without problems, now chokes on the "site" key.
    Specifically, I think the error occurs while parsing the "orientation" info, but the orientation values in the offending "site" dict (pasted below) don't appear different from those in XML files that do read in ... puzzling.

"site": {
"acquired_by": null,
"country": null,
"end": "1980-01-01T00:00:00+00:00",
"id": null,
"location.elevation": 0.0,
"location.latitude": 0.0,
"location.longitude": 0.0,
"name": null,
"orientation.angle_to_geographic_north": 0.0,
"orientation.layout": "orthogonal",
"project": null,
"run_list": "",
"start": "1980-01-01T00:00:00+00:00",
"survey": null,
"year_collected": 1980
}
}

Traceback (most recent call last):
File "/home/kkappler/software/irismt/mt_metadata/mt_metadata/transfer_functions/io/emtfxml/metadata/helpers.py", line 80, in _read_element
cls.from_dict(element_dict)
File "/home/kkappler/software/irismt/mt_metadata/mt_metadata/base/metadata.py", line 631, in from_dict
meta_dict = helpers.flatten_dict(meta_dict[class_name])
File "/home/kkappler/software/irismt/mt_metadata/mt_metadata/base/helpers.py", line 310, in flatten_dict
for key, value in meta_dict.items():
AttributeError: 'str' object has no attribute 'items'
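
Both tracebacks end at the same loop in flatten_dict, which assumes its argument is a dict. The sketch below reproduces the two messages with a simplified stand-in; flatten_dict_sketch and error_message are hypothetical names, and this is not the actual mt_metadata implementation.

```python
def flatten_dict_sketch(meta_dict, prefix=""):
    """Flatten a nested dict into dotted keys (simplified stand-in)."""
    flat = {}
    # Like the line in the traceback, this assumes meta_dict has .items().
    for key, value in meta_dict.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten_dict_sketch(value, name))
        else:
            flat[name] = value
    return flat


def error_message(value):
    """Return the AttributeError text raised for value, or None if it flattens."""
    try:
        flatten_dict_sketch(value)
    except AttributeError as error:
        return str(error)
    return None


print(error_message(None))          # error 1: value is None
print(error_message("sitelayout"))  # error 2: value is a string
```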

Here is a zip of the XML with the second error
18645148_8P_CAS03.xml.gz

Hint on error 2:
On a file that does read in fine, when we hit the function _read_element in metadata_helpers.py, the value of element_dict is:
element_dict
Out[3]:
{'orientation': OrderedDict([('angle_to_geographic_north', '0.000'),
('value', 'orthogonal')])}

But in the failing case, we have:
element_dict
Out[6]: {'orientation': 'sitelayout'}

Since the value of orientation is a string rather than a dict, we get the error: a string has no items() method.
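
One way to surface this kind of problem before it reaches from_dict is to check which entries that should be mappings are not. The sketch below is only a diagnostic aid; non_mapping_elements is a hypothetical helper, not part of mt_metadata.

```python
from collections.abc import Mapping


def non_mapping_elements(input_dict, nested_keys):
    """Return {key: value} for keys expected to hold a mapping but holding something else."""
    return {
        key: input_dict[key]
        for key in nested_keys
        if key in input_dict and not isinstance(input_dict[key], Mapping)
    }


# The two element_dict values shown above.
working = {"orientation": {"angle_to_geographic_north": "0.000", "value": "orthogonal"}}
broken = {"orientation": "sitelayout"}

print(non_mapping_elements(working, ["orientation"]))  # {}
print(non_mapping_elements(broken, ["orientation"]))   # {'orientation': 'sitelayout'}
```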

So, starting from emtfxml.py, when element == "site" (around line 306),
the attr is ______ and this goes into attr.read_dict()

read_dict loops over the site keys, and when it gets to orientation in the working case (site.py,
lines ~95-110) we have this output:

element
Out[3]: 'orientation'
attr
Out[4]:
{
"orientation": {
"angle_to_geographic_north": 0.0,
"layout": "orthogonal"
}
}

In the broken case I see exactly the same output (cut and pasted):

element
Out[6]: 'orientation'
attr
Out[7]:
{
"orientation": {
"angle_to_geographic_north": 0.0,
"layout": "orthogonal"
}
}

In orientation.py, in the read_dict method in the working case, we have:

input_dict
Out[1]:
OrderedDict([('project', 'USMTArray'),
('survey', 'CONUS SoCal'),
('year_collected', '2019'),
('country', 'USA'),
('id', 'CAR04'),
('name', 'The Meadows Slough, CA, USA'),
('location',
OrderedDict([('latitude', '38.245768'),
('longitude', '-121.489255'),
('elevation', '0.000'),
('declination',
OrderedDict([('epoch', '1995.0'),
('value', '13.200')])),
('datum', 'WGS84')])),
('orientation',
OrderedDict([('angle_to_geographic_north', '0.000'),
('value', 'orthogonal')])),
('acquired_by', 'National Geoelectromagnetic Facility'),
('start', '2019-05-03T22:29:40'),
('end', '2019-05-09T00:25:42'),
('run_list', 'CAR04a'),
('data_quality_notes',
OrderedDict([('rating', '1'),
('good_from_period', '20.000'),
('good_to_period', '400.000'),
('comments',
OrderedDict([('author',
'Adam Schultz and Esteban Bowles-Martinez'),
('value',
'Very poor data quality will require careful removal of large frequency bands before inverting.')]))])),
('data_quality_warnings',
OrderedDict([('flag', '1'),
('comments',
OrderedDict([('author',
'Adam Schultz and Esteban Bowles-Martinez')]))]))])

but in the broken case we have:

input_dict
Out[2]:
OrderedDict([('project', 'USMTArray'),
('survey', 'CONUS SoCal'),
('year_collected', '2019'),
('country', 'USA'),
('id', 'CAS03'),
('name', 'Pleasanton Ridge, CA, USA'),
('location',
OrderedDict([('latitude', '37.658830'),
('longitude', '-121.959752'),
('elevation', '484.050'),
('declination',
OrderedDict([('epoch', '1995.0'),
('value', '15.000')])),
('datum', 'WGS84')])),
('orientation', 'sitelayout'),
('acquired_by', 'National Geoelectromagnetic Facility'),
('start', '2019-10-24T22:07:08'),
('end', '2019-11-08T00:49:32'),
('run_list', 'CAS03a CAS03b'),
('data_quality_notes',
OrderedDict([('rating', '1'),
('good_from_period', '30.000'),
('good_to_period', '100.000'),
('comments',
OrderedDict([('author',
'Adam Schultz and Esteban Bowles-Martinez'),
('value',
'Signal overwhelmed by cultural noise near electric train in San Francisco Bay Area. Very poor data quality that will require careful removal of large frequency bands before inverting.')]))])),
('data_quality_warnings',
OrderedDict([('flag', '1'),
('comments',
OrderedDict([('author',
'Adam Schultz and Esteban Bowles-Martinez')]))]))])
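
Spotting the one differing entry in two dumps this long is easier with a small comparison helper. The sketch below walks two nested dicts and reports keys where one side holds a mapping and the other does not; type_differences is a hypothetical helper, and the two sample dicts are trimmed copies of the input_dict values above.

```python
from collections.abc import Mapping


def type_differences(good, bad, path=""):
    """List (path, good_type, bad_type) where only one side is a mapping."""
    diffs = []
    for key, good_value in good.items():
        if key not in bad:
            continue
        here = f"{path}.{key}" if path else key
        bad_value = bad[key]
        good_is_map = isinstance(good_value, Mapping)
        bad_is_map = isinstance(bad_value, Mapping)
        if good_is_map and bad_is_map:
            # Both sides are nested: compare their contents recursively.
            diffs.extend(type_differences(good_value, bad_value, here))
        elif good_is_map != bad_is_map:
            diffs.append((here, type(good_value).__name__, type(bad_value).__name__))
    return diffs


# Trimmed versions of the working (CAR04) and broken (CAS03) site dicts.
good_site = {
    "id": "CAR04",
    "location": {"latitude": "38.245768", "declination": {"epoch": "1995.0"}},
    "orientation": {"angle_to_geographic_north": "0.000", "value": "orthogonal"},
}
bad_site = {
    "id": "CAS03",
    "location": {"latitude": "37.658830", "declination": {"epoch": "1995.0"}},
    "orientation": "sitelayout",
}

print(type_differences(good_site, bad_site))  # [('orientation', 'dict', 'str')]
```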

Both now pass with commit cd25167