ncss-tech/SoilKnowledgeBase

osd_to_json: rare incorrect labeling in output

brownag opened this issue · 2 comments

Incorrect labeling due to bad order and/or missing groups. Sections are handled correctly a very high percentage of the time, even when they are out of order, so the algorithm is likely getting confused only in some specific cases.

It may be that the only way to fix the edgiest of cases is with some sort of post-processing of content versus the parsed results -- but I suspect that when I look, it will turn out to be something wrong with the implementation.

An example is the ZADE series OSD JSON, where every label after Geographic Setting is offset by one from its content:

"GEOGRAPHIC SETTING": {
"section": "GEOGRAPHIC SETTING",
"content": "GEOGRAPHIC SETTING:\nLandform - hills.\nElevation - 4,900 to 6,200 feet.\nSlope - 15 to 70 percent.\nParent material - interbedded sandstone and shale residuum.\nClimate - long, cold winters; moist springs; cool summers.\nMean annual precipitation - 20 to 24 inches.\nMean annual air temperature - 34 to 38 degrees F.\nFrost-free period - 50 to 70 days."
},
"DRAINAGE AND PERMEABILITY": {},
"USE AND VEGETATION": {
"section": "DRAINAGE AND PERMEABILITY",
"content": "DRAINAGE AND PERMEABILITY: Well drained; moderately slow permeability."
},
"DISTRIBUTION AND EXTENT": {
"section": "USE AND VEGETATION",
"content": "USE AND VEGETATION: Zade soils are used mainly for woodland and wildlife habitat. Potential native vegetation may include Douglas fir with an understory of common snowberry, western meadowrue, pinegrass and heartleaf arnica."
},
"REGIONAL OFFICE": {
"section": "DISTRIBUTION AND EXTENT",
"content": "DISTRIBUTION AND EXTENT: Zade soils are of small extent in southwestern Montana."
},
"ORIGIN": {
"section": "MLRA SOIL SURVEY REGIONAL OFFICE (MO) RESPONSIBLE",
"content": "MLRA SOIL SURVEY REGIONAL OFFICE (MO) RESPONSIBLE: Bozeman, Montana"
},
"REMARKS": {
"section": "SERIES ESTABLISHED",
"content": "SERIES ESTABLISHED: Gallatin County, Montana, 1997."
},
"GEOGRAPHICALLY ASSOCIATED SOILS": {
"section": "GEOGRAPHICALLY ASSOCIATED SOILS",
"content": null
}
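
A post-processing check of the kind mentioned above could be sketched roughly as follows, in Python rather than the package's own R. It compares each output key against the `section` value stored under it. The function name `find_mislabeled` and the `ALIASES` table are hypothetical; the alias pairs are inferred from the offset-by-one pattern in this example and are not taken from the package.

```python
# Hypothetical aliases: output keys that abbreviate the heading text.
# These pairs are inferred from the ZADE example, not from the package code.
ALIASES = {
    "REGIONAL OFFICE": "MLRA SOIL SURVEY REGIONAL OFFICE (MO) RESPONSIBLE",
    "ORIGIN": "SERIES ESTABLISHED",
}


def find_mislabeled(parsed, aliases=ALIASES):
    """Return (key, section) pairs where the key and stored section disagree."""
    problems = []
    for key, entry in parsed.items():
        # Empty entries and entries without content cannot be checked.
        if not entry or entry.get("content") is None:
            continue
        expected = aliases.get(key, key)
        if entry.get("section") != expected:
            problems.append((key, entry.get("section")))
    return problems


# Abbreviated subset of the ZADE output shown above.
example = {
    "GEOGRAPHIC SETTING": {
        "section": "GEOGRAPHIC SETTING",
        "content": "GEOGRAPHIC SETTING:\nLandform - hills.",
    },
    "DRAINAGE AND PERMEABILITY": {},
    "USE AND VEGETATION": {
        "section": "DRAINAGE AND PERMEABILITY",
        "content": "DRAINAGE AND PERMEABILITY: Well drained; moderately slow permeability.",
    },
    "GEOGRAPHICALLY ASSOCIATED SOILS": {
        "section": "GEOGRAPHICALLY ASSOCIATED SOILS",
        "content": None,
    },
}

mislabeled = find_mislabeled(example)
print(mislabeled)
```

A check like this only flags the symptom; it says nothing about why the labels shifted.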

If we pull up the OSD, nothing pops out as being immediately wrong with it... until you see that GEOGRAPHICALLY ASSOCIATED SOILS is missing. They use a long list-form Competing section -- instead, I suppose -- which is pretty nice.

https://github.com/ncss-tech/OSDRegistry/blob/main/OSD/Z/ZADE.txt

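The code at one point concatenated a list of numeric index vectors -- some of which could be zero length. Zero-length vectors vanish on concatenation, which would shift every later index up by one position.

With 2ea007a, the vector is properly buffered with NA -- which makes the subsequent steps work right -- I think.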
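
To illustrate the suspected failure mode, here is a minimal sketch in Python (the package itself is R, where the padding value would be NA). The heading subset, the line indices, and the function names `label_naive` and `label_buffered` are made up for illustration and are not the package's actual code.

```python
from itertools import chain

# Expected headings, in the order the parser looks for them (illustrative subset).
headings = [
    "GEOGRAPHIC SETTING",
    "GEOGRAPHICALLY ASSOCIATED SOILS",
    "DRAINAGE AND PERMEABILITY",
    "USE AND VEGETATION",
]

# Line index where each heading was found; an empty list means "not found".
# The values are invented for this example.
matches = [[10], [], [19], [20]]


def label_naive(headings, matches):
    """Concatenate the index lists; empty lists vanish, so later labels shift."""
    flat = list(chain.from_iterable(matches))
    return dict(zip(headings, flat))


def label_buffered(headings, matches):
    """Pad each empty list with None (NA in R) so positions stay aligned."""
    padded = [m if m else [None] for m in matches]
    flat = list(chain.from_iterable(padded))
    return dict(zip(headings, flat))


naive = label_naive(headings, matches)
buffered = label_buffered(headings, matches)
print(naive)
print(buffered)
```

In the naive version, the missing heading silently takes the next section's index and the last heading is dropped. With padding, every found heading keeps its own index and the missing one is marked explicitly.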

Going to chance it and run the refresh-extdata Action and see what changes... in a branch.