tefra/xsdata

Parsing XML to a class: A sequence of alternatives is wrongly parsed

DareDevilDenis opened this issue · 2 comments

Using:

  • xsdata 24.4
  • Python 3.11.5

I ran xsdata generate my_schema.xsd on the following schema:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified" xmlns:vc="http://www.w3.org/2007/XMLSchema-versioning" vc:minVersion="1.1">
  <xs:element name="RootNode">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Field" minOccurs="2" maxOccurs="2">
          <xs:alternative test="@name='LeafType1'" type="LeafType1" />
          <xs:alternative test="@name='LeafType2'" type="LeafType2" />
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:complexType name="LeafType1">
    <xs:simpleContent>
      <xs:extension base="xs:unsignedByte">
        <xs:attribute name="name" fixed="LeafType1"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>
  
  <xs:complexType name="LeafType2">
    <xs:simpleContent>
      <xs:extension base="xs:unsignedByte">
        <xs:attribute name="name" fixed="LeafType2"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>
</xs:schema>

This created my_schema.py (this looks correct):

from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class LeafType1:
    value: Optional[int] = field(
        default=None,
        metadata={
            "required": True,
        },
    )
    name: str = field(
        init=False,
        default="LeafType1",
        metadata={
            "type": "Attribute",
        },
    )


@dataclass
class LeafType2:
    value: Optional[int] = field(
        default=None,
        metadata={
            "required": True,
        },
    )
    name: str = field(
        init=False,
        default="LeafType2",
        metadata={
            "type": "Attribute",
        },
    )


@dataclass
class RootNode:
    field_value: List[Union[LeafType1, LeafType2]] = field(
        default_factory=list,
        metadata={
            "name": "Field",
            "type": "Element",
            "min_occurs": 2,
            "max_occurs": 2,
        },
    )

However, when I parsed the following input XML:

<RootNode>
    <Field name="LeafType1">1</Field>
    <Field name="LeafType2">2</Field>	  
</RootNode>

Using the following code:

import pprint
from pathlib import Path
from xsdata.formats.dataclass.parsers import XmlParser
from xsdata.formats.dataclass.parsers.handlers import XmlEventHandler
from my_schema import RootNode


input_xml_path = Path(__file__).parent / "input.xml"
parser = XmlParser(handler=XmlEventHandler)
deserialized = parser.parse(input_xml_path, RootNode)
pprint.pp(deserialized)

The output is wrong: both fields are parsed as LeafType1, even though the second one has name="LeafType2":

RootNode(field_value=[LeafType1(value=1, name='LeafType1'),
                      LeafType1(value=2, name='LeafType1')])

Here are the files: xsdata_issue_1012.zip
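
For reference, the selection I would expect from the two xs:alternative tests can be sketched with the standard library alone. This is only an illustration of the expected result, not how xsdata works internally: the ALTERNATIVES table and the parse_fields helper are hypothetical names, and the two dataclasses are simplified stand-ins for the generated ones.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


# Simplified stand-ins for the generated classes (assumption: no metadata).
@dataclass
class LeafType1:
    value: int
    name: str = "LeafType1"


@dataclass
class LeafType2:
    value: int
    name: str = "LeafType2"


# Hypothetical lookup table mirroring the two xs:alternative tests on @name.
ALTERNATIVES = {"LeafType1": LeafType1, "LeafType2": LeafType2}


def parse_fields(xml_text: str) -> list:
    """Build one leaf object per <Field>, chosen by its name attribute."""
    root = ET.fromstring(xml_text)
    result = []
    for element in root.findall("Field"):
        leaf_class = ALTERNATIVES[element.get("name")]
        result.append(leaf_class(value=int(element.text)))
    return result


SAMPLE = """<RootNode>
    <Field name="LeafType1">1</Field>
    <Field name="LeafType2">2</Field>
</RootNode>"""

print(parse_fields(SAMPLE))
```

With the sample input, the first item should be a LeafType1 and the second a LeafType2, which is what I expected from the parser as well.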

Thanks for reporting, @DareDevilDenis. The fix is on main!

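from xsdata.formats.dataclass.parsers import XmlParser
from my_schema import RootNode

xml = """<RootNode>
    <Field name="LeafType1">1</Field>
    <Field name="LeafType2">2</Field>
</RootNode>"""

# Pass the target class explicitly so the parser does not have to guess it.
parser = XmlParser()
result = parser.from_string(xml, RootNode)
print(result)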

RootNode(field_value=[LeafType1(value=1, name='LeafType1'), LeafType2(value=2, name='LeafType2')])

Thanks @tefra for the very quick fix! 👍