pdvrieze/xmlutil

How to parse a tag which can have multiple names in a single property

sdipendra opened this issue · 4 comments

How to parse a tag which can have multiple names.

For example, take a tag named "link:Stat":

Some of my XML documents use the fully qualified name: <link:Stat></link:Stat>
Others just have <Stat></Stat>, without the namespace.

No document has both formats.

I want them to be mapped to the same single property: val stat: Stat

How can I achieve this? Thanks!

There are three approaches. The first is currently broken (I've fixed it): adding a custom handler for unknown content in the policy. The second is to put a filter on the parser that simply remaps tags. The third option, for your case, is to override the mechanism by which the policy maps Kotlin types to tag names. This is global, but it lets you use the same serializer with a different policy to parse either format.
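The parser-filter idea can be illustrated independently of xmlutil. The sketch below uses the JDK's StAX `StreamReaderDelegate` rather than xmlutil's own `XmlReader`, so it only shows the shape of such a filter; the class name `DefaultNamespaceFilter` and its `namespaceByLocalName` parameter are hypothetical, and wiring an equivalent wrapper into xmlutil is assumed to be possible but is not shown.

```kotlin
import java.io.StringReader
import javax.xml.namespace.QName
import javax.xml.stream.XMLInputFactory
import javax.xml.stream.XMLStreamConstants
import javax.xml.stream.XMLStreamReader
import javax.xml.stream.util.StreamReaderDelegate

// Hypothetical filter: elements that arrive without a namespace are reported
// in the namespace registered for their local name. Everything else passes through.
class DefaultNamespaceFilter(
    reader: XMLStreamReader,
    private val namespaceByLocalName: Map<String, String>,
) : StreamReaderDelegate(reader) {

    override fun getNamespaceURI(): String? {
        val original = super.getNamespaceURI()
        // Only element events carry a name that can be remapped.
        if (!isStartElement() && !isEndElement()) return original
        if (!original.isNullOrEmpty()) return original
        return namespaceByLocalName[getLocalName()] ?: original
    }

    override fun getName(): QName {
        val original = super.getName()
        if (original.namespaceURI.isNotEmpty()) return original
        val mapped = namespaceByLocalName[original.localPart] ?: return original
        return QName(mapped, original.localPart, original.prefix)
    }
}

fun main() {
    val xml = "<device xmlns:link=\"http://www.kodepad.com/xml/link\">" +
        "<Stat>WORKING</Stat></device>"
    val raw = XMLInputFactory.newInstance().createXMLStreamReader(StringReader(xml))
    val reader = DefaultNamespaceFilter(raw, mapOf("Stat" to "http://www.kodepad.com/xml/link"))
    while (reader.hasNext()) {
        // Print the (possibly remapped) name of every start tag.
        if (reader.next() == XMLStreamConstants.START_ELEMENT) println(reader.getName())
    }
}
```

Because the remapping happens before the deserializer sees the name, the serializable classes and the policy stay unchanged; the trade-off is that the mapping table has to be maintained by hand.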

For the third approach, I'm trying to override the policy behaviour but I'm unable to identify the method that I should override.

I've created a failing test setup for this. If you could point out the policy method that I should override, that would be great.

In the current setup, the first test case (with the prefix) passes and the second test case (without the prefix) fails.

package com.kodepad.xml

import kotlinx.serialization.Serializable
import kotlinx.serialization.decodeFromString
import nl.adaptivity.xmlutil.ExperimentalXmlUtilApi
import nl.adaptivity.xmlutil.serialization.DefaultXmlSerializationPolicy
import nl.adaptivity.xmlutil.serialization.XML
import nl.adaptivity.xmlutil.serialization.XmlElement
import nl.adaptivity.xmlutil.serialization.XmlSerialName
import nl.adaptivity.xmlutil.serialization.XmlSerializationPolicy
import nl.adaptivity.xmlutil.serialization.XmlValue
import org.junit.jupiter.api.Test
import org.slf4j.LoggerFactory
import kotlin.test.assertEquals

@OptIn(ExperimentalXmlUtilApi::class)
internal class XMLUtilFailingTest {
    @Serializable
    @XmlSerialName(
        namespace = "http://www.kodepad.com/xml/equipment",
        prefix = "equipment",
        value = "device",
    )
    data class Device(
        @XmlElement(value = true) val stat: Stat?,
    )

    @Serializable
    @XmlSerialName(
        namespace = "http://www.kodepad.com/xml/link",
        prefix = "link",
        value = "Stat",
    )
    data class Stat(
        @XmlValue val value: String,
    )

    class XmlSerializationPolicyProxy(xmlSerializationPolicy: XmlSerializationPolicy) :
        XmlSerializationPolicy by xmlSerializationPolicy {
        // todo: Override method to map "Stat" to "link:Stat"
    }

    companion object {
        private val log = LoggerFactory.getLogger(this::class.java.declaringClass.name)

        private val expectedValue = Device(Stat("WORKING"))
    }

    private val xml = XML {
        this.policy = XmlSerializationPolicyProxy(
            DefaultXmlSerializationPolicy(
                false, encodeDefault = XmlSerializationPolicy.XmlEncodeDefault.NEVER
            )
        )
    }

    @Test
    fun `parse xml with prefix`() {
        val xmlString =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n" + "<equipment:device xmlns:equipment=\"http://www.kodepad.com/xml/equipment\"\n" + "                  xmlns:link=\"http://www.kodepad.com/xml/link\">\n" + "    <link:Stat>WORKING</link:Stat>\n" + "</equipment:device>\n"

        val device = xml.decodeFromString<Device>(xmlString)
        log.info("device: $device")

        assertEquals(expectedValue, device)
    }

    @Test
    fun `parse xml without prefix`() {
        val xmlString =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n" + "<equipment:device xmlns:equipment=\"http://www.kodepad.com/xml/equipment\"\n" + "                  xmlns:link=\"http://www.kodepad.com/xml/link\">\n" + "    <Stat>WORKING</Stat>\n" + "</equipment:device>\n"

        val device = xml.decodeFromString<Device>(xmlString)
        log.info("device: $device")

        assertEquals(expectedValue, device)
    }
}

Included dependencies:

plugins {
    kotlin("jvm") version "1.8.20"
    kotlin("plugin.serialization") version "1.8.20"
}

dependencies {
    // Serialization
    implementation("org.jetbrains.kotlinx:kotlinx-serialization-json:1.5.0")
    implementation("io.github.pdvrieze.xmlutil:core:0.86.0")
    implementation("io.github.pdvrieze.xmlutil:serialization:0.86.0")
}

Unfortunately there is a bug in the handling (now fixed in dev). What should be overridden is handleUnknownContentRecovering. To see how this works look at:

fun testDeserializeRecoveringWithParser() {
    val xml = XML {
        policy = object : DefaultXmlSerializationPolicy(true) {
            @ExperimentalXmlUtilApi
            override fun handleUnknownContentRecovering(
                input: XmlReader,
                inputKind: InputKind,
                descriptor: XmlDescriptor,
                name: QName?,
                candidates: Collection<Any>
            ): List<XML.ParsedData<*>> {
                XmlSerializationPolicy.recoverNullNamespaceUse(inputKind, descriptor, name)?.let { return it }
                return super.handleUnknownContentRecovering(input, inputKind, descriptor, name, candidates)
            }
        }
    }
    val input = "<Container><Stat value=\"foo\"/></Container>"
    val parsed = xml.decodeFromString<Container>(input)
    assertEquals(Container(Stat("foo")), parsed)
}

and:

/**
 * Helper function that allows more flexibility on null namespace use. If either the found
 * name has the null namespace, or the candidate has null namespace, this will map (for the
 * correct child).
 */
@ExperimentalXmlUtilApi
public fun recoverNullNamespaceUse(inputKind: InputKind, descriptor: XmlDescriptor, name: QName?): List<XML.ParsedData<*>>? {
    if (name != null) {
        if (name.namespaceURI == "") {
            for (idx in 0 until descriptor.elementsCount) {
                val candidate = descriptor.getElementDescriptor(idx)
                if (inputKind.mapsTo(candidate.effectiveOutputKind) &&
                    candidate.tagName.localPart == name.getLocalPart()) {
                    return listOf(XML.ParsedData(idx, Unit, true))
                }
            }
        } else {
            for (idx in 0 until descriptor.elementsCount) {
                val candidate = descriptor.getElementDescriptor(idx)
                if (inputKind.mapsTo(candidate.effectiveOutputKind) &&
                    candidate.tagName.isEquivalent(QName(name.localPart))) {
                    return listOf(XML.ParsedData(idx, Unit, true))
                }
            }
        }
    }
    return null
}

Please note, however, that this is broken in master: the helper function is new, but more significantly, recovery for elements is broken (it fails to read the end tag).

Checked on dev. This works for my use case. Thank you.

One suggestion, though: instead of having a specific method for handling the null namespace, wouldn't it be better to have a method that maps a parsed QName to some other QName? That would cover the null namespace case and many other use cases as well.
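To make the suggestion concrete, here is a minimal sketch of what such a name-mapping hook might look like. `QNameMapper` and `nullNamespaceMapper` are hypothetical names, not part of xmlutil; the sketch uses the JDK's `javax.xml.namespace.QName` and works purely on names, without the descriptor and input-kind checks that `recoverNullNamespaceUse` performs.

```kotlin
import javax.xml.namespace.QName

// Hypothetical hook: given the name found in the document and the names the
// current element accepts, return the candidate to treat it as (or null).
fun interface QNameMapper {
    fun map(found: QName, candidates: Collection<QName>): QName?
}

// Null-namespace behaviour expressed as one possible mapper:
// - an unqualified tag matches any candidate with the same local part;
// - a qualified tag matches an unqualified candidate with the same local part.
val nullNamespaceMapper = QNameMapper { found, candidates ->
    if (found.namespaceURI.isEmpty()) {
        candidates.firstOrNull { it.localPart == found.localPart }
    } else {
        candidates.firstOrNull {
            it.namespaceURI.isEmpty() && it.localPart == found.localPart
        }
    }
}

fun main() {
    val candidates = listOf(QName("http://www.kodepad.com/xml/link", "Stat", "link"))
    // An unqualified <Stat> resolves to the declared link:Stat name.
    println(nullNamespaceMapper.map(QName("Stat"), candidates))
    // prints {http://www.kodepad.com/xml/link}Stat
}
```

Other mappers (for example, accepting a legacy namespace URI or a renamed tag) would then be alternative implementations of the same interface.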