XLSForm/pyxform

Generate ODK XForms spec-compliant meta block

Closed this issue · 12 comments

BLOCKED:

  • Enketo likely doesn't support this (see #105 (comment))
  • Servers would need a way to identify/communicate that the meta block namespace has changed through form updates (#105 (comment))

The ODK XForms spec describes specific node names for metadata and requires that they be in the meta block and in the OpenRosa XForms namespace. The implementations tend to be more lenient, but sticking to consistent naming has advantages for downstream analysis tools.

pyxform currently places metadata elements such as start time and end time in the widgets block and allows the user to specify an arbitrary node name. Instead, it would be preferable to generate something like:

<orx:meta>
    <orx:deviceID/>
    <orx:timeStart/>
    <orx:timeEnd/>
    <orx:userID/>
    <orx:instanceID/>
</orx:meta>

Indeed! I've split this up into a few points and added a related instanceID syntax improvement:

  • not requiring a name value (and ignoring it if users do add it)
  • wrt the previous point, add a warning output that the name value was ignored?
  • use the fixed nodeNames as specified in the spec for the individual meta nodes
  • adding the required namespace on both the meta block and the individual meta nodes
  • make sure instanceID is put under the same namespaced meta block
  • don't add a calculate attribute on the instanceID bind, but add a preload="uid" attribute instead
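
A minimal sketch of what the namespaced meta block from the points above could look like when built programmatically. This is not pyxform's actual code: `build_meta_block` and `META_NODE_NAMES` are hypothetical names, and the sketch uses only Python's standard `xml.etree.ElementTree`. The namespace URI is the OpenRosa XForms namespace conventionally bound to the `orx` prefix.

```python
import xml.etree.ElementTree as ET

# OpenRosa XForms namespace, conventionally bound to the "orx" prefix.
ORX_NS = "http://openrosa.org/xforms"
ET.register_namespace("orx", ORX_NS)

# Fixed node names taken from the example block in this issue.
META_NODE_NAMES = ["deviceID", "timeStart", "timeEnd", "userID", "instanceID"]


def build_meta_block(node_names=META_NODE_NAMES):
    """Hypothetical helper: build a meta element with namespaced children."""
    meta = ET.Element(f"{{{ORX_NS}}}meta")
    for name in node_names:
        # Both the block and each child live in the orx namespace.
        ET.SubElement(meta, f"{{{ORX_NS}}}{name}")
    return meta


meta_xml = ET.tostring(build_meta_block(), encoding="unicode")
print(meta_xml)
```

Because the node names are fixed, any user-supplied name would simply never reach this step, which is where a "name ignored" warning could be emitted.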

Looking into generating this meta block. Still in the poking-around-the-code phase.

Good news, @MartijnR's last bullet about the instanceID bind was done: #94

Notes from the breakout session:

Problem

We currently generate the meta tags, but not the way the spec describes. According to the spec, they should be in a different namespace, and today we generate them outside of the meta block.

Discussion

  • The challenge is that people have downstream analysis dependent on the current configuration
  • It allows the user to define the name of how this field is made available to the client. So, a user can define what they want to name these.
  • The names come from the xform spec, which implies that they are static. Therefore, the same way you can always rely on the instanceID you can rely on these fields.
  • We found that the xform spec defines it in the meta block.
  • This is disruptive in the existing forms as well as downstream organizations who have analysis scripts. There are many users who use those fields.
  • The bind would look different because you need to change the nodeset to reference the meta tag instead of the user defined nodeset.
  • Kobo and Ona will be impacted with edits. This isn't a problem with new forms, but will be if a user edits using xlsform upload, which is a current feature on these systems.
  • The tools would need to accommodate this.
  • The problem is that we would need to build everything that generates an xform.
  • This is clearly a breaking change, but we would want to eventually phase it out.
  • The data looks for either a meta tag or an orx:meta tag.
  • This may not be an issue for ODK Central. The change is due to the tool for the library implementing pyxform
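
A rough sketch of how a downstream script could tolerate both structures during a transition, in the spirit of the "meta or orx:meta" point above. `find_instance_id` is a hypothetical helper, not part of any ODK tool, and it assumes the submission's main instance is otherwise in no namespace.

```python
import xml.etree.ElementTree as ET

ORX_NS = "http://openrosa.org/xforms"


def find_instance_id(submission_xml):
    """Hypothetical helper: return instanceID from either meta layout, or None."""
    root = ET.fromstring(submission_xml)
    candidate_paths = [
        # Spec-style layout: orx:meta/orx:instanceID
        f"{{{ORX_NS}}}meta/{{{ORX_NS}}}instanceID",
        # Layout without a namespace: meta/instanceID
        "meta/instanceID",
    ]
    for path in candidate_paths:
        node = root.find(path)
        if node is not None and node.text:
            return node.text.strip()
    return None


legacy = '<data id="f"><meta><instanceID>uuid:abc</instanceID></meta></data>'
namespaced = (
    '<data id="f" xmlns:orx="http://openrosa.org/xforms">'
    "<orx:meta><orx:instanceID>uuid:def</orx:instanceID></orx:meta>"
    "</data>"
)
print(find_instance_id(legacy), find_instance_id(namespaced))
```

Every script that reads these fields would need a similar fallback, which is part of why the change is disruptive.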

Decision

  1. Make the default for new xforms use the new meta namespace.
  2. Add support for edits using both structures, but work on a process for phasing out this feature over time.
  3. Add an explicit warning flag when uploading edits to the library.
  4. Clearly mark this change in release notes so downstream implementers can update their systems.

I think that for this change to be made, the authors of tools that allow form updates would have to be involved in defining the API. That is, there will need to be a way for those systems to communicate what type of meta block they want and I don't know how e.g. Ona or Kobo requests a form conversion.

Users of systems like Aggregate that don't embed pyxform but can read XLSForms will also be affected. I can't recall whether I didn't consider that or didn't find it important. For example, in Aggregate's case, arbitrary form updates aren't allowed but it is possible to make changes that don't affect the schema including adding select options or changing question text.

I think we should leave this aside for now. It's quite disruptive and although it does feel wrong, there's been no user complaint or demand to change it.

As far as I can tell, Enketo does not support this. When instanceID is in the orx:meta block, it seems to generate and populate an instanceID in xforms:meta.

Maybe a bug, it does have the code for it. I'll check!

I believe the fix for Enketo edits was released with EE 3.0.5.

We've just discovered that Collect v2022.3 and prior do not support bind expressions with namespace prefixes when loading from form cache. jr:preload expressions do work so these nodes probably aren't affected but it's something to consider if we ever do want to take this on.

Overall, the change looks too risky with too little payoff to me. Since I opened the original issue, I will close it for now.

A quick extra note. @MartijnR will have to correct me if I'm wrong but I believe that originally ODK and pyxform put the meta block in the default namespace and it was during a consolidation attempt with CommCare that the desire to use the orx namespace came about.

I went to see if/how CommCare uses this namespace and found this test form. It does use a meta block that looks like what was proposed in this issue. However, the expressions that reference the meta nodes don't use the namespace prefix:

<bind nodeset="/data/meta/timeEnd" type="xsd:dateTime"/>
<setvalue event="xforms-ready" ref="/data/meta/username" value="instance('commcaresession')/session/context/username"/>

I would think the value of namespacing this block would be so that someone else could introduce a different meta block without conflict. But since there are XPath path expressions that don't specify a namespace prefix, then I don't think that goal is achieved. In fact, I don't think there's any value to introducing namespaces partially in that way but I could be wrong.

I think you're totally right. Those refs should be /data/orx:meta/orx:username. Maybe this form just tests deprecated syntax though. (cc fyi @ctsims)
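
To illustrate the mismatch, here is a small sketch using Python's standard `xml.etree.ElementTree`, which is namespace-aware. The instance below is a made-up example, not the CommCare test form. Paths are relative to the root `data` element, so `meta/timeEnd` corresponds to `/data/meta/timeEnd`.

```python
import xml.etree.ElementTree as ET

ORX_NS = "http://openrosa.org/xforms"

# Made-up instance with a namespaced meta block.
instance = ET.fromstring(
    '<data xmlns:orx="http://openrosa.org/xforms">'
    "<orx:meta><orx:timeEnd>2020-01-01T00:00:00Z</orx:timeEnd></orx:meta>"
    "</data>"
)

# Unprefixed path: looks for nodes in no namespace, so it finds nothing.
unprefixed = instance.find("meta/timeEnd")

# Prefixed path with an explicit prefix-to-URI mapping: finds the node.
prefixed = instance.find("orx:meta/orx:timeEnd", {"orx": ORX_NS})

print(unprefixed is None, prefixed is not None)
```

A namespace-aware processor treats the unprefixed path as referring to different nodes, so a bind written that way only works in an engine that ignores namespaces.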

Thanks, @MartijnR! I emailed you before seeing this tag, @ctsims, sorry!

Clayton sent me some good historical context on JavaRosa's minimal namespace support. Originally it was a performance concession and because namespacing of nodes in the main instance didn't provide much user-facing value, there wasn't any investment made in better support. I think that's pragmatic and that we should update the ODK XForms spec to match actual usage.

I've realized that allowing users to introduce blocks with the same name as system blocks (e.g. meta) is a whole can of worms. It requires every tool along the analysis pipeline to be namespace-aware. That is, if you end up with a CSV export with two meta columns that are not differentiated, that's going to break various workflows. Or if you have a processing script that looks for meta, it would need to have namespace awareness to deal with the duplicates. I don't think that's practical or matches current usage. Saying that meta is a reserved name regardless of namespacing (de facto usage) is more useful.
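
A sketch of the CSV collision, assuming an export step that flattens a submission into column names built from local names only (a simplification, not how any particular server exports). `flat_columns` and `duplicated_columns` are hypothetical helpers, and the `my` namespace is made up for the example.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag):
    """Strip the '{namespace}' part that ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def flat_columns(submission_xml):
    """Hypothetical export step: one column per leaf node, namespaces dropped."""
    columns = []

    def walk(node, prefix):
        for child in node:
            name = local_name(child.tag)
            path = f"{prefix}/{name}" if prefix else name
            if len(child):  # has children: descend
                walk(child, path)
            else:  # leaf: becomes a column
                columns.append(path)

    walk(ET.fromstring(submission_xml), "")
    return columns


def duplicated_columns(columns):
    """Column names that occur more than once."""
    return sorted(name for name, count in Counter(columns).items() if count > 1)


submission = (
    '<data xmlns:orx="http://openrosa.org/xforms" xmlns:my="http://example.org/custom">'
    "<orx:meta><orx:instanceID>uuid:1</orx:instanceID></orx:meta>"
    "<my:meta><my:instanceID>custom</my:instanceID></my:meta>"
    "<name>a</name>"
    "</data>"
)
columns = flat_columns(submission)
print(columns, duplicated_columns(columns))
```

The two meta blocks are distinct in the XML but indistinguishable once namespaces are dropped, which is the situation a reserved meta name avoids.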

Dimagi does use namespaces as a processing directive for downstream tools that are namespace-aware. They can kind of have it both ways since they have complete control of the engine -- the client ignores namespaces (so /data/meta/timeEnd is used to bind to /data/orx:meta/timeEnd) but passes them through for other tools to use. We can't do that in the ODK ecosystem because Enketo is XML standards-compliant. And that's fine because we can introduce documented attributes to indicate things to systems that parse forms or submissions instead of using namespaces for that.