Secondary instance generated for search() configuration likely to conflict with pulldata secondary instance

Question

Secondary instance generated for search() configuration likely to conflict with pulldata secondary instance

Closed this issue 4 months ago · 4 comments

Software and hardware versions

pyxform v2.0.2

Problem description

The fix for #690 led to secondary instances being generated for the choice list used to configure search(). That can lead to an error if the name of that list is the same as the name of the CSV. Unfortunately, that's the pattern documented at https://xlsform.org/en/#dynamic-selects-from-pre-loaded-data so I imagine it's common.

Steps to reproduce the problem

    def test_search_and_pulldata(self):
        md = """
        | survey  |                   |         |             |                    |             |
        |         | type              | name    | label       | appearance         | calculation |
        |         | select_one fruits | fruit   | Question 1  | search('fruits')   |             |
        |         | calculate         | calc    |             |                    | pulldata('fruits', 'this', 'that', ${fruit}) |
        | choices |               |       |        |
        |         | list_name     | name  | label  |
        |         | fruits        | na    | la     |
        """
        self.assertPyxformXform(
            md=md,
            xml__xpath_match=[
                "/h:html/h:body/x:select1/x:item[./x:value/text()='na']"
            ],
        )

See the error:

ERROR: 'The same instance id will be generated for different external instance source URIs. Please check the form. Instance name: 'fruits', Existing type: 'pulldata', Existing URI: 'jr://file-csv/fruits.csv', Duplicate type: 'choice', Duplicate URI: 'None', Duplicate context: 'survey'.'

Expected behavior

No error.

Other information

Could we do something hacky like add a uuid to the secondary instance name if it's hard not to produce it? I don't think it's used for anything.

Answer 1 · 2024-02-08T00:25:42.000Z

Well, I had my test runs configured wrong so I got convinced we could just not ever have an itemset defined: #695 That breaks translations, though, so is no-go.

@lindsay-stevens any chance you have quick ideas? Sorry we're still running into issues around this!

Answer 2 · 2024-02-08T08:11:43.000Z

The below test which omits search results in the same error. The core problem is that the user has a choice list with the same name as the implied pulldata file (or any other instance's id really: xml-external, select_from_file, last-saved, etc). It would be confusing to allow an instance name conflict just because the choice appearance is search. While the docs use the same name "fruits" for the search and pulldata examples, they aren't shown in use together so it's not an encouraged use case (I read them as being alternatives).

There is already a fair bit of special handling for search and likely the solution here would complicate things further, e.g. "if the choice list is only used by selects with a search appearance then generate a unique instance id (because other selects using those choices might have XPaths referring to the choice instance)" - which means XPaths would be difficult (if enumerated) or not possible (if random) in that scenario due to a modified instance id. Not to mention the possibility of regression with other things like translations, other instance sources, etc.

So my suggested fix would be to add some tests to document this behaviour, and maybe update the examples to use unique names (other than "fruits").

    def test_choice_and_pulldata(self):
        md = """
            | survey  |               |      |            |             |
            |         | type          | name | label      | calculation |
            |         | select_one c1 | q1   | Question 1 |             |
            |         | calculate     | calc |            | pulldata('c1', 'this', 'that', ${q1}) |
            | choices |           |      |       |
            |         | list_name | name | label |
            |         | c1        | na   | la    |
            """
        self.assertPyxformXform(
            md=md,
            xml__xpath_match=[
                "/h:html/h:body/x:select1/x:item[./x:value/text()='na']"
            ],
        )

Answer 3 · 2024-02-08T17:27:11.000Z

I don’t disagree with anything you wrote! I do find it really hard to break existing forms, though. The user who reported this was extremely confused and wasted quite a bit of time trying to find other solutions. Selecting something from a list with search() and then getting one of its properties with pulldata() is a common pattern when using those two.

How did this work before pyxform 2? There was no itemset set for selects with search(). Somehow translations were still generated, though? And a secondary instance was created, it just bypassed the check for duplicates somehow?

I’m ok for waiting to see how often it comes up before we decide what to do. My guess is that it will get reported a lot once some downstream tools that recommend search()/pulldata() upgrade.

Answer 4 · 2024-02-09T10:48:48.000Z

Prior to pyxform 2.0:

search only: body select only
search + pulldata: body select + secondary instance
search + choice_filter: secondary instance only (so it didn't work)
search + choice_filter + pulldata: didn't work (forum + issue 387 which motivated 603 which leads to here)
translations with body select: referenced itext via XPath that includes the choice name (part of reason for 603)
translations with secondary instance: referenced itext via enumerated choice identifier

Maybe search could work if it's secondary instance and those of other compatible external/secondary instance types were merged by pyxform. For external CSV they are basically just a container for id and src: <instance id="fruits" src="jr://file-csv/fruits.csv"/> (the column parameters for from_file go in the body itemset).

In other words, use the secondary instance that would be generated for the search, and copy over the src URI for the CSV file, and delete/don't emit the duplicate instance. Like this (for a search with translations and a pulldata of the same name):

      <instance id="fruits" src="jr://file-csv/fruits.csv">
        <root>
          <item>
            <itextId>fruits-0</itextId>
            <name>na</name>
          </item>
        </root>
      </instance>

In the below table I'm guessing how and when this change could work in combination with other sources of secondary instances, specifically when the instance name referenced by search is the same instance name of an existing instance. Which of these scenarios are supported by Enketo or Collect?

	`search` on same question works?	`search` on a different question works?
search	n/a	yes, share the search instance (no src)
pulldata	yes, share the search instance, but copy over the src CSV URI (e.g. for pulldata in a constraint)	yes, share the search instance, but copy over the src CSV URI
from_file csv	yes, but the choices sheet data (if any) is ignored (e.g. additional static choices or translations) because `parameters` say which columns to read	yes, share the search instance, but copy over the src CSV URI
csv-external	no, not a select	yes, share the search instance, but copy over the src CSV URI
from_file xml	no, wrong data type (raise duplicate instance error)	no, wrong data type (raise duplicate instance error)
from_file geojson	no, wrong data type (raise duplicate instance error)	no, wrong data type (raise duplicate instance error)
xml-external	no, not a select	no, wrong data type (raise duplicate instance error)

This could be a way to get CSV external instances to support translations, because search can specify language-dependent label columns. But it could also be a headache later if there's any chance that these merged instances need to have some other non-attribute content in the instance (in which case it'd be messy to merge).