getodk/briefcase

Export of encrypted submissions using Briefcase UI fails

DavisRayM opened this issue · 6 comments

Software versions

Briefcase v1.18.0, Java v1.8.0_292

Problem description

Exporting encrypted submissions fails because of a duplicate xmlns attribute in the submission XML that ODK Briefcase stores after pulling (error attached below). The extra xmlns attribute is not present when the submission is retrieved directly from https://stage-api.ona.io; it seems to be added during the download process.

2021-05-06 17:02:48,779 [ForkJoinPool-5-worker-1] ERROR o.o.b.export.SubmissionParser - Parse error attempting to read instance date
javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,286]
Message: Attribute "xmlns" was already specified for element "n0:data".

Steps to reproduce the problem

  1. Create an encrypted form on an ODK Aggregate server of your choice. I used https://stage-api.ona.io in this case
  2. Make a few submissions to the form
  3. Try to pull and export form data

Expected behavior

When submissions are pulled, the additional xmlns attribute should not be added to the downloaded submission, and the export should complete successfully. The extra xmlns seems to be the one from the submission node...

Other information

  • Removing one of the xmlns attributes successfully exports the submission

Briefcase submission XML:

<n0:data encrypted="yes" id="a97HQYbA5ufGxRYY3H4vE5" instanceID="uuid:2cab3370-d15c-4f4d-a3bb-34f1f6a80706" submissionDate="2021-04-27T09:08:09.745247+00:00" version="vWZnSVN3reoFBd3EwA3qna"
    xmlns="http://www.opendatakit.org/xforms/encrypted"
    xmlns="http://opendatakit.org/submissions"
    xmlns:n0="http://www.opendatakit.org/xforms/encrypted">
    <n0:base64EncryptedKey>s+skAk7Ie2X5/+ol/orEvIwNwQJKT0Zxdb3HDf/+OtprgGMmu4c3yU5MeGrpko1G38i8v8nxb7OryADkdL9UG0iJUqDX3lroZRGaKXb74P+IASePKFYfgT68uBhnUpGxbVTYjh2bRgCBUIb+RLQFBo3QvdK/VB1ukE9c4LZNNbS8dk7dv7450koMxLLSliemRzW15BXWvDZCdjWN6WgnLLsd7Y9jZHXfDii35Bg5L1s0UGFy6CU/m3N0Irg9teW2CHn+gEfPlvKZLPs5oDXqmiD+ABeW8aRPkFYRGA0WcRlbscS0TJAUliDUCw5rhDNWgYps11t/yEVwG9Gox8MJiA==</n0:base64EncryptedKey>
    <orx:meta xmlns:orx="http://openrosa.org/xforms">
        <orx:instanceID>uuid:2cab3370-d15c-4f4d-a3bb-34f1f6a80706</orx:instanceID>
    </orx:meta>
    <n0:encryptedXmlFile>submission.xml.enc</n0:encryptedXmlFile>
    <n0:base64EncryptedElementSignature>BUMJ3QAx4wlPtQaNXmOR5khpg9k5PMrugCD3aHawgok0xflSBoPdFqQtw8n5khhrHDfAjqQimCmkbDJDHfsHOQB86VAvmXh6zt7q0JklK//VyDsV+ghtZvHaAxLLHImsuwYRPSB9ZOirJaWGW4BEODdW9/gujgC9yJCgXw94b9asn/Q4I+ZhvDn+tIm8hhOrxWb7u3NptklusnzRY6OclthO0yFJnlXo34dMgVwTrMrs0rmbhVQpMiamHsa9ClYDchtFe5FZBklOrIFLKpnH9Ay/HeCsWYZkSeeJAJgOYGqNs1CFUwvTeukW0UiZ/LQgGdEsF1HNVY73jAD0v2lsVw==</n0:base64EncryptedElementSignature>
</n0:data>
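
The parse error can be reproduced outside Briefcase. The sketch below is a minimal illustration in Python, not Briefcase's own code (Briefcase is Java, and the log above comes from javax.xml.stream). It feeds a shortened copy of the stored root element to Python's standard-library parser. The `parse_error` helper and the trimmed sample strings are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the stored Briefcase XML: two xmlns="..." attributes
# on the same element, as in the failing submission.
DUPLICATED = (
    '<n0:data encrypted="yes" '
    'xmlns="http://www.opendatakit.org/xforms/encrypted" '
    'xmlns="http://opendatakit.org/submissions" '
    'xmlns:n0="http://www.opendatakit.org/xforms/encrypted">'
    '<n0:encryptedXmlFile>submission.xml.enc</n0:encryptedXmlFile>'
    '</n0:data>'
)

# Control: the same element with a single default namespace declaration.
CONTROL = (
    '<n0:data encrypted="yes" '
    'xmlns="http://www.opendatakit.org/xforms/encrypted" '
    'xmlns:n0="http://www.opendatakit.org/xforms/encrypted">'
    '<n0:encryptedXmlFile>submission.xml.enc</n0:encryptedXmlFile>'
    '</n0:data>'
)


def parse_error(xml_text):
    """Return the parser's error for xml_text, or None if it parses."""
    try:
        ET.fromstring(xml_text)
        return None
    except ET.ParseError as error:
        return error


print("duplicated:", parse_error(DUPLICATED))
print("control:   ", parse_error(CONTROL))
```

An element may carry only one attribute with a given name, so any conforming XML parser should reject the first string regardless of which namespace URIs are involved.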

Server XML:

<submission xmlns="<a href="http://opendatakit.org/submissions" rel="nofollow">http://opendatakit.org/submissions</a>"
    xmlns:orx="<a href="http://openrosa.org/xforms" rel="nofollow">http://openrosa.org/xforms</a>">
    <data>
        <data encrypted="yes" id="a97HQYbA5ufGxRYY3H4vE5" instanceID="uuid:2cab3370-d15c-4f4d-a3bb-34f1f6a80706" submissionDate="2021-04-27T09:08:09.745247+00:00" version="vWZnSVN3reoFBd3EwA3qna"
            xmlns="<a href="http://www.opendatakit.org/xforms/encrypted" rel="nofollow">http://www.opendatakit.org/xforms/encrypted</a>">
            <base64EncryptedKey>s+skAk7Ie2X5/+ol/orEvIwNwQJKT0Zxdb3HDf/+OtprgGMmu4c3yU5MeGrpko1G38i8v8nxb7OryADkdL9UG0iJUqDX3lroZRGaKXb74P+IASePKFYfgT68uBhnUpGxbVTYjh2bRgCBUIb+RLQFBo3QvdK/VB1ukE9c4LZNNbS8dk7dv7450koMxLLSliemRzW15BXWvDZCdjWN6WgnLLsd7Y9jZHXfDii35Bg5L1s0UGFy6CU/m3N0Irg9teW2CHn+gEfPlvKZLPs5oDXqmiD+ABeW8aRPkFYRGA0WcRlbscS0TJAUliDUCw5rhDNWgYps11t/yEVwG9Gox8MJiA==</base64EncryptedKey>
            <orx:meta xmlns:orx="<a href="http://openrosa.org/xforms" rel="nofollow">http://openrosa.org/xforms</a>">
                <orx:instanceID>uuid:2cab3370-d15c-4f4d-a3bb-34f1f6a80706</orx:instanceID>
            </orx:meta>
            <encryptedXmlFile>submission.xml.enc</encryptedXmlFile>
            <base64EncryptedElementSignature>BUMJ3QAx4wlPtQaNXmOR5khpg9k5PMrugCD3aHawgok0xflSBoPdFqQtw8n5khhrHDfAjqQimCmkbDJDHfsHOQB86VAvmXh6zt7q0JklK//VyDsV+ghtZvHaAxLLHImsuwYRPSB9ZOirJaWGW4BEODdW9/gujgC9yJCgXw94b9asn/Q4I+ZhvDn+tIm8hhOrxWb7u3NptklusnzRY6OclthO0yFJnlXo34dMgVwTrMrs0rmbhVQpMiamHsa9ClYDchtFe5FZBklOrIFLKpnH9Ay/HeCsWYZkSeeJAJgOYGqNs1CFUwvTeukW0UiZ/LQgGdEsF1HNVY73jAD0v2lsVw==</base64EncryptedElementSignature>
        </data>
    </data>
    <mediaFile>
        <filename>submission.xml.enc</filename>
        <hash>md5:b111679e619189d6398acb50e5bac43c</hash>
        <downloadUrl>
            <a href="https://stage-api.ona.io/attachment/original?media_file=25032000%2Fattachments%2F1415_a97HQYbA5ufGxRYY3H4vE5%2Fsubmission.xml_cs1q0tX.enc" rel="nofollow">https://stage-api.ona.io/attachment/original?media_file=25032000/attachments/1415_a97HQYbA5ufGxRYY3H4vE5/submission.xml_cs1q0tX.enc</a>
        </downloadUrl>
    </mediaFile>
</submission>

I believe all your namespace declarations are invalid. What if you make them raw URIs as they should be?

I believe that might have been caused by me retrieving the XML from the view-source page in my browser. I tried pulling the XML via Python requests and it seems okay. I also confirmed that the template we return uses the raw URIs instead of wrapping them in HTML <a> tags. The error still occurs with this type of response:

>>> resp.content
'<?xml version=\'1.0\' encoding=\'UTF-8\' ?>\n<submission xmlns="http://opendatakit.org/submissions" xmlns:orx="http://openrosa.org/xforms">\n    <data>\n        <data encrypted="yes" id="a97HQYbA5ufGxRYY3H4vE5" instanceID="uuid:2cab3370-d15c-4f4d-a3bb-34f1f6a80706" submissionDate="2021-04-27T09:08:09.745247+00:00" version="vWZnSVN3reoFBd3EwA3qna" xmlns="http://www.opendatakit.org/xforms/encrypted"><base64EncryptedKey>s+skAk7Ie2X5/+ol/orEvIwNwQJKT0Zxdb3HDf/+OtprgGMmu4c3yU5MeGrpko1G38i8v8nxb7OryADkdL9UG0iJUqDX3lroZRGaKXb74P+IASePKFYfgT68uBhnUpGxbVTYjh2bRgCBUIb+RLQFBo3QvdK/VB1ukE9c4LZNNbS8dk7dv7450koMxLLSliemRzW15BXWvDZCdjWN6WgnLLsd7Y9jZHXfDii35Bg5L1s0UGFy6CU/m3N0Irg9teW2CHn+gEfPlvKZLPs5oDXqmiD+ABeW8aRPkFYRGA0WcRlbscS0TJAUliDUCw5rhDNWgYps11t/yEVwG9Gox8MJiA==</base64EncryptedKey><orx:meta xmlns:orx="http://openrosa.org/xforms"><orx:instanceID>uuid:2cab3370-d15c-4f4d-a3bb-34f1f6a80706</orx:instanceID></orx:meta><encryptedXmlFile>submission.xml.enc</encryptedXmlFile><base64EncryptedElementSignature>BUMJ3QAx4wlPtQaNXmOR5khpg9k5PMrugCD3aHawgok0xflSBoPdFqQtw8n5khhrHDfAjqQimCmkbDJDHfsHOQB86VAvmXh6zt7q0JklK//VyDsV+ghtZvHaAxLLHImsuwYRPSB9ZOirJaWGW4BEODdW9/gujgC9yJCgXw94b9asn/Q4I+ZhvDn+tIm8hhOrxWb7u3NptklusnzRY6OclthO0yFJnlXo34dMgVwTrMrs0rmbhVQpMiamHsa9ClYDchtFe5FZBklOrIFLKpnH9Ay/HeCsWYZkSeeJAJgOYGqNs1CFUwvTeukW0UiZ/LQgGdEsF1HNVY73jAD0v2lsVw==</base64EncryptedElementSignature></data>\n    </data>\n    <mediaFile>\n        <filename>submission.xml.enc</filename>\n        <hash>md5:b111679e619189d6398acb50e5bac43c</hash>\n        <downloadUrl>https://stage-api.ona.io/attachment/original?media_file=25032000/attachments/1415_a97HQYbA5ufGxRYY3H4vE5/submission.xml_cs1q0tX.enc</downloadUrl>\n    </mediaFile>\n</submission>\n'

I might have misunderstood, but the namespace declarations look okay in this format.

Edit: Just to be sure, I confirmed that the Python requests output is also what Briefcase receives. My bad for attaching the wrong example; hopefully the one above contains valid namespaces?
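
One way to check the declarations in the response without reading them by eye is to list which element declares which namespace. The following is a rough sketch using Python's standard library on a shortened copy of the response above. The `namespace_declarations` helper and the trimmed `RESPONSE` string are illustrative assumptions.

```python
import io
import xml.etree.ElementTree as ET

# Shortened copy of the server response (encrypted payload fields omitted).
RESPONSE = """<submission xmlns="http://opendatakit.org/submissions" xmlns:orx="http://openrosa.org/xforms">
    <data>
        <data encrypted="yes" id="a97HQYbA5ufGxRYY3H4vE5" xmlns="http://www.opendatakit.org/xforms/encrypted">
            <encryptedXmlFile>submission.xml.enc</encryptedXmlFile>
        </data>
    </data>
</submission>
"""


def namespace_declarations(xml_text):
    """Return (element_tag, [(prefix, uri), ...]) for each declaring element."""
    declarations = []
    pending = []
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, payload in ET.iterparse(source, events=("start-ns", "start")):
        if event == "start-ns":
            # Reported before the "start" event of the declaring element.
            pending.append(payload)
        elif pending:
            declarations.append((payload.tag, pending))
            pending = []
    return declarations


for tag, declared in namespace_declarations(RESPONSE):
    print(tag, declared)
```

In this response the wrapper and the nested instance each declare a default namespace, which is legal because the declarations sit on different elements. The export failure only appears once both end up on the same element.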

CC: @lognaturel

Great! The next step will be to look at the XML either raw from Collect or from a server that does work (Central, Aggregate, KoBo) and compare. The spec at https://getodk.github.io/xforms-spec/encryption is also worth reviewing. You can also save the XML you have from Ona locally and modify it until it works. For example, what if you remove the extra nested data block?

extra xmlns seems to be the one from the submission node

That's quite possible too. What if you remove that? The namespace is indeed supposed to be on the data block.
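
For local experiments like the ones suggested above, the stored file cannot be loaded with an XML parser first, because it is not well-formed. A text-level edit is one option. The sketch below is only an assumption about how such an experiment might look: `drop_repeated_default_xmlns` is a hypothetical helper, it only handles double-quoted attributes, and keeping the first declaration (the encrypted-forms namespace on the data block) rather than the second is a choice made here, not something confirmed in this issue.

```python
import re
import xml.etree.ElementTree as ET

# First start tag in the text; skips an XML declaration such as <?xml ...?>.
_START_TAG = re.compile(r"<[A-Za-z_][^<>]*>")
# A default namespace declaration; does not match prefixed ones like xmlns:n0.
_DEFAULT_NS = re.compile(r'\s+xmlns="[^"]*"')

SAMPLE = (
    '<n0:data encrypted="yes" '
    'xmlns="http://www.opendatakit.org/xforms/encrypted" '
    'xmlns="http://opendatakit.org/submissions" '
    'xmlns:n0="http://www.opendatakit.org/xforms/encrypted">'
    '<n0:encryptedXmlFile>submission.xml.enc</n0:encryptedXmlFile>'
    '</n0:data>'
)


def drop_repeated_default_xmlns(xml_text):
    """Keep the first xmlns="..." in the root start tag and drop later ones."""
    match = _START_TAG.search(xml_text)
    if match is None:
        return xml_text
    seen = False

    def keep_first(declaration):
        nonlocal seen
        if seen:
            return ""
        seen = True
        return declaration.group(0)

    fixed_tag = _DEFAULT_NS.sub(keep_first, match.group(0))
    return xml_text[:match.start()] + fixed_tag + xml_text[match.end():]


repaired = drop_repeated_default_xmlns(SAMPLE)
root = ET.fromstring(repaired)  # parses once the duplicate is gone
print(root.tag)
```

This only makes the file parseable for testing; it does not address why the second declaration is written during the pull.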

Quick update on this issue:

We've tested this on both ona.io and kc.kobotoolbox.org; the issue is reproducible on both instances as of July 6th. We still haven't tested with ODK Central. I will update this comment as soon as that's done.

I still suspect there's something not quite right about the structure of submissions being pulled. Do Formhub-derived systems recreate the submissions or wrap them in some way? If you think this is a general issue, it would be good to get a reproduction from submissions that don't come from a Formhub-derived system. Like I said, this could be done by directly taking a raw encrypted submission from Collect and trying to identify a case that fails or comparing the submission XML with what you pull out of your system. We verify pulling and exporting encrypted submissions from Central and Aggregate in regression testing.

If you believe that this worked for Formhub-based systems previously, you could try doing a bisect to try to identify what might have changed.

Formhub-derived systems wrap the submissions in a response document identical to the one defined here for ODK Aggregate (https://docs.getodk.org/briefcase-api/#response-document).
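
As a rough illustration of that wrapper structure, the sketch below pulls the instance element and the media file names out of a shortened response using Python's standard library. The `unwrap` helper and the trimmed sample are assumptions for illustration; this is not how Briefcase or any server implements it.

```python
import xml.etree.ElementTree as ET

# Prefix chosen for this sketch; the URI is the wrapper's default namespace.
NS = {"sub": "http://opendatakit.org/submissions"}

RESPONSE = (
    '<submission xmlns="http://opendatakit.org/submissions" '
    'xmlns:orx="http://openrosa.org/xforms">'
    '<data>'
    '<data encrypted="yes" id="a97HQYbA5ufGxRYY3H4vE5" '
    'xmlns="http://www.opendatakit.org/xforms/encrypted">'
    '<encryptedXmlFile>submission.xml.enc</encryptedXmlFile>'
    '</data>'
    '</data>'
    '<mediaFile><filename>submission.xml.enc</filename></mediaFile>'
    '</submission>'
)


def unwrap(response_text):
    """Return (instance_element, media_filenames) from a response document."""
    root = ET.fromstring(response_text)
    wrapper = root.find("sub:data", NS)  # outer <data> in the wrapper namespace
    instance = wrapper[0]                # the submitted instance itself
    filenames = [
        media.findtext("sub:filename", namespaces=NS)
        for media in root.findall("sub:mediaFile", NS)
    ]
    return instance, filenames


instance, filenames = unwrap(RESPONSE)
# Serializing the instance on its own declares only the namespace it uses.
standalone = ET.tostring(instance, encoding="unicode")
print(instance.tag, filenames)
print(standalone)
```

In this sketch the extracted element serializes with a single namespace declaration, since the wrapper's default namespace applies to the wrapper elements and not to the nested instance.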

This seems to have worked in releases before this PR. It was technically a bug that became a feature, in a way: the XML did have multiple namespaces, but it exported even though Briefcase wasn't able to read the submission date. A way I've been able to export the data with the current changes is by:

  1. Removing this line in Briefcase so that the duplicate xmlns attributes aren't there
  2. Reconstructing the XML received from ODK Collect and removing the namespace before returning the response on the /downloadSubmission endpoint

The reconstructed submission XML generally does not respect the namespaces of the original form definition. As a special case, if it finds a form group that could be interpreted as the OpenRosa Metadata block, it does use the orx namespace for that.

From the above, it seems that ODK Aggregate might have removed the namespaces from the XML received from ODK Collect. If so, this issue should be closed; it seems to be more of an issue with the Formhub-derived systems.