Export of encrypted submissions using Briefcase UI fails
DavisRayM opened this issue · 6 comments
Software versions
Briefcase v1.18.0, Java v1.8.0_292
Problem description
Exporting encrypted submissions fails because of a duplicate xmlns attribute in the submission XML that ODK Briefcase stores after pulling (error attached below). The extra xmlns attribute is not present when the submission is retrieved from https://stage-api.ona.io; it seems to be added during the download process.
2021-05-06 17:02:48,779 [ForkJoinPool-5-worker-1] ERROR o.o.b.export.SubmissionParser - Parse error attempting to read instance date
javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,286]
Message: Attribute "xmlns" was already specified for element "n0:data".
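For context, a duplicate attribute on one element is a well-formedness error, so any conforming XML parser should reject it, not only the Java one Briefcase uses. A minimal sketch in Python, assuming a trimmed stand-in for the stored root element (the `is_well_formed` helper and the `DUPLICATED` sample are illustrative, not part of Briefcase):

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for the root element Briefcase stored. The two default
# namespace declarations mirror the ones shown in this report.
DUPLICATED = (
    '<n0:data encrypted="yes" '
    'xmlns="http://www.opendatakit.org/xforms/encrypted" '
    'xmlns="http://opendatakit.org/submissions" '
    'xmlns:n0="http://www.opendatakit.org/xforms/encrypted"/>'
)


def is_well_formed(xml_text):
    """Return True if xml_text parses, False on a well-formedness error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(DUPLICATED))
```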
Steps to reproduce the problem
- Create an encrypted form on an ODK Aggregate server of your choice. I used https://stage-api.ona.io in this case
- Make a few submissions to the form
- Try to pull and export form data
Expected behavior
When submissions are pulled, the additional xmlns attribute should not be added to the downloaded submission, and the export should complete successfully. The extra xmlns seems to be the one from the submission node...
Other information
- Removing one of the xmlns attributes allows the submission to export successfully
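A rough sketch of that manual workaround, assuming the declaration to drop is the one for the submissions namespace (the report only says "one of" them). The helper name and the trimmed sample are hypothetical, and this is a text-level patch on an already-pulled file, not a fix in Briefcase itself:

```python
import re
import xml.etree.ElementTree as ET

# Assumption: the extra declaration is the one inherited from <submission>.
SUBMISSIONS_NS = "http://opendatakit.org/submissions"

# Trimmed stand-in for a pulled submission with the duplicated attribute.
PULLED = (
    '<n0:data encrypted="yes" '
    'xmlns="http://www.opendatakit.org/xforms/encrypted" '
    'xmlns="http://opendatakit.org/submissions" '
    'xmlns:n0="http://www.opendatakit.org/xforms/encrypted">'
    '<n0:encryptedXmlFile>submission.xml.enc</n0:encryptedXmlFile>'
    '</n0:data>'
)


def drop_default_ns_declaration(xml_text, namespace_uri):
    """Remove the first xmlns="namespace_uri" declaration from xml_text."""
    pattern = r'\s+xmlns="' + re.escape(namespace_uri) + '"'
    return re.sub(pattern, "", xml_text, count=1)


repaired = drop_default_ns_declaration(PULLED, SUBMISSIONS_NS)
root = ET.fromstring(repaired)  # parses once the duplicate is gone
print(root.tag)
```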
Briefcase submission XML:
<n0:data encrypted="yes" id="a97HQYbA5ufGxRYY3H4vE5" instanceID="uuid:2cab3370-d15c-4f4d-a3bb-34f1f6a80706" submissionDate="2021-04-27T09:08:09.745247+00:00" version="vWZnSVN3reoFBd3EwA3qna"
xmlns="http://www.opendatakit.org/xforms/encrypted"
xmlns="http://opendatakit.org/submissions"
xmlns:n0="http://www.opendatakit.org/xforms/encrypted">
<n0:base64EncryptedKey>s+skAk7Ie2X5/+ol/orEvIwNwQJKT0Zxdb3HDf/+OtprgGMmu4c3yU5MeGrpko1G38i8v8nxb7OryADkdL9UG0iJUqDX3lroZRGaKXb74P+IASePKFYfgT68uBhnUpGxbVTYjh2bRgCBUIb+RLQFBo3QvdK/VB1ukE9c4LZNNbS8dk7dv7450koMxLLSliemRzW15BXWvDZCdjWN6WgnLLsd7Y9jZHXfDii35Bg5L1s0UGFy6CU/m3N0Irg9teW2CHn+gEfPlvKZLPs5oDXqmiD+ABeW8aRPkFYRGA0WcRlbscS0TJAUliDUCw5rhDNWgYps11t/yEVwG9Gox8MJiA==</n0:base64EncryptedKey>
<orx:meta xmlns:orx="http://openrosa.org/xforms">
<orx:instanceID>uuid:2cab3370-d15c-4f4d-a3bb-34f1f6a80706</orx:instanceID>
</orx:meta>
<n0:encryptedXmlFile>submission.xml.enc</n0:encryptedXmlFile>
<n0:base64EncryptedElementSignature>BUMJ3QAx4wlPtQaNXmOR5khpg9k5PMrugCD3aHawgok0xflSBoPdFqQtw8n5khhrHDfAjqQimCmkbDJDHfsHOQB86VAvmXh6zt7q0JklK//VyDsV+ghtZvHaAxLLHImsuwYRPSB9ZOirJaWGW4BEODdW9/gujgC9yJCgXw94b9asn/Q4I+ZhvDn+tIm8hhOrxWb7u3NptklusnzRY6OclthO0yFJnlXo34dMgVwTrMrs0rmbhVQpMiamHsa9ClYDchtFe5FZBklOrIFLKpnH9Ay/HeCsWYZkSeeJAJgOYGqNs1CFUwvTeukW0UiZ/LQgGdEsF1HNVY73jAD0v2lsVw==</n0:base64EncryptedElementSignature>
</n0:data>
Server XML:
<submission xmlns="<a href="http://opendatakit.org/submissions" rel="nofollow">http://opendatakit.org/submissions</a>"
xmlns:orx="<a href="http://openrosa.org/xforms" rel="nofollow">http://openrosa.org/xforms</a>">
<data>
<data encrypted="yes" id="a97HQYbA5ufGxRYY3H4vE5" instanceID="uuid:2cab3370-d15c-4f4d-a3bb-34f1f6a80706" submissionDate="2021-04-27T09:08:09.745247+00:00" version="vWZnSVN3reoFBd3EwA3qna"
xmlns="<a href="http://www.opendatakit.org/xforms/encrypted" rel="nofollow">http://www.opendatakit.org/xforms/encrypted</a>">
<base64EncryptedKey>s+skAk7Ie2X5/+ol/orEvIwNwQJKT0Zxdb3HDf/+OtprgGMmu4c3yU5MeGrpko1G38i8v8nxb7OryADkdL9UG0iJUqDX3lroZRGaKXb74P+IASePKFYfgT68uBhnUpGxbVTYjh2bRgCBUIb+RLQFBo3QvdK/VB1ukE9c4LZNNbS8dk7dv7450koMxLLSliemRzW15BXWvDZCdjWN6WgnLLsd7Y9jZHXfDii35Bg5L1s0UGFy6CU/m3N0Irg9teW2CHn+gEfPlvKZLPs5oDXqmiD+ABeW8aRPkFYRGA0WcRlbscS0TJAUliDUCw5rhDNWgYps11t/yEVwG9Gox8MJiA==</base64EncryptedKey>
<orx:meta xmlns:orx="<a href="http://openrosa.org/xforms" rel="nofollow">http://openrosa.org/xforms</a>">
<orx:instanceID>uuid:2cab3370-d15c-4f4d-a3bb-34f1f6a80706</orx:instanceID>
</orx:meta>
<encryptedXmlFile>submission.xml.enc</encryptedXmlFile>
<base64EncryptedElementSignature>BUMJ3QAx4wlPtQaNXmOR5khpg9k5PMrugCD3aHawgok0xflSBoPdFqQtw8n5khhrHDfAjqQimCmkbDJDHfsHOQB86VAvmXh6zt7q0JklK//VyDsV+ghtZvHaAxLLHImsuwYRPSB9ZOirJaWGW4BEODdW9/gujgC9yJCgXw94b9asn/Q4I+ZhvDn+tIm8hhOrxWb7u3NptklusnzRY6OclthO0yFJnlXo34dMgVwTrMrs0rmbhVQpMiamHsa9ClYDchtFe5FZBklOrIFLKpnH9Ay/HeCsWYZkSeeJAJgOYGqNs1CFUwvTeukW0UiZ/LQgGdEsF1HNVY73jAD0v2lsVw==</base64EncryptedElementSignature>
</data>
</data>
<mediaFile>
<filename>submission.xml.enc</filename>
<hash>md5:b111679e619189d6398acb50e5bac43c</hash>
<downloadUrl>
<a href="https://stage-api.ona.io/attachment/original?media_file=25032000%2Fattachments%2F1415_a97HQYbA5ufGxRYY3H4vE5%2Fsubmission.xml_cs1q0tX.enc" rel="nofollow">https://stage-api.ona.io/attachment/original?media_file=25032000/attachments/1415_a97HQYbA5ufGxRYY3H4vE5/submission.xml_cs1q0tX.enc</a>
</downloadUrl>
</mediaFile>
</submission>
I believe all your namespace declarations are invalid. What if you make them raw URIs as they should be?
I believe that was caused by me copying the XML from the view-source page in my browser. I tried pulling the XML via Python requests instead, and it looks fine. I also confirmed that the template we return uses raw URIs rather than wrapping them in HTML <a> tags. The error still occurs with this type of response:
>>> resp.content
'<?xml version=\'1.0\' encoding=\'UTF-8\' ?>\n<submission xmlns="http://opendatakit.org/submissions" xmlns:orx="http://openrosa.org/xforms">\n <data>\n <data encrypted="yes" id="a97HQYbA5ufGxRYY3H4vE5" instanceID="uuid:2cab3370-d15c-4f4d-a3bb-34f1f6a80706" submissionDate="2021-04-27T09:08:09.745247+00:00" version="vWZnSVN3reoFBd3EwA3qna" xmlns="http://www.opendatakit.org/xforms/encrypted"><base64EncryptedKey>s+skAk7Ie2X5/+ol/orEvIwNwQJKT0Zxdb3HDf/+OtprgGMmu4c3yU5MeGrpko1G38i8v8nxb7OryADkdL9UG0iJUqDX3lroZRGaKXb74P+IASePKFYfgT68uBhnUpGxbVTYjh2bRgCBUIb+RLQFBo3QvdK/VB1ukE9c4LZNNbS8dk7dv7450koMxLLSliemRzW15BXWvDZCdjWN6WgnLLsd7Y9jZHXfDii35Bg5L1s0UGFy6CU/m3N0Irg9teW2CHn+gEfPlvKZLPs5oDXqmiD+ABeW8aRPkFYRGA0WcRlbscS0TJAUliDUCw5rhDNWgYps11t/yEVwG9Gox8MJiA==</base64EncryptedKey><orx:meta xmlns:orx="http://openrosa.org/xforms"><orx:instanceID>uuid:2cab3370-d15c-4f4d-a3bb-34f1f6a80706</orx:instanceID></orx:meta><encryptedXmlFile>submission.xml.enc</encryptedXmlFile><base64EncryptedElementSignature>BUMJ3QAx4wlPtQaNXmOR5khpg9k5PMrugCD3aHawgok0xflSBoPdFqQtw8n5khhrHDfAjqQimCmkbDJDHfsHOQB86VAvmXh6zt7q0JklK//VyDsV+ghtZvHaAxLLHImsuwYRPSB9ZOirJaWGW4BEODdW9/gujgC9yJCgXw94b9asn/Q4I+ZhvDn+tIm8hhOrxWb7u3NptklusnzRY6OclthO0yFJnlXo34dMgVwTrMrs0rmbhVQpMiamHsa9ClYDchtFe5FZBklOrIFLKpnH9Ay/HeCsWYZkSeeJAJgOYGqNs1CFUwvTeukW0UiZ/LQgGdEsF1HNVY73jAD0v2lsVw==</base64EncryptedElementSignature></data>\n </data>\n <mediaFile>\n <filename>submission.xml.enc</filename>\n <hash>md5:b111679e619189d6398acb50e5bac43c</hash>\n <downloadUrl>https://stage-api.ona.io/attachment/original?media_file=25032000/attachments/1415_a97HQYbA5ufGxRYY3H4vE5/submission.xml_cs1q0tX.enc</downloadUrl>\n </mediaFile>\n</submission>\n'
I might have misunderstood, but the namespace declarations look okay in this format.
Edit: Just in case, I confirmed that the Python requests output is also what Briefcase receives. My bad for attaching the wrong example; hopefully the one above contains valid namespaces?
CC: @lognaturel
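One way to check which namespace each element of that response ends up in is to parse it and list the resolved names. A small sketch, assuming a trimmed copy of the response above (`RESPONSE` and `element_namespaces` are illustrative names only):

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the server response: same nesting and namespace
# declarations, with most attributes and the long base64 fields left out.
RESPONSE = (
    '<submission xmlns="http://opendatakit.org/submissions" '
    'xmlns:orx="http://openrosa.org/xforms"><data>'
    '<data encrypted="yes" xmlns="http://www.opendatakit.org/xforms/encrypted">'
    '<encryptedXmlFile>submission.xml.enc</encryptedXmlFile>'
    '</data></data></submission>'
)


def element_namespaces(xml_text):
    """Return (local name, namespace URI) for every element, in document order."""
    pairs = []
    for element in ET.fromstring(xml_text).iter():
        if element.tag.startswith("{"):
            uri, _, local = element.tag[1:].partition("}")
        else:
            uri, local = "", element.tag  # element in no namespace
        pairs.append((local, uri))
    return pairs


for local, uri in element_namespaces(RESPONSE):
    print(local, uri)
```

In this form the outer data wrapper resolves to the submissions namespace and the inner data element to the encrypted namespace, each declared on a different element. The duplicate only appears once both default declarations are written onto the same element.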
Great! The next step will be to look at the XML either raw from Collect or from a server that does work (Central, Aggregate, KoBo) and compare it. You can also review the spec at https://getodk.github.io/xforms-spec/encryption. Another option is to write the XML you have from Ona locally and modify it until it works. For example, what if you remove the extra nested data block?
extra xmlns seems to be the one from the submission node
That's quite possible too. What if you remove that? The namespace is indeed supposed to be on the data block.
Quick update on this issue:
We've tested it on both ona.io and kc.kobotoolbox.org; the issue is reproducible on both instances as of July 6th. We still haven't tested with ODK Central. I will update this comment as soon as that's done.
I still suspect there's something not quite right about the structure of submissions being pulled. Do Formhub-derived systems recreate the submissions or wrap them in some way? If you think this is a general issue, it would be good to get a reproduction from submissions that don't come from a Formhub-derived system. Like I said, this could be done by directly taking a raw encrypted submission from Collect and trying to identify a case that fails or comparing the submission XML with what you pull out of your system. We verify pulling and exporting encrypted submissions from Central and Aggregate in regression testing.
If you believe that this worked for Formhub-based systems previously, you could try doing a bisect to try to identify what might have changed.
Formhub-derived systems wrap the submissions in a response document identical to the one defined here for ODK Aggregate (https://docs.getodk.org/briefcase-api/#response-document).
This seems to have worked in releases before this PR. It was technically a bug that became a feature, in a way: the XML did have multiple namespaces, but it exported even though Briefcase wasn't able to read the submission date. With the current changes, I've been able to export the data by:
- Removing this line in Briefcase so that the duplicate xmlns attributes aren't there
- Reconstructing the XML received from ODK Collect and removing the namespace before returning the response on the /downloadSubmission endpoint
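The second workaround could look roughly like the sketch below. This is a hypothetical illustration, not the actual Ona code: `strip_namespaces` and the `INSTANCE` sample are made-up names, and it only moves elements out of their namespaces, leaving any namespaced attributes alone.

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for an encrypted instance as received from ODK Collect.
INSTANCE = (
    '<data encrypted="yes" xmlns="http://www.opendatakit.org/xforms/encrypted">'
    '<encryptedXmlFile>submission.xml.enc</encryptedXmlFile>'
    '</data>'
)


def strip_namespaces(xml_text):
    """Return xml_text reserialized with every element in no namespace."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        if element.tag.startswith("{"):
            # "{uri}local" -> "local"
            element.tag = element.tag.partition("}")[2]
    return ET.tostring(root, encoding="unicode")


print(strip_namespaces(INSTANCE))
```

The stripped instance would then be embedded in the response document in place of the original. Whether dropping the encrypted namespace is acceptable to every client is an open question here.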
The reconstructed submission XML generally does not respect the namespaces of the original form definition. As a special case, if it finds a form group that could be interpreted as the OpenRosa Metadata block, it does use the orx namespace for that.
From the above, it seems that ODK Aggregate might have removed the namespaces from the XML received from ODK Collect...? If so, this issue should be closed; it seems to be more of an issue with the Formhub-derived systems.