Import - handling of odd lastUpdated / versionId combinations
klimkjar opened this issue · 2 comments
Describe the bug
This is probably not something that occurs very often, but we have a few cases in our data (~200 cases in ~10M resource versions) so I assume others might encounter it as well.
We have some processes that can trigger several practically identical PUT attempts simultaneously to the same resource. When this happens, in some cases it results in two or more versions of the same resource where v1 (for instance) has a lastUpdated timestamp that is later than or identical to that of v2 (or even v3 in some cases).
When exporting and importing these resources, the import will fail for the affected version IDs. The error messages in the logs are slightly imprecise in these cases. If the lastUpdated timestamps are identical, the error message is 'Failed to import duplicate resource' - however the versionIds differ so they are not exact duplicates. If v1 has a later timestamp than v2 the error message is 'Failed to import resource with version conflict', but the versions do not actually conflict - it is just the lastUpdated timestamps that are inconsistent with the version numbers.
We have a preprocessing step in place before import, so we can fudge the timestamps to make sure all versions are preserved when transferring, and we have an issue in our backlog to make use of transactions / If-Match headers to prevent the issue from occurring in the future. I'm still logging this issue here since it does prevent a clean 1-1 migration using export/import for affected datasets.
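For readers unfamiliar with the If-Match guard mentioned above, here is a minimal sketch of a version-aware update as described in the FHIR REST specification. The base URL and the helper name `build_versioned_put` are hypothetical, and the request is only built, not sent. With this header, a server that supports version-aware updates is expected to reject the PUT (412 Precondition Failed) if the resource has moved past the version that was read, which should stop near-simultaneous writers from all succeeding.

```python
import json
import urllib.request


def build_versioned_put(base_url, resource):
    """Build (but do not send) a PUT that is conditional on the version we read.

    `base_url` and this helper are illustrative assumptions, not part of any
    FHIR server SDK.
    """
    version = resource["meta"]["versionId"]
    url = f"{base_url.rstrip('/')}/{resource['resourceType']}/{resource['id']}"
    return urllib.request.Request(
        url,
        data=json.dumps(resource).encode("utf-8"),
        method="PUT",
        headers={
            "Content-Type": "application/fhir+json",
            # FHIR uses a weak ETag that carries the versionId.
            "If-Match": f'W/"{version}"',
        },
    )


patient = {"resourceType": "Patient", "id": "abc", "meta": {"versionId": "2"}}
request = build_versioned_put("https://fhir.example.org/", patient)
print(request.get_method(), request.full_url, request.get_header("If-match"))
```

Note that `urllib.request.Request` normalises header names with `str.capitalize()`, which is why the header is read back as `If-match`.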
FHIR Version?
R4
Data provider?
SQL Server
To Reproduce
Steps to reproduce the behavior:
1. Have a resource with two versions in the history where the version with the lower version number has a later lastUpdated value than the version with the higher version number.
2. Export the resource type containing the affected resource
3. Import the result of the export in step 2 to a new server
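The condition in the first step can be checked against an export file before importing. The following is a rough sketch, assuming the export is NDJSON with one resource version per line and that every version carries `meta.versionId` and `meta.lastUpdated`. The function name and the labels "identical" and "out-of-order" are made up for illustration and are not the server's own terms.

```python
import json
from collections import defaultdict
from datetime import datetime


def parse_instant(value):
    # A trailing "Z" is only accepted by fromisoformat on Python 3.11+.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def find_inconsistent_versions(ndjson_lines):
    """Return (resourceType, id, lower versionId, higher versionId, kind) tuples.

    Only adjacent versions are compared, which is enough to flag every
    resource whose timestamps do not strictly increase with the versionId.
    """
    history = defaultdict(list)
    for line in ndjson_lines:
        if not line.strip():
            continue
        resource = json.loads(line)
        meta = resource["meta"]
        key = (resource["resourceType"], resource["id"])
        history[key].append(
            (int(meta["versionId"]), parse_instant(meta["lastUpdated"]))
        )

    findings = []
    for (rtype, rid), versions in history.items():
        versions.sort()  # ascending versionId
        for (v_low, t_low), (v_high, t_high) in zip(versions, versions[1:]):
            if t_low == t_high:
                findings.append((rtype, rid, v_low, v_high, "identical"))
            elif t_low > t_high:
                findings.append((rtype, rid, v_low, v_high, "out-of-order"))
    return findings
```

It can be fed an open file handle directly, for example `find_inconsistent_versions(open("Patient.ndjson"))`, where the file name is a placeholder.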
Expected behavior
All the versions of the resource should be imported to the new server, with identical versionId and lastUpdated fields.
Actual behavior
The versions with lower versionIds and later lastUpdated timestamps will not be imported.
@klimkjar - We recently released a feature enhancement that allows you to insert resources into the FHIR service without specifying the version. The ordering of resources is maintained based on the lastUpdated field value. To use this enhancement, you need to pass the parameter allowNegativeVersions with the value true. For more details, please refer to the import documentation.
Note that ingesting resources with mismatched version IDs and lastUpdated values is not supported by design, as the FHIR service sorts resources based on the lastUpdated field value.
@EXPEkesheth, thank you for your response. Your confirmation that this behaviour is by design supports our plans to tidy up the affected resources and put guards in place to prevent new ones from being created. allowNegativeVersions is unfortunately not an option for us as we have Provenance records referring to some of the resources, and the targets for these will no longer match if the resource version IDs are changed.
Our planned workaround is to identify all affected resources in our preprocessing stage and reorder the lastUpdated times based on the version numbers, while making sure that every version has a lastUpdated value at least 1ms after the one that came before it. This ensures that the version numbers are kept intact and lastUpdated is still kept close to what it was originally (seeing as all the affected versions were created within a few ms of each other). We're currently testing this approach and results so far are positive.
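A minimal sketch of the reordering described above, for the versions of a single resource. It is an illustration under stated assumptions rather than the actual preprocessing code: the function name is hypothetical, the input is assumed to be a list of parsed resource versions sharing one id, and timestamps are assumed to parse with `datetime.fromisoformat`. The existing timestamps are sorted and handed out in versionId order, then any value that is not at least 1 ms after its predecessor is pushed forward.

```python
from datetime import datetime, timedelta

MIN_GAP = timedelta(milliseconds=1)


def parse_instant(value):
    # A trailing "Z" is only accepted by fromisoformat on Python 3.11+.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def realign_last_updated(versions):
    """Return copies of `versions`, ordered by versionId, whose lastUpdated
    values strictly increase by at least 1 ms. versionId is left untouched."""
    ordered = sorted(versions, key=lambda r: int(r["meta"]["versionId"]))
    # Reuse the original timestamps, earliest first, so values stay close
    # to what was recorded.
    stamps = sorted(parse_instant(r["meta"]["lastUpdated"]) for r in ordered)

    fixed = []
    previous = None
    for resource, stamp in zip(ordered, stamps):
        if previous is not None and stamp < previous + MIN_GAP:
            stamp = previous + MIN_GAP
        previous = stamp
        meta = dict(
            resource["meta"],
            lastUpdated=stamp.isoformat(timespec="milliseconds"),
        )
        fixed.append(dict(resource, meta=meta))
    return fixed


history = [
    {"resourceType": "Patient", "id": "abc",
     "meta": {"versionId": "1", "lastUpdated": "2024-01-01T10:00:00.007+00:00"}},
    {"resourceType": "Patient", "id": "abc",
     "meta": {"versionId": "2", "lastUpdated": "2024-01-01T10:00:00.005+00:00"}},
    {"resourceType": "Patient", "id": "abc",
     "meta": {"versionId": "3", "lastUpdated": "2024-01-01T10:00:00.005+00:00"}},
]
for version in realign_last_updated(history):
    print(version["meta"]["versionId"], version["meta"]["lastUpdated"])
```

For this made-up history the three versions end up at .005, .006 and .007 seconds. The input dictionaries are copied rather than modified, so the original export can be kept for comparison.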
By the way, we really notice all the love the export / import pipeline has been getting over the past year or so - it is now rock stable and a lot more flexible than it was in the past. Thank you very much for all your efforts on this!