IHTSDO/snowstorm

Failed to import UK Drug extension (SOLVED)

ciprianaradulescu opened this issue · 21 comments

Hello,

I've loaded the SNOMED International Edition SnomedCT_InternationalRF2_PRODUCTION_20230131T120000Z as per the process documented here https://github.com/IHTSDO/snowstorm/blob/master/docs/loading-snomed.md. This resulted in the following two branches: MAIN and MAIN/2023-01-31, both looking OK.
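For context, the import flow from those docs boils down to creating an import job and then uploading the RF2 archive to it. A rough sketch of the calls (assuming Snowstorm on its default port 8080; the file path is just an example):

# Create an import job for the International Edition snapshot on MAIN
curl -i -X POST 'http://localhost:8080/imports' -H 'Content-Type: application/json' -d '{"type": "SNAPSHOT", "branchPath": "MAIN", "createCodeSystemVersion": true}'
# The Location header of the response gives the import id

# Upload the release archive to start the import (replace <import-id>)
curl -X POST 'http://localhost:8080/imports/<import-id>/archive' -F 'file=@SnomedCT_InternationalRF2_PRODUCTION_20230131T120000Z.zip'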

After the international import I'm trying to import the UK Drug extension SnomedCT_UKDrugRF2_PRODUCTION_20230510T000001Z as per the process documented here https://github.com/IHTSDO/snowstorm/blob/master/docs/updating-snomed-and-extensions.md. I've tried various combinations of parameters, either specifying or omitting the dependent version and setting createCodeSystemVersion to both true and false, but every time I get this error:

2023-05-30 13:08:15.320 ERROR 1 --- [ool-11-thread-9] o.ihtsdo.otf.snomedboot.ReleaseImporter : Failed to read or process lines.

java.lang.NullPointerException: null
    at org.snomed.snowstorm.core.rf2.rf2import.ImportComponentFactoryImpl.lambda$processEntities$1(ImportComponentFactoryImpl.java:131)
    at java.base/java.util.ArrayList.forEach(ArrayList.java:1541)
    at org.snomed.snowstorm.core.rf2.rf2import.ImportComponentFactoryImpl.processEntities(ImportComponentFactoryImpl.java:130)
    at org.snomed.snowstorm.core.rf2.rf2import.ImportComponentFactoryImpl$4.persistCollection(ImportComponentFactoryImpl.java:113)
    at org.snomed.snowstorm.core.rf2.rf2import.ImportComponentFactoryImpl$PersistBuffer.flush(ImportComponentFactoryImpl.java:332)
    at org.snomed.snowstorm.core.rf2.rf2import.ImportComponentFactoryImpl$PersistBuffer.save(ImportComponentFactoryImpl.java:327)
    at org.snomed.snowstorm.core.rf2.rf2import.ImportComponentFactoryImpl.newReferenceSetMemberState(ImportComponentFactoryImpl.java:293)
    at org.ihtsdo.otf.snomedboot.ReleaseImporter$ImportRun.lambda$loadRefsets$4(ReleaseImporter.java:496)
    at org.ihtsdo.otf.snomedboot.ReleaseImporter$ImportRun.readLines(ReleaseImporter.java:603)
    at org.ihtsdo.otf.snomedboot.ReleaseImporter$ImportRun.lambda$readLinesCallable$5(ReleaseImporter.java:514)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)

Could anyone please provide some information as to what is causing the error?

FYI: I'm running the docker compose environment from here https://github.com/IHTSDO/snowstorm/blob/master/docs/using-docker.md, but with the ES memory increased to 24g because with the default setting it ran out of memory.

Thank you,
Ciprian

Hi Ciprian, I'm sorry to hear that this is not working as expected.
Could you confirm what version of Snowstorm and Elasticsearch you are running please?

Sure, and sorry, I forgot to mention them 🤦 ES: 7.10.2, Snowstorm: latest (not sure if this means 8.1.0 or something else)

That's perfect thanks. I will try to reproduce.

Although I'm from the UK I am not familiar with the UK SNOMED packages; could you help me please?
I've downloaded the file uk_sct2dr_36.1.0_20230510000001Z.zip from TRUD. That contains snapshot files under SnomedCT_UKEditionRF2_PRODUCTION_20230510T000001Z and also snapshot files under SnomedCT_UKDrugRF2_PRODUCTION_20230510T000001Z.

To reproduce the issue should I upload the whole of the zip or create a new zip using one of those sub directories?

So far, I've only tried with importing SnomedCT_UKDrugRF2_PRODUCTION_20230510T000001Z directly over the International Edition. Honestly, I'm not sure if the UK Edition itself needs to be imported before the Drug Extension.

No worries, I will just try the same.

I think I have reproduced the issue.
I received several warnings in the log like this:

2023-05-31 10:47:39.417  WARN 73749 --- [nPool-worker-11] o.s.s.c.d.s.ReferenceSetMemberService    : Refset member refers to description which does not exist, this will not be persisted 002356c4-6aef-525d-ae05-a62e9606aadd -> 46691701000001119

This hints at another package needing to be imported first. I think that is the cause of the null pointer; the null pointer itself is a bug, related to multithreading when processing refset members.

After the null pointer Snowstorm attempts to roll back the commit, but for me this failed with a timeout error:

org.springframework.dao.DataAccessResourceFailureException: 30,000 milliseconds timeout on connection http-outgoing-1 [ACTIVE]; nested exception is java.lang.RuntimeException: 30,000 milliseconds timeout on connection http-outgoing-1 [ACTIVE]

If you also got this error you must run the admin function to complete the commit rollback on whatever branch the import ran on; it's POST /admin/{branch}/actions/rollback-partial-commit in Swagger.
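For example, if the import ran on MAIN (a sketch, assuming Snowstorm on its default port 8080):

curl -X POST 'http://localhost:8080/admin/MAIN/actions/rollback-partial-commit'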

After that we should be able to import the UK edition SnomedCT_UKEditionRF2_PRODUCTION_20230510T000001Z and then the drugs extension on top. You could create a nested code system structure for this like:

  • SNOMEDCT (International Edition, MAIN branch)
    • SNOMEDCT-UK (UK Edition, MAIN/SNOMEDCT-UK branch)
      • SNOMEDCT-UKDRUG (UK Drug Extension, MAIN/SNOMEDCT-UK/SNOMEDCT-UKDRUG branch)

When the time comes to upgrade this would allow you to upgrade the International edition, UK edition and drug extension as separate code systems. We recommend this approach when using these extension style packages.
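A rough sketch of setting that structure up through the REST API (assuming the default port 8080; the short names and branch paths are the ones listed above), before importing each package onto its own branch:

# Create the UK Edition code system under MAIN
curl -X POST 'http://localhost:8080/codesystems' -H 'Content-Type: application/json' -d '{"shortName": "SNOMEDCT-UK", "branchPath": "MAIN/SNOMEDCT-UK"}'

# Create the UK Drug Extension code system under the UK Edition
curl -X POST 'http://localhost:8080/codesystems' -H 'Content-Type: application/json' -d '{"shortName": "SNOMEDCT-UKDRUG", "branchPath": "MAIN/SNOMEDCT-UK/SNOMEDCT-UKDRUG"}'

# Then run an RF2 SNAPSHOT import (as above) against each of these branch paths in turn.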

This is my plan. I will test this.

I am getting further warnings about refset members referring to descriptions that do not exist when importing SnomedCT_UKEditionRF2_PRODUCTION_20230510T000001Z. I've found the missing descriptions in the UK term browser; they are part of the 999000011000000103 |SNOMED CT United Kingdom clinical extension module|. Descriptions in that module do not appear to be part of this package.

I think we need some advice from the NHS about what order their packages should be imported in. I have heard about a monolith package which may be a workaround for these dependency problems.

Downloading the SNOMED CT UK Monolith Edition, RF2: Snapshot to try that; it includes the International Edition and various UK extensions. That can be imported into a blank Snowstorm onto the MAIN branch; no separate code systems are necessary.
Trying...

I have just been warned by a colleague that the UK Monolith package could take many hours to import the snapshot. I will set that to run at the end of the day so I can keep developing on this machine during the day.
Will post an update in the morning.

Thanks a lot for your help. Truth be told, I'm really confused about what needs to be imported, and in what order, in order to get the UK drug database up and running. I didn't know about the monolith package. That sounds like it might solve all of our issues. Looking forward to hearing about how the import went.

I was able to import the SNOMED CT UK Edition Monolith package directly onto the MAIN branch by starting with a blank Elasticsearch and launching Snowstorm with the following options:

java -Xms8g -Xmx8g -jar snowstorm-8.1.0.jar --elasticsearch.index.max.terms.count=1000000 --import=../../release/uk_sct2mo_36.1.0_20230510000001Z.zip

The max.terms.count setting is required with the UK edition to prevent exceptions when creating the ECL index.
The import took 90 minutes on a Macbook Pro M1.

Awesome! I'll give that a try and close the issue if it imports successfully. Thank you very much for your help!

The import finished successfully after about an hour. However, the MAIN branch is now locked. Is this expected?

[ { "path": "MAIN", "containsContent": true, "locked": true, "creation": "2023-06-02T08:26:16.486Z", "base": "2023-06-02T08:26:16.486Z", "head": "2023-06-02T08:26:16.767Z", "creationTimestamp": 1685694376486, "baseTimestamp": 1685694376486, "headTimestamp": 1685694376767, "versionsReplacedCounts": { "ReferenceSetType": 0 }, "deleted": false } ]

Also, I'm not getting any search results when querying for basic drug concepts such as Nurofen and paracetamol.

The import will go very quiet for a long time after importing all the refset members. This is calculating the index for ECL queries. You should then see lots of QueryConcepts being saved. This happens twice because Snowstorm allows ECL on the stated form (axioms for authoring) and the inferred form (for EHRs and implementers).
The import may not have completed yet.
Check for a line in the log file like this:

2023-05-31 19:37:48.845  INFO 3292 --- [           main] o.s.s.core.rf2.rf2import.ImportService   : Completed RF2 SNAPSHOT import on branch MAIN in 5470 seconds. ID 301f6919-db2f-4f37-8fce-ec55b269bf8a
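Beyond grepping the log for that line, if the import was started through the REST API the job can also be polled; a sketch (assuming the default port 8080 and the import id returned when the job was created):

# Watch the log for the completion line (log file name is just an example)
grep 'Completed RF2 SNAPSHOT import' snowstorm.log

# Or poll the import job; its status should move from RUNNING to COMPLETED (or FAILED)
curl 'http://localhost:8080/imports/<import-id>'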

Oh, OK. The import-completed log line is missing, so I'll wait for it to either say it's done, or crash :D Thanks!

It worked! And the search seems to return everything I need. Closing the ticket and thank you very much for your support.

Great news!
Thanks for working through this with me @ciprianaradulescu. I will be referring people to this ticket for UK import answers.

See this thread @abelardy 😄
Let us know how it goes!

Thanks Kai... that --elasticsearch.index.max.terms.count=1000000 option was the final piece of the jigsaw!

The attached HTML document (hidden in a zip) is a detailed walkthrough that, fingers crossed, works for standing up an Ubuntu VirtualBox VM, installing Elasticsearch and Snowstorm 8.1.0 on it, populating the Snowstorm server with the August 2023 release of the UK Monolith, and then testing it by firing ECL queries at it from other machines on the local network.

Snowstorm Install and Test 202308.zip
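As an illustration of the ECL-testing step, a query from another machine might look something like this (a sketch; the host name, port and the ECL constraint << 373873005 |Pharmaceutical / biologic product| are just examples):

# ECL "<< 373873005" URL-encoded, limited to 10 results, run against the MAIN branch
curl 'http://snowstorm-host:8080/MAIN/concepts?ecl=%3C%3C%20373873005&limit=10'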

Brilliant, thanks for sharing @abelardy !