Duplicate patient records
popHealth v5.1.1, bundle 2017.0.3, commit # 711a58
Testing with Cypress 3.2.3 and duplicate testing. Our calculation numbers are off by the number of duplicate patients that meet the criteria. Are there any settings for handling duplicate records?
At the bottom Welch Cori and Welch Cory are duplicates.
What criteria does popHealth use to determine duplicate patients?
Can you send us the two patient files, Welch Cori and Welch Cory? We will examine where the dedup logic went wrong.
There is some logic in the code that checks for duplication against existing patient records in MongoDB. Ragu Naga can probably give you additional information on the logic. Thanks for sending the zip file. We will examine it and update the dedup logic accordingly.
It appears the problem is caused by threading code introduced in 5.1.
Number of Threads for Patient Import
popHealth.yml - patient_import_threads: 4
popHealth\lib\hds\bulk_record_importer.rb (see code below)
The QRDA.zip file contains duplicate patients. When running multiple threads, our upload totals fluctuated between 76 and 79, and our Cypress 2017 test deck tests were failing. After we changed the setting to patient_import_threads: 1 and restarted pophealth_delayed_worker, all Cypress 2017 test deck CQM measures pass and our upload totals are consistently 76.
sudo systemctl stop pophealth_delayed_worker
sudo systemctl start pophealth_delayed_worker
passenger-config restart-app
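One possible explanation for the fluctuating totals is a check-then-insert race: two import threads each check for an existing record before either has saved, so both save. The sketch below replays that interleaving step by step without real threads, so it is deterministic. The `store` array and `patient_key` field are hypothetical stand-ins, not popHealth's actual collection or dedup criteria.

```ruby
# Hypothetical in-memory stand-in for the patient collection (not popHealth code).
store = []

# Two uploaded files that describe the same patient.
# `patient_key` is an assumed dedup key, used for illustration only.
file_a = { patient_key: 'p1' }
file_b = { patient_key: 'p1' }

# The dedup check: is a record with this key already stored?
exists = ->(rec) { store.any? { |r| r[:patient_key] == rec[:patient_key] } }

# Interleaving that two import threads could produce:
seen_by_thread_1 = exists.call(file_a) # false: store is empty
seen_by_thread_2 = exists.call(file_b) # false: thread 1 has not saved yet
store << file_a unless seen_by_thread_1
store << file_b unless seen_by_thread_2

puts store.size # prints 2: the duplicate was saved
```

With a single thread, each check runs only after the previous save has finished, which would be consistent with the stable total of 76 reported above.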
installed settings
popHealth.yml
Number of Threads for Patient Import
patient_import_threads: 4
C:\Users\evillaveces\git\pophealth\popHealth\lib\hds\bulk_record_importer.rb
Line 31:
# Build an array of worker threads that will access our thread-safe task queue,
# and run the import for each file in the task queue.
workers = []
APP_CONFIG['patient_import_threads'].to_i.times do
  workers << Thread.new do
    until tasks.empty?
      # To avoid deadlock, we pass true to pop so it throws if the queue is empty.
      # We ignore the exception, and just use it as an indication that all the work
      # is done. Without this the call to pop would just hang until more work shows
      # up, and this would never finish.
      row = tasks.pop(true) rescue nil
      if row
        self.import_file(row[:name],row[:data],row[:failed_dir],nil,row[:practice])
      end
    end
  end
end
# Process the threads and wait until all are completed before continuing
workers.each { |t| t.join }
Thanks for the update. The temporary solution is to set patient_import_threads: 1 until we figure out how to deal with deduplication.
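As one possible direction for a longer-term fix, the dedup check and the save could be wrapped in a single critical section so that several worker threads can still run. The following is a minimal sketch under that assumption, using the same queue-draining pattern as the importer code quoted above. The names `incoming`, `store`, and `dedup_lock` are hypothetical, and the array stands in for the MongoDB collection.

```ruby
# Hypothetical stand-ins: `incoming` for the dedup keys of the uploaded files,
# `store` for the patient collection. Neither is popHealth code.
incoming = %w[k1 k2 k2 k3 k1 k4 k3 k2]

tasks = Queue.new
incoming.each { |key| tasks << key }

store = []
dedup_lock = Mutex.new # guards the check-and-insert step

workers = Array.new(4) do
  Thread.new do
    loop do
      key = begin
        tasks.pop(true) # non-blocking pop raises ThreadError when empty
      rescue ThreadError
        break # queue drained, this worker is done
      end

      # Check and insert as one atomic step, so no other thread can
      # save the same key between the check and the insert.
      dedup_lock.synchronize do
        store << key unless store.include?(key)
      end
    end
  end
end

workers.each(&:join)
puts store.size # prints 4: one record per distinct key
```

A Mutex only serializes threads inside one process. If more than one worker process imports at the same time, a unique index on the dedup key in MongoDB may be the more robust option.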