Welcome to Challenge-06!
In this challenge, you will be using FHIR Analytics Pipelines (OSS) to export data from the FHIR service for analytics and ML model development using Azure Synapse and Azure ML.
Health data aggregated in FHIR offers rich analytics potential for medical research of all sorts. With the FHIR service in Azure Health Data Services, FHIR data can be exported for use in Azure Synapse and Azure ML - giving researchers an end-to-end pipeline for health data ingestion, exploration, and model training. Practitioners can then use the models to run inference on patients' data in clinical workflows - leading to improved health outcomes for patients.
By the end of this challenge, you will be able to:
- implement the AHDS Synapse Sync Agent (Parquet export with near-real-time sync on FHIR events)
- create external SQL table representations of data in Synapse serverless pools
- create a SQL view for exploration and analysis of length of stay (LOS) and cost variables
- create a Synapse notebook for exploration and analysis
- create and connect to an Azure ML workspace
- export features for AutoML training
- register a trained model
- deploy the model to an Azure ML endpoint
- call the Azure ML model
- create an Azure Logic App decision workflow
- have the decision workflow call the Azure ML scoring model
- trigger the decision workflow from AHDS events (extra)
- explore data with Power BI (extra)
- FHIR service deployed (completed in Challenge-01)
- FHIR data imported into FHIR service (completed in Challenge-03 and Challenge-04)
- A copy (clone) of this repo on your local PC
In this challenge, you will export data from your FHIR service to Azure Synapse for data exploration and analytics. In Synapse Studio, you will use Azure AutoML to train a predictive scoring model. You will then publish and consume the model in real time.
- Open the Azure portal, and at the top search for Synapse.
- In the search results, under Services, select Azure Synapse Analytics.
- Select Add to create a workspace.
- In the Basics tab, give the workspace a unique name. We'll use xxxahdschallengews in this document.
- Under Select Data Lake Storage Gen 2, click Create New and name it xxxahdschallengelake.
- Under Select Data Lake Storage Gen 2, click File System and name it users.
- Select Review + create > Create.
Your workspace should be ready in a few minutes.
- Deploy FHIR Analytics Pipelines and export data in Parquet format to ADLS Gen2. This deployment also creates a SQL database representation of the data that you can access from Azure Synapse or any client able to connect to SQL Server. Follow all of the deployment instructions detailed in the FHIR to Synapse Sync Agent site before proceeding.
- From the Azure portal, navigate to your Synapse workspace
- Select Reset SQL admin password
- Change it to a compliant password and make a note of it.
Access Synapse Studio from the portal and check out how your data is organized by FHIR resource type in your serverless SQL database.
How well do certain factors in patients' health data predict patients' healthcare costs and/or length of stay (LOS) in treatment facilities? Can we use the insights we gain to predict multi-day lengths of stay and help initiate cost and quality control measures?
- From Synapse Studio, select the Manage icon from the left navigation bar
- Select Apache Spark pools
- Name your new Spark pool AHDS24MLPOOL
- Leave all other settings on the Basics tab at their defaults
- Select the Additional settings tab
- In the Apache Spark version dropdown, select 2.4
- Select Review + create
- Select Create
- From the Azure portal, access your Synapse workspace storage account
- Select Containers
- Click the + Container button
- Name the container resources
- Click the Create button
- Select the resources container
- Click the Upload button
- Click the select-a-file icon
- Browse to the resources directory of challenge-6
- Select all files in the directory
- Click Open
- Click the Upload button
We will combine some FHIR resource extract data into a SQL view to help us better identify features that might impact multi-day LOS and encounter costs. The main driver will be condition data from Encounters, with related Claim, Patient, and Procedure data joined in.
- In Azure Synapse Studio, select the Develop icon from the left navigation bar
- Select the + sign to add a new resource
- Select SQL Script
- Change the Use database dropdown to 'fhirdb'
- Paste the following SQL code block into the SQL script tab
CREATE VIEW fhir.ConditionView as
SELECT JSON_VALUE([Condition].[code.coding],'$[0].code') as "ConditionCode",
[Condition].[code.text] as "ConditionDescription",
DATEDIFF(year,[Patient].[Birthdate],GETDATE()) as "Age",
[Patient].[Gender],
[Patient].[PostalCode],
[Patient].[Country],
[Encounter].[LOSDays],
[Encounter].[Encounter-Type] as EncounterType,
[Encounter].[Encounter-Description] as EncounterDescription,
(SELECT COUNT(*) as "ProcedureCount" FROM [fhir].[Procedure] WHERE [Procedure].[encounter.reference]=[Condition].[encounter.reference]) as "ProcedureCount",
(SELECT SUM([total.value]) as "EncounterCost" FROM [fhir].[Claim]
WHERE [fhir].[Claim].[patient.reference]=[Condition].[subject.reference]
AND CONVERT(datetime,SUBSTRING([billablePeriod.start],1,19))>=[Encounter].[Encounter-StartDate]
AND CONVERT(datetime,SUBSTRING([billablePeriod.end],1,19))<=[Encounter].[Encounter-EndDate]
) as "EncounterCost"
FROM [fhir].[Condition] as [Condition]
INNER JOIN (
SELECT
[id],JSON_VALUE([name],'$[0].family') as "name",CONVERT(datetime,[birthDate]) as "Birthdate",[gender] as "Gender",
JSON_VALUE([address],'$[0].postalCode') as "PostalCode",
JSON_VALUE([address],'$[0].country') as "Country"
FROM [fhir].[Patient]
AS [Patient]
) [Patient] on [Patient].[id]=SUBSTRING([Condition].[subject.reference],9,100)
INNER JOIN (
SELECT
[id],
JSON_VALUE([type],'$[0].coding[0].code') as "Encounter-Type",
JSON_VALUE([type],'$[0].coding[0].display') as "Encounter-Description",
CONVERT(datetime,SUBSTRING([period.start],1,19)) as "Encounter-StartDate",
CONVERT(datetime,SUBSTRING([period.end],1,19)) as "Encounter-EndDate",
DATEDIFF(day,CONVERT(datetime,SUBSTRING([period.start],1,19)),CONVERT(datetime,SUBSTRING([period.end],1,19))) as "LOSDays"
FROM [fhir].[Encounter]
AS [Encounter]
) [Encounter] on [Encounter].[id]=SUBSTRING([Condition].[encounter.reference],11,100)
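The join keys in the view come from fixed-offset string parsing of FHIR literal references: SUBSTRING(..., 9, 100) strips the 8-character "Patient/" prefix, SUBSTRING(..., 11, 100) strips the 10-character "Encounter/" prefix, and LOSDays counts day boundaries between the second-precision truncations of the period timestamps. A minimal Python sketch of the same logic (the helper names are ours, not part of the pipeline):

```python
from datetime import datetime

def reference_id(reference: str, resource_type: str) -> str:
    """Mirror the view's SUBSTRING trick: drop the 'Type/' prefix
    from a FHIR literal reference such as 'Patient/abc-123'."""
    prefix = resource_type + "/"
    if not reference.startswith(prefix):
        raise ValueError(f"expected a {resource_type} reference, got {reference!r}")
    return reference[len(prefix):]

def los_days(period_start: str, period_end: str) -> int:
    """DATEDIFF(day, ...) equivalent: number of calendar-day boundaries
    crossed, after the view's SUBSTRING(..., 1, 19) truncation of the
    FHIR instant to a second-precision timestamp."""
    fmt = "%Y-%m-%dT%H:%M:%S"
    start = datetime.strptime(period_start[:19], fmt)
    end = datetime.strptime(period_end[:19], fmt)
    return (end.date() - start.date()).days

print(reference_id("Patient/abc-123", "Patient"))          # abc-123
print(los_days("2021-03-01T22:00:00Z", "2021-03-03T01:00:00Z"))  # 2
```

Note that an overnight stay from 22:00 to 01:00 two nights later counts as 2 LOS days even though it lasts only 27 hours, because DATEDIFF counts boundary crossings, not elapsed time.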
- Press the Run button
- The new ConditionView should now be created in your fhirdb database.
- Replace the SQL Code block in SQL Script 1 window with the following:
select * from fhir.ConditionView
- Press the Run button
- The query may take several minutes to complete. The results contain a tabular view of the data we joined as described above. You can also use the chart component to visualize the query results.
- In Azure Synapse Studio, select the Develop icon from the left navigation bar
- Select the + sign to add a new resource
- Select Import
- Browse to the notebook directory of challenge-6 and select the ConditionAnalysisNotebook.json file
- Click Open
- In the Attach to dropdown, select AHDS24MLPOOL
- In the Language dropdown, select PySpark (Python)
In the following steps we will be traversing and executing cells of functionality in the Synapse Notebook we just imported. Please see this link for information on Synapse Notebooks.
- In the Import Libraries and Set Environment Variables cell, replace the variables contained in <> with the values for your Synapse environment
- Execute the Import Libraries and Set Environment Variables cell after defining the variables
- Read and execute the following cells in order. These cells relate to exploring the data in the PySpark environment:
Load the ConditionView into a DataFrame
Calculate the MultiDayLOS flag
Scatter Plot to see effect of Age on EncounterCost
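The MultiDayLOS cell derives a binary label from the view's LOSDays column. A plausible pandas equivalent, assuming (as a hypothetical) that the notebook flags any encounter spanning at least one day boundary; the column names match the view, the values are made up:

```python
import pandas as pd

# A small frame standing in for rows read from fhir.ConditionView.
df = pd.DataFrame({
    "ConditionCode": ["44054006", "195662009", "10509002"],
    "Age": [67, 34, 51],
    "LOSDays": [0, 3, 1],
    "EncounterCost": [1250.0, 9800.5, 3100.0],
})

# Binary label: 1 when the encounter crosses at least one day boundary.
df["MultiDayLOS"] = (df["LOSDays"] >= 1).astype(int)

print(df[["LOSDays", "MultiDayLOS"]])
```

Turning the skewed LOSDays count into a binary flag frames the problem as classification-friendly, which suits the cost-control question: did the stay run past one day or not?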
- Read and execute the following cells in order. These cells relate to configuring and running training experiments in Azure Machine Learning using AutoML:
Select Features for Auto ML Training
Create Azure ML Workspace
Connect to Azure ML Workspace
Convert Training Data to Azure ML TabularData Set
Define Training settings
Configure Auto ML
Train the automatic regression Model
Retrieve the best model
Test Model Accuracy
Root Mean Square
MAPE and Accuracy
Model fitting to Data Test
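The accuracy cells report standard regression metrics. A self-contained sketch of the two the notebook names, root mean square error and MAPE, with "accuracy" taken as 100 − MAPE (an assumption about the notebook's convention; the sample numbers are made up):

```python
import math

def rmse(actual, predicted):
    """Root mean square error over paired observations."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def mape(actual, predicted):
    """Mean absolute percentage error; assumes no zero actuals."""
    n = len(actual)
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n * 100

actual = [2.0, 4.0, 6.0]
predicted = [2.0, 5.0, 6.0]
print(f"RMSE ≈ {rmse(actual, predicted):.3f}")            # ≈ 0.577
print(f"accuracy ≈ {100 - mape(actual, predicted):.1f}%") # ≈ 91.7%
```

RMSE penalizes large absolute errors (useful for cost predictions), while MAPE is scale-free, so the two together give a fuller picture of model fit.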
- Read and execute the following cells in order. These cells relate to publishing a scoring model to Azure Machine Learning:
Register Model to Azure ML for Publication
Initialize Scoring Environment
Copy Score Script from Blob Storage
Deploy Model Azure ML Endpoint
Test Deployed Service
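Once deployed, the model is reachable as an HTTPS scoring endpoint. A minimal sketch of assembling the request with only the standard library; the endpoint URI, key, payload shape {"data": [...]}, and feature names here are all placeholders to adapt to your own deployment and score script:

```python
import json
import urllib.request

def build_scoring_request(uri: str, api_key: str, rows: list) -> urllib.request.Request:
    """Assemble a POST against a real-time scoring endpoint.
    The {"data": [...]} payload shape is an assumption -- match it
    to what your deployed score script expects."""
    body = json.dumps({"data": rows}).encode("utf-8")
    return urllib.request.Request(
        uri,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )

req = build_scoring_request(
    "https://example.invalid/score",   # placeholder scoring URI
    "<your-api-key>",                  # placeholder key
    [{"Age": 67, "ConditionCode": "44054006", "ProcedureCount": 2}],
)
# urllib.request.urlopen(req) would send the request; not executed here.
```

Keep the key out of source control (e.g., read it from an environment variable or Azure Key Vault) before wiring this into a workflow.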
Using Azure Health Data Services eventing and the predictive model we published above, can you invoke the model as part of an event-driven decision support process? Please refer to this tutorial as a guiding example.
If you have a Power BI license, please refer to this tutorial on exploring data with Power BI in Azure Synapse.
- Successfully export data from the FHIR service using the Synapse Sync Agent (OSS)
- Successfully explore data in Synapse using SQL views
- Successfully use the data for statistical analysis and ML modeling
- Successfully publish a scoring model for real-time consumption
Click here to proceed to the next challenge.