OHDSI/Aphrodite

Question - flags vs search domains in APHRODITE

Closed this issue · 5 comments

Hi @jmbanda,

I was exploring the "Settings.R" file and going through the APHRODITE manual when I came across the terms below.

a) Flags
b) Search Domains

I understand that flags indicate which domains features are extracted from. That is, if I set flag$drug_exposure = 1, all drugs present will be used as columns/features (with 1s and 0s).

But may I kindly check with you what "Search Domains" are? I believe they are not used for searching the anchor terms and labeling patients, because that step looks at the "observation" and "condition" domains by default. Am I right?

Our labels are determined by presence or absence of anchor terms in conditions or observations domain. Right? But I also see a comment like below in the Settings.R file for search domains.

This new setting indicates where to look for matches on the labeling heuristic

So the purpose of "Search Domains" is something I don't understand. Could you help me with an example of what it is and when to use it?

Search Domains are types of vocabulary domains that you want to block from your features. This is for removing unwanted domains from things that might come from the note_nlp table, as the annotations on that table from the clinical notes can come from anywhere in the vocabulary.
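To make the idea of blocking a vocabulary domain concrete, here is a minimal sketch in plain R. It is not APHRODITE code: the `features` table and the `blocked_domains` vector are hypothetical names invented for illustration, and the real package may represent features very differently. Later in this thread the same blocking behavior is attributed to the remove_domain flag, with units given as an example of an unwanted domain.

```r
# Toy feature table, NOT APHRODITE's real data structure (assumption for illustration).
# Each row is a candidate feature with the vocabulary domain it belongs to.
features <- data.frame(
  concept_name = c("Metformin", "Type 2 diabetes mellitus", "milligram", "Hemoglobin A1c"),
  domain_id    = c("Drug", "Condition", "Unit", "Measurement"),
  stringsAsFactors = FALSE
)

# Hypothetical list of vocabulary domains to keep out of the feature set.
blocked_domains <- c("Unit")

# Keep only the features whose domain is not blocked.
kept <- features[!(features$domain_id %in% blocked_domains), ]
print(kept$concept_name)
```

In this toy example the unit concept is dropped while the drug, condition, and measurement concepts remain, which is the kind of cleanup that matters when annotations can come from anywhere in the vocabulary.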

Ak784 commented

Hi @jmbanda and @SSMK-wq

I understand the issue is closed by @SSMK-wq but just a quick question to better understand the difference between flag and SearchDomain.

Let's say that my flag setting looks like the one below:

[image: screenshot of the flag settings]

q1) I understand that the model will be built using features (frequency type) from the drug_exposure, labs, conditions and observations domains. The model will not include features from procedures, note_nlp and visits. Am I right so far?

q2) To exclude features from a specific domain, can't I just set the flag to 0 as above? Why do I also have to use remove_Domain? What is the use of the remove_Domain flag? I did refer to the sample code on GitHub, but I couldn't understand the difference between setting the flag to 0 and using remove_domain. Could you help me with this?

Now, my searchDomain setting looks like the one shown below:

[image: screenshot of the searchDomain settings]

q3) Same question as the one above by @SSMK-wq. Your code has the description below:

### Search Domains #########
### This new setting indicates where to look for matches on the labeling heuristic

Does that mean the keywordlist_ALL (APHRODITE concept_name and related terms) will be checked for presence in any of the domains set to 1 in the SearchDomain data frame?

If a patient has any of these terms in the domains specified above, will he be labeled as a case?

That is how I understand it from the comments in the code. Is my understanding right?

q4) I am a bit confused (purely due to my own limitation) by your response above in this thread. May I kindly check why we have to block unwanted domains from being used as features with the SearchDomain flag? Didn't we already turn off certain domains (set to 0) using the flag settings? Why turn them off again using SearchDomain?

q5) Could you please help me understand what SearchDomain does and how it is different from flag?

The searchDomain flag enables you to look for concepts outside of conditions and observations. In other words, you can add a concept for a lab to the keywords file and build a phenotype using a lab code, a procedure, etc.

The remove_domain flag is there for when some pesky concepts that might not be desired, like units, appear in the models.
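The effect of widening the search domains on labeling can be sketched in plain R as follows. This is a toy model under stated assumptions, not APHRODITE's implementation: the `records` table, the `label_cases` function, and the lowercase domain names are all hypothetical, and the rule "a patient is a case if any keyword concept appears in an enabled domain" is a simplification of the labeling heuristic discussed in this thread.

```r
# Toy illustration only; these objects are hypothetical and do not mirror APHRODITE internals.
# Keyword concepts for the phenotype (the thread refers to this list as keywordlist_ALL).
keyword_concepts <- c("Type 2 diabetes mellitus", "Hemoglobin A1c")

# Hypothetical patient records: one row per recorded concept, tagged with its source domain.
records <- data.frame(
  person_id    = c(1, 1, 2, 3),
  domain       = c("condition", "drug", "measurement", "drug"),
  concept_name = c("Type 2 diabetes mellitus", "Metformin", "Hemoglobin A1c", "Metformin"),
  stringsAsFactors = FALSE
)

# Label a patient as a case if any keyword concept appears in an enabled search domain.
label_cases <- function(records, keywords, search_domains) {
  hits <- records$domain %in% search_domains & records$concept_name %in% keywords
  sort(unique(records$person_id[hits]))
}

# Conditions and observations only: the lab keyword is never matched.
default_cases <- label_cases(records, keyword_concepts, c("condition", "observation"))

# Adding measurements lets a lab concept contribute to labeling.
widened_cases <- label_cases(records, keyword_concepts,
                             c("condition", "observation", "measurement"))
```

In this sketch, patient 2 is labeled only once the measurement domain is searched, which mirrors the idea of building a phenotype from a lab code. Blocking domains from the feature set is a separate filtering step and does not change which patients are labeled.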

Ak784 commented

Hi @jmbanda ,

Okay. So, based on my setting above, I think keywordlist_ALL (T2DM and related concepts) will be searched in drug_exposure, measurement, labs, conditions, and observations. Since T2DM is basically a condition concept, I am unnecessarily searching for it in unrelated domains such as labs and drugs (refer to my searchDomain setting above), which causes a performance delay.

You can leave a response only if my understanding is incorrect. I don't wish to take up your time in your busy schedule. Appreciate your help.

  1. It will also extract those features. If you block them, you will not get those features; try it out to see the behavior.
  2. If you have everything properly indexed, searching for concepts that are not in those tables takes a fraction of a second. In my experimental evaluation there was not much overhead.