lifebit-ai/cloudos

Improve query definitions in R

hms1 opened this issue · 1 comment

hms1 commented

Defining Cohort Browser (CB) queries in R via nested lists is not very user-friendly, especially when there are multiple conditions. Even a relatively simple query with three conditions and two AND operators needs several levels of nesting:

adv_query <- list(
  "operator" = "AND",
  "queries" = list(
    list("id" = 13, "value" = list("from" = "2016-01-21", "to" = "2017-02-13")),
    list(
      "operator" = "AND",
      "queries" = list(
        list("id" = 4, "value" = "Cancer"),
        list("id" = 21, "value" = "Consenting")
      )
    )
  )
)
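To make the nesting easier to see, here is a small recursive helper (hypothetical, not part of cloudos) that walks such a query list and renders it as an infix expression. It treats any element with an operator field as a group and everything else as a leaf filter, and it assumes leaf values are either a single category or a from/to range, as in the example above.

```r
# Hypothetical helper (not part of cloudos): render a nested CB query list
# as a readable infix string so the tree structure is visible.
query_to_string <- function(q) {
  if (!is.null(q$operator)) {
    # Group node: render each child, then join them with the operator
    parts <- vapply(q$queries, function(child) {
      s <- query_to_string(child)
      # Parenthesise nested groups so precedence stays explicit
      if (!is.null(child$operator)) paste0("(", s, ")") else s
    }, character(1))
    return(paste(parts, collapse = paste0(" ", q$operator, " ")))
  }
  # Leaf node: a from/to range or a single category value
  v <- q$value
  if (is.list(v)) {
    sprintf("%s in [%s, %s]", q$id, v$from, v$to)
  } else {
    sprintf("%s == %s", q$id, v)
  }
}

# adv_query from above, repeated compactly so this block runs on its own
adv_query <- list(operator = "AND", queries = list(
  list(id = 13, value = list(from = "2016-01-21", to = "2017-02-13")),
  list(operator = "AND", queries = list(
    list(id = 4, value = "Cancer"),
    list(id = 21, value = "Consenting")))))

query_to_string(adv_query)
# "13 in [2016-01-21, 2017-02-13] AND (4 == Cancer AND 21 == Consenting)"
```

Three conditions already need two levels of operator/queries wrapping; each extra condition adds another.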

To simplify this, we could provide functions that define individual phenotype filters and combine them in a more modular fashion. As a starting point for discussion, the testing-new_query_syntax branch adds two new functions: new_phenotype_cont, which defines a phenotype on a continuous variable, and new_phenotype_cat, which defines a phenotype on a categorical variable. Phenotypes can be combined using the overloaded &, | and ! operators. To try it out:

Get the package from the PR branch:

> git clone 'https://github.com/lifebit-ai/cloudos.git'
> cd cloudos
> git checkout testing-new_query_syntax

From the cloudos directory, start an R session (or open the directory in RStudio), then install and load the package and set the configuration:

> devtools::install(".")
> library(cloudos)
> cloudos_configure(base_url = "http://cohort-browser-dev-110043291.eu-west-1.elb.amazonaws.com/cohort-browser/",
                    token = "...api token...",
                    team_id = "5f7c8696d6ea46288645a89f")

Try building new queries:

A <- new_phenotype_cont(13, "2016-01-21", "2017-02-13")
B <- new_phenotype_cat(4, "Cancer")
C <- new_phenotype_cont(13, "2016-01-21", "2017-02-13")
D <- new_phenotype_cat(4, "Cancer")
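For reference, a minimal sketch of what the two constructors might return, assuming each phenotype is a list with id and value fields and class cb.phenotype (the shape shown later in this thread). The argument names from and to are assumptions; the signatures on the branch may differ.

```r
# Hypothetical sketch: argument names (from, to, value) are assumptions and
# may not match the constructors on the testing-new_query_syntax branch.
new_phenotype_cont <- function(id, from, to) {
  # Continuous variable: the filter value is a from/to range
  structure(list(id = id, value = list(from = from, to = to)),
            class = "cb.phenotype")
}

new_phenotype_cat <- function(id, value) {
  # Categorical variable: the filter value is a single category
  structure(list(id = id, value = value), class = "cb.phenotype")
}

A <- new_phenotype_cont(13, "2016-01-21", "2017-02-13")
B <- new_phenotype_cat(4, "Cancer")
# unclass(B) is the same plain list used for a leaf in adv_query
```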

########## test 1
AB <- A & B

cohort <- cloudos::cb_load_cohort("60f96a97f2395b7f16a93c3a")

cloudos::cb_apply_filter(cohort, adv_query = unclass(AB), keep_existing_filter = FALSE)

########## test 2
AB <- A & !B

cohort <- cloudos::cb_load_cohort("60f96a97f2395b7f16a93c3a")

cloudos::cb_apply_filter(cohort, adv_query = unclass(AB), keep_existing_filter = FALSE)

########## test 3
AB <- A | !B

cohort <- cloudos::cb_load_cohort("60f96a97f2395b7f16a93c3a")

cloudos::cb_apply_filter(cohort, adv_query = unclass(AB), keep_existing_filter = FALSE)

########## test 4
AB <- (A | B) & (D | C)

cohort <- cloudos::cb_load_cohort("60f96a97f2395b7f16a93c3a")

cloudos::cb_apply_filter(cohort, adv_query = unclass(AB), keep_existing_filter = FALSE)
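One way the overloaded operators could work is through the S3 Ops group generic, so that &, | and ! on cb.phenotype objects build a nested operator/queries list like adv_query above. This is a sketch under stated assumptions: the cb.query class for combined nodes and the encoding of ! as operator "NOT" with a single child are hypothetical and not confirmed by the Cohort Browser API. Because the result keeps a class attribute at the top level, it still needs unclass() before being passed to cb_apply_filter, as in the tests above.

```r
# Hypothetical sketch, not the code on the testing-new_query_syntax branch.
# Assumptions: leaves have class "cb.phenotype", combined nodes have class
# "cb.query", and negation is encoded as operator "NOT" with one child.

# Leaf constructor, defined here so this block runs on its own
new_phenotype_cat <- function(id, value) {
  structure(list(id = id, value = value), class = "cb.phenotype")
}

Ops.cb.phenotype <- function(e1, e2) {
  # Map the R operator to a CB operator name; reject anything else
  op <- switch(.Generic, "&" = "AND", "|" = "OR", "!" = "NOT",
               stop("Operator '", .Generic, "' is not supported for CB queries"))
  # Unary ! is called without a second operand
  children <- if (missing(e2)) list(e1) else list(e1, e2)
  # Strip classes from children so only the root carries a class
  structure(list(operator = op, queries = lapply(children, unclass)),
            class = "cb.query")
}

# Bind the very same method to cb.query so mixed operands dispatch cleanly
Ops.cb.query <- Ops.cb.phenotype

A <- new_phenotype_cat(21, "Consenting")
B <- new_phenotype_cat(4, "Cancer")
AB <- A & !B
str(unclass(AB))
```

Binding the same function to Ops.cb.query matters: for a binary operator, R uses an S3 method only when both operands resolve to the same method, so mixing a phenotype with an already-combined query (as in A & !B) would otherwise give an "Incompatible methods" warning and fall back to the built-in &, which fails on lists.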
hms1 commented

Rather than separate new_phenotype_cont and new_phenotype_cat functions with fixed arguments, it would be better to have a single new_phenotype <- function(id, ...), where all additional arguments are stored in the cb.phenotype object. Something like:

> new_phenotype(13, value = list(from = "2015-01-01", to = "2017-01-01"))
$id
[1] 13

$value
$value$from
[1] "2015-01-01"

$value$to
[1] "2017-01-01"


attr(,"class")
[1] "cb.phenotype"

> new_phenotype(4, value = "Cancer")
$id
[1] 4

$value
[1] "Cancer"

attr(,"class")
[1] "cb.phenotype"
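A minimal sketch of that single constructor. The requirement that every extra argument be named is an added assumption, so that each field (such as value) stays addressable by name when the query is serialised.

```r
# Sketch of the proposed single constructor; the "must be named" check is an
# assumption added here, not part of the proposal.
new_phenotype <- function(id, ...) {
  fields <- list(...)
  if (length(fields) > 0 &&
      (is.null(names(fields)) || any(names(fields) == ""))) {
    stop("All arguments after 'id' must be named, e.g. value = ...")
  }
  # c() joins the top-level elements only, so nested lists stay intact
  structure(c(list(id = id), fields), class = "cb.phenotype")
}

p1 <- new_phenotype(13, value = list(from = "2015-01-01", to = "2017-01-01"))
p2 <- new_phenotype(4, value = "Cancer")
```

Printing p1 and p2 should give the same structure as the console output above.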