Generate stats is hard coded to be false
gowthamrao opened this issue · 7 comments
Expected behavior: we need to be able to generate sql with statistics
Actual behavior: Hydra hard codes generate stats to FALSE
Impact - the generated SQL has
FROM #final_cohort CO
;{0 != 0}?{
-- Find the event that is the 'best match' per person.
-- the 'best match' is defined as the event that satisfies the most inclusion rules.
-- ties are solved by choosing the event that matches the earliest inclusion rule, and then earliest.select q.person_id, q.event_id
Maybe we could replace the use of java with R to generate the sql?
genOp <- CirceR::createGenerateOptions(cohortIdFieldName = "cohort_definition_id",
cohortId = fileName,
cdmSchema = "@cdm_database_schema",
targetTable = "@target_cohort_table",
resultSchema = "@results_database_schema",
vocabularySchema = "@vocabulary_database_schema",
generateStats = TRUE)
sql <- CirceR::buildCohortQuery(expression = cohortExpression, options = genOp)
SqlRender::writeSql(sql = sql, targetFile = file.path(outputFolder, 'inst', 'sql', 'sql_server', paste0(fileName, '.sql')))
We can't replace the Java with R, because Hydra is also used in WebAPI to hydrate packages where R is not available.
We can add an argument allowing one to set generate statistics to TRUE, that would default to the current behavior for backwards compatibility.
Now in the develop branch. I added an example usage here: OHDSI/SkeletonCohortDiagnosticsStudy@ab282ed
I haven't tested it. Leaving that to @gowthamrao .
Thank you @schuemie . That makes sense, because WebApi cannot use R.
Would it be possible to add an option called 'useRtoGenerateSql = FALSE', and if TRUE - it could use the installed version of CirceR for
sql <- CirceR::buildCohortQuery(expression = cohortExpression, options = genOp)
IF TRUE, it would overwrite the SQL output of Java. WebApi wont be impacted because Java would render the SQL by default
Reason: it would allow R users to select CirceR version
This seems to be a very special use case, so I don't want to add a lot of complexity to Hydra (that would also need to have its own testing apparatus) just for that.
If a user would want to use a specific CirceR version instead they can always do that after Hydra is finished, and simply overwrite the SQL files.
Yes - that makes sense. Agree. thanks
I haven't tested it. Leaving that to @gowthamrao .
btw - tested the new code after changes,
works perfectly