Error parsing Fitbit heartrate summary JSON data in pull_wearable_data rule
jenniferfedor opened this issue · 1 comment
When processing Fitbit heartrate summary data for a particular device from a single participant using the Fitbit JSON MySQL data stream, we encountered the following error when executing the `pull_wearable_data` rule:
rule pull_wearable_data:
input: data/external/participant_files/p1170.yaml, src/data/streams/rapids_columns.yaml, src/data/streams/fitbitjson_mysql/format.yaml, src/data/streams/fitbitjson_mysql/container.R, src/data/streams/mutations/fitbit/parse_heartrate_summary_json.py, src/data/streams/mutations/fitbit/add_zero_timestamp.py
output: data/raw/p1170/fitbit_heartrate_summary_raw.csv
jobid: 1
wildcards: pid=p1170, device_type=fitbit, sensor=heartrate_summary
Attaching package: ‘dplyr’
The following objects are masked from ‘package:stats’:
filter, lag
The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
Warning message:
package ‘readr’ was built under R version 4.0.5
Processing FITBIT_HEARTRATE_SUMMARY for cf0992de-be2e-4070-ac6c-2f71f857aab0
Executing the following query to download data: SELECT device_id,fitbit_data FROM fitbit_data_from_api_v2 WHERE device_id = 'cf0992de-be2e-4070-ac6c-2f71f857aab0'
Applying mutation script src/data/streams/mutations/fitbit/parse_heartrate_summary_json.py
Error in `mutate_cols()`:
! Problem with `mutate()` input `..1`.
✖ missing value where TRUE/FALSE needed
ℹ Input `..1` is `(function (.cols = everything(), .fns = NULL, ..., .names = NULL) ...`.
Caused by error in `if (!is.character(value) && !is.nan(value)) ...`:
! missing value where TRUE/FALSE needed
Backtrace:
▆
1. ├─global mutate_data(mutation_scripts, renamed_data, data_configuration)
2. │ └─data %>% ...
3. ├─dplyr::mutate(., across(where(is.list), fix_pandas_nan_in_string_columns))
4. ├─dplyr:::mutate.data.frame(., across(where(is.list), fix_pandas_nan_in_string_columns))
5. │ └─dplyr:::mutate_cols(.data, ...)
6. │ ├─base::withCallingHandlers(...)
7. │ └─mask$eval_all_mutate(quo)
8. ├─global `<fn>`(heartrate_daily_restinghr)
9. │ └─base::vapply(...)
10. │ └─FUN(X[[i]], ...)
11. └─base::.handleSimpleError(...)
12. └─dplyr (local) h(simpleError(msg, call))
13. └─rlang::abort(...)
Execution halted
We are using RAPIDS v1.9.4 running on Ubuntu 20.04. The error seems to be caused by the use of `None` to represent missing values in the `src/data/streams/mutations/fitbit/parse_heartrate_summary_json.py` mutation script, which is executed within the `src/data/streams/pull_wearable_data.R` script via {reticulate}. In the Python script, missing values for expected columns are set to `None`.

`None` values in a pandas Series (e.g., a DataFrame column) are normally coerced to `NaN` when other numeric values are present, and Python's `NaN` is in turn interpreted as `NaN` within R. However, this device for this participant had only one row of Fitbit heartrate summary data, with a missing value for `heartrate_daily_restinghr` that was set to `None`. Because there were no other numeric values in that column, this `None` is not coerced to `NaN` and is interpreted by R as `NULL`. Evaluating `!is.nan(NULL)` returns a logical vector of length 0 rather than the expected `TRUE` or `FALSE`; combined with `!is.character(value)` through `&&`, this yields `NA`, and `if (NA)` fails with "missing value where TRUE/FALSE needed". To account for this, we can replace any instances of `None` in the mutation script with `np.NaN`.
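The pandas side of this can be checked directly, without RAPIDS or reticulate. The sketch below only reuses the column name `heartrate_daily_restinghr` from the log; the row values are made up, and the dtype comments reflect current pandas behavior as we understand it. The later conversion of the object column to an R list holding `NULL` happens in reticulate and is not shown.

```python
import numpy as np
import pandas as pd

# Several rows, one missing: pandas coerces None to NaN and the column is float64.
several_rows = pd.DataFrame({"heartrate_daily_restinghr": [62, None, 64]})
print(several_rows["heartrate_daily_restinghr"].dtype)  # float64

# A single row whose only value is missing: there is nothing numeric to coerce
# against, so the column is object dtype and still holds a Python None.
single_row = pd.DataFrame({"heartrate_daily_restinghr": [None]})
print(single_row["heartrate_daily_restinghr"].dtype)  # object
print(single_row.loc[0, "heartrate_daily_restinghr"] is None)  # True

# Writing np.nan instead keeps the column numeric even for a single row.
single_row_nan = pd.DataFrame({"heartrate_daily_restinghr": [np.nan]})
print(single_row_nan["heartrate_daily_restinghr"].dtype)  # float64
```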
Fixed in #226.
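As an illustration of the `None` to `np.NaN` replacement described above, here is a minimal sketch of a parser that defaults a missing resting heart rate to NaN. It is hypothetical and is not the code from #226 or the RAPIDS mutation script: the function name `parse_resting_hr`, the `date_time` column, and the input shape (JSON strings containing a Fitbit-style `activities-heart` list whose `value` object may or may not include `restingHeartRate`) are all assumptions. It uses the lowercase `np.nan`, since the `np.NaN` alias was removed in NumPy 2.0.

```python
import json

import numpy as np
import pandas as pd


def parse_resting_hr(records):
    """Hypothetical parser: one row per day with the daily resting heart rate.

    Assumes each record is a JSON string shaped like a Fitbit
    "activities-heart" response. This is an illustration, not RAPIDS code.
    """
    rows = []
    for raw in records:
        payload = json.loads(raw)
        for day in payload.get("activities-heart", []):
            resting = day.get("value", {}).get("restingHeartRate")
            rows.append({
                "date_time": day.get("dateTime"),
                # Missing (absent key or JSON null) becomes np.nan, not None, so
                # the column stays numeric even when every row is missing.
                "heartrate_daily_restinghr": np.nan if resting is None else resting,
            })
    return pd.DataFrame(rows, columns=["date_time", "heartrate_daily_restinghr"])


# A single day with no restingHeartRate, mirroring the failing case.
sample = ['{"activities-heart": [{"dateTime": "2021-05-01", "value": {"heartRateZones": []}}]}']
df = parse_resting_hr(sample)
print(df["heartrate_daily_restinghr"].dtype)  # float64
```

Handling both an absent key and an explicit JSON `null` in one place means the single-row case produces a `float64` column, which reticulate should then hand to R as a numeric `NaN` rather than a `NULL` list element.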