THLfi/read.gt3x

Sensor idle state values seem to be not handled

martakarass opened this issue · 4 comments

I have a gt3x file with ~40 minutes of raw accelerometry data collected at 100 Hz, with idle mode enabled. I also have the corresponding ActiLife raw data output CSV. I am trying to use read.gt3x::read.gt3x to replicate the ActiLife raw data output.

  • I note that sensor idle state values seem not to be handled: there are no measurement observations for several minutes in the middle of the recording. As far as I recall, the sensor was lying still on a desk for most of the 40 minutes.

Below I provide the files and code that show the missing values.

Links to (1) GT3X file, (2) ActiLife raw data output CSV. Download data

rm(list = ls())
library(data.table)
library(dplyr)
library(ggplot2)
library(reshape2)
library(lubridate)
library(read.gt3x)

## Personal dropbox sharing links to (1) GT3X file, (2) ActiLife raw data output CSV
gt3x.fpath <- "https://www.dropbox.com/s/pi7m9b75gqzl64g/TAS1H30182785%20%282019-09-17%29.gt3x?dl=1"
csv.fpath <- "https://www.dropbox.com/s/hhdqjntarqxgpuu/TAS1H30182785%20%282019-09-17%29RAW.csv?dl=1"

file_directory <- getwd()
gt3x.destfile <- file.path(file_directory, "TAS1H30182785 (2019-09-17).gt3x")
csv.destfile <- file.path(file_directory, "TAS1H30182785 (2019-09-17).csv")

## Download files to wd
if (!file.exists(gt3x.destfile)) download.file(gt3x.fpath, gt3x.destfile)
if (!file.exists(csv.destfile)) download.file(csv.fpath, csv.destfile)

Read Actilife raw data output CSV. Display metadata

## Read meta header from Actilife raw data output CSV
as.character(unlist(read.csv(file = csv.destfile, nrows = 6)))
# [1] "Serial Number: TAS1H30182785"     "Start Time 18:40:00"             
# [3] "Start Date 9/17/2019"             "Epoch Period (hh:mm:ss) 00:00:00"
# [5] "Download Time 19:20:05"           "Download Date 9/17/2019" 

## Read Actilife raw data output CSV
dat1 <- as.data.frame(fread(csv.destfile))

## Expected vs actual number of observations in raw data CSV Actilife output 
hz <- 100 
collection_dur_s <- as.numeric(difftime(as.POSIXct("2019-09-17 19:20:05"), 
                                        as.POSIXct("2019-09-17 18:40:00"), units = "secs"))
nrow_exp <- hz * collection_dur_s
nrow_act <- nrow(dat1)
c(nrow_exp, nrow_act) 
# [1] 240500 240500

Read gt3x file with read.gt3x(imputeZeroes = FALSE): sensor idle state values seem not to be handled

To show that sensor idle state values are not handled, we:

  1. read the GT3X file with read.gt3x(imputeZeroes = FALSE); then, for the resulting data,
  2. compute the number of observations in each second of data collection,
  3. compute the time difference (in seconds) between each second that has observations and the previous such second.

Differences larger than 1 second (here 106 and 555) mark intervals that are not handled.

dat2 <- read.gt3x(gt3x.destfile, asDataFrame = TRUE, imputeZeroes = FALSE, verbose = TRUE)
# Input is a .gt3x file, unzipping to a temporary location first...
# Unzipping gt3x data to /var/folders/gh/sq_rf7h57nx46sv9q4sz7nk00000gn/T//RtmpXGkNsi
# 1/1 
# Unzipping /Users/martakaras/Dropbox/_PROJECTS/OLD/LogisticPEER/Code/TAS1H30182785 (2019-09-17).gt3x
# === info.txt, log.bin extracted to /var/folders/gh/sq_rf7h57nx46sv9q4sz7nk00000gn/T//RtmpXGkNsi/TAS1H30182785(2019-09-17)
# GT3X information
# $ Serial Number     :"TAS1H30182785"
# $ Device Type       :"Link"
# $ Firmware          :"1.7.2"
# $ Battery Voltage   :"4.18"
# $ Sample Rate       :100
# $ Start Date        : POSIXct, format: "2019-09-17 18:40:00"
# $ Stop Date         : POSIXct, format: "2019-09-18 19:00:00"
# $ Last Sample Time  : POSIXct, format: "2019-09-17 19:20:05"
# $ TimeZone          :"-04:00:00"
# $ Download Date     : POSIXct, format: "2019-09-17 19:20:05"
# $ Board Revision    :"8"
# $ Unexpected Resets :"0"
# $ Acceleration Scale:256
# $ Acceleration Min  :"-8.0"
# $ Acceleration Max  :"8.0"
# $ Subject Name      :"suffix_85"
# $ Serial Prefix     :"TAS"
# Parsing GT3X data via CPP.. expected sample size: 240500
# ---GT3X PARAMETERS
# address: 0 key: 6 value: 1
# address: 0 key: 7 value: 54703161
# address: 0 key: 8 value: 8
# address: 0 key: 9 value: 1534154836
# address: 0 key: 13 value: 17235970
# address: 0 key: 16 value: 3791650816
# address: 0 key: 20 value: 0
# address: 0 key: 21 value: 0
# address: 0 key: 22 value: 0
# address: 0 key: 23 value: 0
# address: 0 key: 26 value: 2
# address: 0 key: 28 value: 262013
# address: 0 key: 29 value: 255
# address: 0 key: 32 value: 16908288
# address: 0 key: 37 value: 1024
# address: 0 key: 38 value: 0
# address: 0 key: 49 value: 2048
# address: 0 key: 50 value: 88181047
# address: 0 key: 51 value: 6.82667
# address: 0 key: 55 value: 256
# address: 0 key: 57 value: 333.87
# address: 0 key: 58 value: 21
# address: 0 key: 61 value: 2
# address: 1 key: 0 value: 0
# address: 1 key: 1 value: 872668711
# address: 1 key: 2 value: 388
# address: 1 key: 3 value: 1
# address: 1 key: 4 value: 4294967131
# address: 1 key: 5 value: 4294967095
# address: 1 key: 6 value: 4294967149
# address: 1 key: 7 value: 298
# address: 1 key: 8 value: 286
# address: 1 key: 9 value: 300
# address: 1 key: 10 value: 100
# address: 1 key: 12 (start time)  value: 1568745600
# address: 1 key: 13 value: 1568833200
# address: 1 key: 14 value: 1568745556
# address: 1 key: 15 value: 74
# address: 1 key: 16 value: 40
# address: 1 key: 17 value: 72
# address: 1 key: 20 value: 0
# address: 1 key: 21 value: 0
# address: 1 key: 33 value: 60000
# address: 1 key: 34 value: 4294965247
# address: 1 key: 35 value: 4294965190
# address: 1 key: 36 value: 4294965237
# address: 1 key: 37 value: 2051
# address: 1 key: 38 value: 2000
# address: 1 key: 39 value: 2048
# address: 1 key: 40 value: 0
# address: 1 key: 41 value: 1
# address: 1 key: 42 value: 0
# address: 1 key: 43 value: 4294967283
# address: 1 key: 44 value: 0
# address: 1 key: 45 value: 0
# address: 1 key: 46 value: 0
# ---END PARAMETERS
# 
# Total Records: 32900
# Scaling...
# Removing excess rows 
# Creating dimnames 
# CPP returning 
# Done (in 0.0362148284912109 seconds)

Compute differences between observations (after collapsing timestamp of each to its second-level floor):

dat2 %>%
  filter(time >= as.POSIXct("2019-09-17 18:44:00", tz = "GMT"),
         time < as.POSIXct("2019-09-17 19:00:00", tz = "GMT")) %>% 
  mutate(time_floor_s = floor_date(time, "second")) %>%
  group_by(time_floor_s) %>%
  summarise(cnt = n()) %>%
  arrange(time_floor_s) %>%
  mutate(secs_diff = as.numeric(difftime(time_floor_s , lag(time_floor_s, 1)))) %>%
  as.data.frame()
#           time_floor_s cnt secs_diff
# 1  2019-09-17 18:44:00 100        NA
# 2  2019-09-17 18:44:01 100         1
# 3  2019-09-17 18:44:02 100         1
# 4  2019-09-17 18:44:03 100         1
# 5  2019-09-17 18:44:04 100         1
# 6  2019-09-17 18:44:05 100         1
# 7  2019-09-17 18:44:06 100         1
# 8  2019-09-17 18:44:07 100         1
# 9  2019-09-17 18:44:08 100         1
# 10 2019-09-17 18:44:09 100         1
# 11 2019-09-17 18:44:10 100         1
# 12 2019-09-17 18:44:11 100         1
# 13 2019-09-17 18:44:12 100         1
# 14 2019-09-17 18:44:13 100         1
# 15 2019-09-17 18:44:14 100         1
# 16 2019-09-17 18:44:15 100         1
# 17 2019-09-17 18:44:16 100         1
# 18 2019-09-17 18:44:17 100         1
# 19 2019-09-17 18:44:18 100         1
# 20 2019-09-17 18:44:19 100         1
# 21 2019-09-17 18:44:20 100         1
# 22 2019-09-17 18:46:06 100       106
# 23 2019-09-17 18:46:07 100         1
# 24 2019-09-17 18:46:08 100         1
# 25 2019-09-17 18:46:09 100         1
# 26 2019-09-17 18:46:10 100         1
# 27 2019-09-17 18:46:11 100         1
# 28 2019-09-17 18:46:12 100         1
# 29 2019-09-17 18:46:13 100         1
# 30 2019-09-17 18:46:14 100         1
# 31 2019-09-17 18:46:15 100         1
# 32 2019-09-17 18:46:16 100         1
# 33 2019-09-17 18:55:31 100       555
# 34 2019-09-17 18:55:32 100         1
# 35 2019-09-17 18:55:33 100         1
# 36 2019-09-17 18:55:34 100         1
# 37 2019-09-17 18:55:35 100         1
# 38 2019-09-17 18:55:36 100         1
# 39 2019-09-17 18:55:37 100         1
# 40 2019-09-17 18:55:38 100         1
# 41 2019-09-17 18:55:39 100         1
# 42 2019-09-17 18:55:40 100         1
# 43 2019-09-17 18:55:41 100         1
# 44 2019-09-17 18:55:42 100         1
# 45 2019-09-17 18:55:43 100         1
# 46 2019-09-17 18:55:44 100         1
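
The table above can be reduced to just the problematic intervals. The following is a minimal base R sketch, not part of read.gt3x: `find_gaps` is a hypothetical helper, and the timestamps are synthetic, built to mimic the three blocks of data seen above (in practice one would pass `dat2$time`).

```r
# Synthetic 100 Hz timestamps that stand in for dat2$time (an assumption:
# the real data would come from read.gt3x).
hz <- 100
start <- as.POSIXct("2019-09-17 18:44:00", tz = "GMT")
# Offsets (in seconds from start) of the seconds that contain samples:
# 18:44:00-18:44:20, 18:46:06-18:46:16, 18:55:31-18:55:44
present_secs <- c(0:20, 126:136, 691:704)
time <- start + rep(present_secs, each = hz) +
  rep((0:(hz - 1)) / hz, times = length(present_secs))

# Hypothetical helper: list every jump larger than max_gap_s between
# consecutive seconds that contain at least one observation.
find_gaps <- function(time, max_gap_s = 1) {
  secs <- sort(unique(floor(as.numeric(time))))
  d <- diff(secs)
  idx <- which(d > max_gap_s)
  data.frame(
    last_seen = as.POSIXct(secs[idx], origin = "1970-01-01", tz = "GMT"),
    next_seen = as.POSIXct(secs[idx + 1], origin = "1970-01-01", tz = "GMT"),
    gap_s = d[idx]
  )
}

gaps <- find_gaps(time)
print(gaps)  # two rows, with gap_s equal to 106 and 555
```

Working on whole-second floors keeps the check independent of the sample rate; only the threshold `max_gap_s` would need changing for coarser data.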

Read gt3x file with read.gt3x(imputeZeroes = TRUE): sensor idle state values also seem not to be handled

The situation is exactly the same with read.gt3x(imputeZeroes = TRUE).

dat3 <- read.gt3x(gt3x.destfile, asDataFrame = TRUE, imputeZeroes = TRUE, verbose = TRUE)

dat3 %>%
  filter(time >= as.POSIXct("2019-09-17 18:44:00", tz = "GMT"),
         time < as.POSIXct("2019-09-17 19:00:00", tz = "GMT")) %>% 
  mutate(time_floor_s = floor_date(time, "second")) %>%
  group_by(time_floor_s) %>%
  summarise(cnt = n()) %>%
  arrange(time_floor_s) %>%
  mutate(secs_diff = as.numeric(difftime(time_floor_s , lag(time_floor_s, 1)))) %>%
  as.data.frame()

# time_floor_s cnt secs_diff
# 1  2019-09-17 18:44:00 100        NA
# 2  2019-09-17 18:44:01 100         1
# 3  2019-09-17 18:44:02 100         1
# 4  2019-09-17 18:44:03 100         1
# 5  2019-09-17 18:44:04 100         1
# 6  2019-09-17 18:44:05 100         1
# 7  2019-09-17 18:44:06 100         1
# 8  2019-09-17 18:44:07 100         1
# 9  2019-09-17 18:44:08 100         1
# 10 2019-09-17 18:44:09 100         1
# 11 2019-09-17 18:44:10 100         1
# 12 2019-09-17 18:44:11 100         1
# 13 2019-09-17 18:44:12 100         1
# 14 2019-09-17 18:44:13 100         1
# 15 2019-09-17 18:44:14 100         1
# 16 2019-09-17 18:44:15 100         1
# 17 2019-09-17 18:44:16 100         1
# 18 2019-09-17 18:44:17 100         1
# 19 2019-09-17 18:44:18 100         1
# 20 2019-09-17 18:44:19 100         1
# 21 2019-09-17 18:44:20 100         1
# 22 2019-09-17 18:46:06 100       106
# 23 2019-09-17 18:46:07 100         1
# 24 2019-09-17 18:46:08 100         1
# 25 2019-09-17 18:46:09 100         1
# 26 2019-09-17 18:46:10 100         1
# 27 2019-09-17 18:46:11 100         1
# 28 2019-09-17 18:46:12 100         1
# 29 2019-09-17 18:46:13 100         1
# 30 2019-09-17 18:46:14 100         1
# 31 2019-09-17 18:46:15 100         1
# 32 2019-09-17 18:46:16 100         1
# 33 2019-09-17 18:55:31 100       555
# 34 2019-09-17 18:55:32 100         1
# 35 2019-09-17 18:55:33 100         1
# 36 2019-09-17 18:55:34 100         1
# 37 2019-09-17 18:55:35 100         1
# 38 2019-09-17 18:55:36 100         1
# 39 2019-09-17 18:55:37 100         1
# 40 2019-09-17 18:55:38 100         1
# 41 2019-09-17 18:55:39 100         1
# 42 2019-09-17 18:55:40 100         1
# 43 2019-09-17 18:55:41 100         1
# 44 2019-09-17 18:55:42 100         1
# 45 2019-09-17 18:55:43 100         1
# 46 2019-09-17 18:55:44 100         1

Session info

sessionInfo()
# R version 3.5.2 (2018-12-20)
# Platform: x86_64-apple-darwin15.6.0 (64-bit)
# Running under: macOS Mojave 10.14.2
# 
# Matrix products: default
# BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
# LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
# 
# locale:
#   [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
# 
# attached base packages:
#   [1] stats     graphics  grDevices utils     datasets  methods   base     
# 
# other attached packages:
#   [1] read.gt3x_0.1.0   AGread_1.0.0      lubridate_1.7.4   reshape2_1.4.3   
# [5] ggplot2_3.2.1     dplyr_0.8.3       data.table_1.12.2
# 
# loaded via a namespace (and not attached):
# [1] tidyselect_0.2.5  xfun_0.7          remotes_2.0.4     purrr_0.3.2      
# [5] colorspace_1.4-1  testthat_2.1.1    htmltools_0.3.6   usethis_1.5.0    
# [9] yaml_2.2.0        rlang_0.4.0       pkgbuild_1.0.3    pillar_1.4.2     
# [13] glue_1.3.1        withr_2.1.2       sessioninfo_1.1.1 plyr_1.8.4       
# [17] stringr_1.4.0     munsell_0.5.0     gtable_0.3.0      anytime_0.3.6    
# [21] devtools_2.0.2    memoise_1.1.0     evaluate_0.14     labeling_0.3     
# [25] knitr_1.23        callr_3.2.0       ps_1.3.0          curl_3.3         
# [29] Rcpp_1.0.2        scales_1.0.0      backports_1.1.4   desc_1.2.0       
# [33] pkgload_1.0.2     fs_1.3.1          digest_0.6.20     stringi_1.4.3    
# [37] processx_3.3.1    binaryLogic_0.3.9 grid_3.5.2        rprojroot_1.3-2  
# [41] cli_1.1.0         tools_3.5.2       PAutilities_0.2.0 magrittr_1.5     
# [45] lazyeval_0.2.2    tibble_2.1.3      crayon_1.3.4      pkgconfig_2.0.2  
# [49] rsconnect_0.8.13  prettyunits_1.0.2 assertthat_0.2.1  rmarkdown_1.15   
# [53] rstudioapi_0.10   R6_2.4.0          compiler_3.5.2   

Thanks @martakarass, I will take a look at this in the coming weeks. Seems like a bug.

The imputeZeroes option had a bug in the C++ code which is now fixed.

commit: abf225e
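
For readers unfamiliar with the option, here is a rough base R sketch of what "imputing zeroes" means conceptually: samples missing from the regular sampling grid are filled with rows of zeros. This is only an illustration under stated assumptions, not the package's C++ implementation; `impute_idle_zeros` is a hypothetical function, the column names `time`, `X`, `Y`, `Z` are assumed, and the data are synthetic.

```r
# Hypothetical helper: place observed samples on a regular hz grid and
# fill every missing grid point with zeros.
impute_idle_zeros <- function(dat, hz) {
  idx <- round(as.numeric(dat$time) * hz)   # integer sample index
  grid <- seq(min(idx), max(idx))           # every index that should exist
  out <- data.frame(
    time = as.POSIXct(grid / hz, origin = "1970-01-01", tz = "GMT"),
    X = 0, Y = 0, Z = 0
  )
  pos <- match(idx, grid)                   # rows holding real samples
  out[pos, c("X", "Y", "Z")] <- dat[, c("X", "Y", "Z")]
  out
}

# Synthetic example: seconds 0, 1 and 5 are recorded; seconds 2-4 are idle.
hz <- 100
start <- as.POSIXct("2019-09-17 18:44:00", tz = "GMT")
present_secs <- c(0, 1, 5)
obs <- data.frame(
  time = start + rep(present_secs, each = hz) +
    rep((0:(hz - 1)) / hz, times = length(present_secs)),
  X = 0.5, Y = -0.1, Z = 1
)

filled <- impute_idle_zeros(obs, hz)
nrow(obs)     # 300 observed samples
nrow(filled)  # 600 samples after filling the 3 idle seconds
```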

ImputeZeroes fix

Tuomo Nieminen
October 6, 2019

library(data.table)
library(dplyr)
library(ggplot2)
library(reshape2)
library(lubridate)
library(read.gt3x)
## Personal dropbox sharing links to (1) GT3X file, (2) ActiLife raw data output CSV
gt3x.fpath <- "https://www.dropbox.com/s/pi7m9b75gqzl64g/TAS1H30182785%20%282019-09-17%29.gt3x?dl=1"
csv.fpath <- "https://www.dropbox.com/s/hhdqjntarqxgpuu/TAS1H30182785%20%282019-09-17%29RAW.csv?dl=1"

file_directory <- getwd()
gt3x.destfile <- file.path(file_directory, "TAS1H30182785 (2019-09-17).gt3x")
csv.destfile <- file.path(file_directory, "TAS1H30182785 (2019-09-17).csv")

## Download files to wd
if (!file.exists(gt3x.destfile)) download.file(gt3x.fpath, gt3x.destfile)
if (!file.exists(csv.destfile)) download.file(csv.fpath, csv.destfile)

ImputeZeroes Fixed behaviour

dat3 <- read.gt3x(gt3x.destfile, asDataFrame = TRUE, imputeZeroes = TRUE)

dat3 %>%
  filter(time >= as.POSIXct("2019-09-17 18:44:00", tz = "GMT"),
         time < as.POSIXct("2019-09-17 19:00:00", tz = "GMT")) %>% 
  mutate(time_floor_s = floor_date(time, "second")) %>%
  group_by(time_floor_s) %>%
  summarise(cnt = n()) %>%
  arrange(time_floor_s) %>%
  mutate(secs_diff = as.numeric(difftime(time_floor_s , lag(time_floor_s, 1)))) %>%
  as.data.frame() %>% head(30)
##           time_floor_s cnt secs_diff
## 1  2019-09-17 18:44:00 100        NA
## 2  2019-09-17 18:44:01 100         1
## 3  2019-09-17 18:44:02 100         1
## 4  2019-09-17 18:44:03 100         1
## 5  2019-09-17 18:44:04 100         1
## 6  2019-09-17 18:44:05 100         1
## 7  2019-09-17 18:44:06 100         1
## 8  2019-09-17 18:44:07 100         1
## 9  2019-09-17 18:44:08 100         1
## 10 2019-09-17 18:44:09 100         1
## 11 2019-09-17 18:44:10 100         1
## 12 2019-09-17 18:44:11 100         1
## 13 2019-09-17 18:44:12 100         1
## 14 2019-09-17 18:44:13 100         1
## 15 2019-09-17 18:44:14 100         1
## 16 2019-09-17 18:44:15 100         1
## 17 2019-09-17 18:44:16 100         1
## 18 2019-09-17 18:44:17 100         1
## 19 2019-09-17 18:44:18 100         1
## 20 2019-09-17 18:44:19 100         1
## 21 2019-09-17 18:44:20 100         1
## 22 2019-09-17 18:44:21 100         1
## 23 2019-09-17 18:44:22 100         1
## 24 2019-09-17 18:44:23 100         1
## 25 2019-09-17 18:44:24 100         1
## 26 2019-09-17 18:44:25 100         1
## 27 2019-09-17 18:44:26 100         1
## 28 2019-09-17 18:44:27 100         1
## 29 2019-09-17 18:44:28 100         1
## 30 2019-09-17 18:44:29 100         1
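
Since head(30) only shows the first 30 seconds, a check over the whole recording may be more convincing. The sketch below is a hypothetical base R helper (`check_complete` is not part of read.gt3x) run on synthetic timestamps; with real data one would pass `dat3$time` and the sample rate.

```r
# Hypothetical helper: verify that every second holds exactly hz samples
# and that no second is skipped between the first and last observation.
check_complete <- function(time, hz) {
  secs <- floor(as.numeric(time))
  u <- sort(unique(secs))
  counts <- tabulate(match(secs, u))
  list(
    all_full = all(counts == hz),   # every observed second is full
    no_gaps = all(diff(u) == 1),    # consecutive seconds, nothing skipped
    n_seconds = length(u)
  )
}

# Synthetic data: a complete 5 s block, and the same block without second 2.
hz <- 100
start <- as.POSIXct("2019-09-17 18:44:00", tz = "GMT")
complete <- start + (0:(5 * hz - 1)) / hz
with_gap <- complete[floor(as.numeric(complete - start)) != 2]

res_ok <- check_complete(complete, hz)
res_gap <- check_complete(with_gap, hz)
```

Here `res_ok` reports no gaps, while `res_gap` flags the skipped second through `no_gaps`.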

Thank you so much for the update!
I installed from commit abf225e9950222828c6cfaa0be520474ac6c5568 (parent e89c8f1),
and I no longer see the issue described above.

However, I have observed 3 other discrepancies. I am unsure if they are related to this issue, so I will post three separate issues with the same example data as I used above.

FYI @muschellij2