carissalow/rapids

Aggregating data across participants

sjgiorgi opened this issue · 4 comments

I'd like to aggregate each sensor data point across the entire study time span. In Segment Examples, I see examples on how to do this for daily, weekly, etc. time chunks. Is there a way to make 1 time chunk per participant?

What I've tried is using one event per participant, where each event spans the entire duration of the study (with the maximum timestamp across all sensors used as the end of the participant's study). Is this correct? Is there an easier way to do this within RAPIDS? Right now I'm manually checking the timestamps across all sensors to find the maximum.

Hi @sjgiorgi, thank you for using RAPIDS! Currently, creating event segments as you describe would be the only way to extract features across the entire duration of each participant-specific study period. The ability to automatically pull those event segment start times and lengths based on all sensed timestamps is something we could potentially add as an enhancement in the future. Thanks for bringing this to our attention!
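Until something like that exists, the manual step can be scripted. The following is only a rough sketch, not part of RAPIDS: `format_length` and `study_span_segment` are hypothetical helper names, and the row layout (`label`, `event_timestamp` in Unix milliseconds, `length` written like `7D 23H 59M 59S`, `shift`, `shift_direction`, `device_id`) is an assumption about the event segments file that should be checked against the RAPIDS documentation before use.

```python
def format_length(duration_ms):
    """Render a duration in milliseconds as 'xD xH xM xS', rounded up to whole seconds."""
    total_seconds = -(-duration_ms // 1000)  # ceiling division
    days, rem = divmod(total_seconds, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{days}D {hours}H {minutes}M {seconds}S"


def study_span_segment(sensor_timestamps, device_id, label="study"):
    """Build one event-segment row covering a participant's whole sensed period.

    sensor_timestamps maps a sensor name to a list of Unix timestamps in ms.
    The column names in the returned dict are assumed, not confirmed.
    """
    all_ts = [ts for values in sensor_timestamps.values() for ts in values]
    if not all_ts:
        raise ValueError("no timestamps available for this participant")
    start, end = min(all_ts), max(all_ts)
    return {
        "label": label,
        "event_timestamp": start,              # segment starts at the first sensed row
        "length": format_length(end - start),  # and runs to the last sensed row
        "shift": "0",                          # no shift (assumed encoding)
        "shift_direction": 1,
        "device_id": device_id,
    }
```

Called with something like `study_span_segment({"locations": [...], "wifi": [...]}, "my-device-id")`, this gives one row per participant that can be written out with `csv.DictWriter`. Whether the final timestamp falls inside the segment is untested here, so padding the length by a second or so may be worth considering.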

Thank you!

Is using maximum timestamp across all sensors correct? Or do we need a separate timestamp for each sensor?

For example, if location data ends on May 15 and wifi data ends on May 20, will setting the event timestamp to May 20 have any effect on the aggregate location data (that has 5 days without data)?

Hi @sjgiorgi, that’s a great question! I was not sure myself, so I extracted features for a representative participant from one of our studies on two kinds of event segments: one per sensor, delineated by that sensor's minimum and maximum timestamps, and one "overall" segment delineated by the minimum and maximum timestamps across all sensors.

We had about 1 month of activity recognition, battery, calls, locations, and screen data available for this participant. All of the sensor-specific features extracted on each respective sensor-specific event segment were exactly equal to the corresponding features extracted on the "overall" event segment. Values for phone data yield features were not equal across these segments, but the differences were fairly minimal (in the range of about 0.03).

Based on this, I think creating one event segment per participant using the maximum timestamp across all sensors should be okay. Alternatively, to be extra safe, you could consider creating one event segment per participant and sensor and discarding the irrelevant features (e.g., activity recognition features extracted within the battery event segment and vice versa, but potentially retaining all data yield features) after processing. Please let us know if you have any additional questions!
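For the "extra safe" route, here is a rough sketch of the two steps involved: finding each sensor's own span, then dropping the features that do not belong to a segment's sensor. Both function names are hypothetical, and the filter assumes feature columns are prefixed with the sensor name (for example `phone_battery_...`) and that data yield columns start with `phone_data_yield`; please verify both against your own output files.

```python
def per_sensor_spans(sensor_timestamps):
    """Map each sensor name to its own (first_ms, last_ms) span, skipping empty sensors."""
    return {
        sensor: (min(values), max(values))
        for sensor, values in sensor_timestamps.items()
        if values
    }


def keep_relevant(columns, sensor, always_keep=("phone_data_yield",)):
    """Keep columns for `sensor` plus any data yield columns (assumed prefix naming)."""
    prefixes = (sensor,) + tuple(always_keep)
    return [column for column in columns if column.startswith(prefixes)]
```

Each span from `per_sensor_spans` would become its own event segment, and `keep_relevant` would then be applied to the features extracted within that segment.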

This is super helpful, thank you!