davidwagner/phone-theft-research

Negative time durations in classifier scripts

Opened this issue · 2 comments

I am seeing negative time durations in some cases in the spreadsheet output by the classifiers-scripts.

For instance, looking at one run (at 12/13 12:43), I see the following for user f700538e:

  • in the .csv file: In column "Total Time Negative (Table Classifier)", "-6.0h:55.0m:3.617s", and negative time values in other columns for the table classifier. The "Longest Negative Period" is listed as "08:37:54-03:32:57".

  • in the .txt file: It reports a single interval, of that length.

Table Classifier
-------------------------------------
Result Intervals
('(08:37:54--03:32:57)', 0)
Positive Intervals
Negative Intervals
(08:37:54--03:32:57)
###########################
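
One observation on the "-6.0h:55.0m:3.617s" string: it is what floor-based formatting produces for a negative number of seconds. Only the hours field carries the sign, so it reads as -6h plus 55m 3.617s, i.e. roughly -5h 04m 56s, which matches going from 08:37:54 back to 03:32:57. A minimal sketch, assuming (not confirmed against the scripts) that the duration is split with divmod; `format_duration` is a hypothetical name and the fractional part of the input is chosen only to match the reported value:

```python
def format_duration(seconds):
    """Split a duration in seconds into h/m/s using floor division."""
    # divmod floors toward negative infinity, so for a negative input the
    # hours come out one lower and the remainder is positive.
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return "%sh:%sm:%ss" % (hours, minutes, round(secs, 3))


# 03:32:57 minus 08:37:54 is -18297 s; the .383 fraction is an assumption.
print(format_duration(-18296.383))
```

If that is how the column is produced, the odd-looking value is just a formatting side effect of a reversed interval, not a separate bug.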

I don't see any evidence of gaps or time going backward in the timestamps in the decrypted BatchedAccelerometer sensor readings. I looked at the first and last line of each decrypted .csv BatchedAccelerometer file for user f700538e for today and there's nothing obviously weird there -- no timestamps going backwards, no gaps in the timestamps. (See the file phone_data/Anomaly_results/debug_for_steven1.txt if you want to look at it yourself.) I also checked each of those files and all of the timestamps are in increasing order (there's no row where the timestamp is smaller than the preceding row's timestamp).
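
The ordering check described above can be sketched roughly as follows. This is not the exact command I ran; the function names are hypothetical, and the assumption that the timestamp is a numeric value in the first CSV column may not match the real file layout.

```python
import csv


def read_timestamps(path, column=0):
    """Read one numeric column from a CSV file (column index is an assumption)."""
    with open(path, newline="") as f:
        return [float(row[column]) for row in csv.reader(f) if row]


def find_backward_steps(timestamps):
    """Return (row_index, previous, current) for every row whose timestamp
    is smaller than the preceding row's timestamp."""
    problems = []
    for i in range(1, len(timestamps)):
        if timestamps[i] < timestamps[i - 1]:
            problems.append((i, timestamps[i - 1], timestamps[i]))
    return problems
```

An empty list from `find_backward_steps(read_timestamps(path))` for every file means no row goes backward within that file. It does not say anything about ordering across files.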

So I'm not sure what's going on here. Could there be something wrong with how intervals are calculated or formed or merged? Could something have gone awry with conversion from the sensor timestamp to the actual time (by inferring boot time etc.), where different boot times were inferred for different files?
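
To illustrate the boot-time hypothesis: if wall-clock time is computed as inferred boot time plus sensor uptime, then two files with different inferred boot times can yield a reversed interval even though the raw sensor timestamps only increase. This is a sketch under that assumption, with a hypothetical `to_wall_clock` helper and arbitrary made-up values, not the actual conversion code or real data.

```python
from datetime import datetime, timedelta


def to_wall_clock(boot_time, sensor_seconds):
    """Hypothetical conversion: wall clock = inferred boot time + sensor uptime."""
    return boot_time + timedelta(seconds=sensor_seconds)


# Arbitrary illustrative values, not taken from the real data.
boot_for_file_1 = datetime(2000, 1, 1, 0, 0, 0)
boot_for_file_2 = boot_for_file_1 - timedelta(hours=6)  # a different inferred boot time

start = to_wall_clock(boot_for_file_1, 3600)  # sensor uptime 1h, from file 1
end = to_wall_clock(boot_for_file_2, 7200)    # sensor uptime 2h, from file 2

# The sensor timestamps increase (3600 < 7200), yet the interval is reversed.
duration = (end - start).total_seconds()
print(start, end, duration)
```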

This is the only example I have of a negative interval where I also have the corresponding .txt files. I do see a few other examples of negative interval times for other users, from earlier in the day before you added code to dump the .txt files, including some cases where early in the day they have a negative time interval and later in the day the negative time interval disappears (e.g., a run earlier in the day lists (01:19:17--00:34:55) in the last column; a run later in the day no longer has that interval and only has ordinary-looking intervals plus (01:19:26--01:19:26)).

I'm running ClassifierLog.py via the script ./mkanomaly.sh in git, which takes all the data from Dropbox for today (but not anything from the previous day), decrypts it all, and then runs ClassifierLog.py on that. Therefore, if it's run early in the day, it might have only a few hours of sensor data (e.g., from 12:01am until whenever it was run), if that matters.

Any ideas?

Is it possible that getUserFilesByDayAndInstrument() returns files in the wrong order (i.e., not in chronological order by the date in the filename, but in some other order)? If yes, that seems like it could cause dataFilesToDataList() to put the sensor readings in the wrong order. Could that cause things to go awry?
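
One way to rule this out is to sort explicitly rather than rely on listing order; Python's glob documentation says results are returned in arbitrary order. A minimal sketch, with hypothetical function names, assuming the filenames embed a zero-padded date/time so that lexicographic order by base name is also chronological order (I haven't verified that against the real filenames):

```python
import glob
import os


def sort_by_name(paths):
    """Sort paths by base name, ignoring the directory part."""
    return sorted(paths, key=os.path.basename)


def files_in_name_order(directory, pattern="*.csv"):
    """List matching files in a directory, sorted by base name."""
    return sort_by_name(glob.glob(os.path.join(directory, pattern)))
```

If the date in the filename is not zero-padded, the sort key would need to parse the date instead of comparing strings.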

This is really strange; looking at the dump of the results, the intervals jump around oddly for one of these users:

('(11:25:57--11:25:58)', 0)
('(11:25:58--11:25:59)', 0)
('(15:44:12--15:44:13)', 0)
('(15:44:13--15:44:14)', 0)
('(09:42:11--09:42:12)', 0)
('(09:42:12--09:42:12)', 0)
('(09:42:39--09:42:39)', 0)

I printed out the order of the filenames in getUserFilesByDayAndInstrument() and the glob library returns them in sorted order, which is then chronological as expected. So, these intervals should be continuous without any of these weird jumps/gaps. My first thought is that something went wrong with the boot time calculation to cause this, perhaps such that some timestamps were erroneously translated to a different day (I'll add the date as well to the dump log). The only way for the "Total Time" calculations to be negative is if the interval was somehow formatted as (later date, earlier date).
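
Roughly what I have in mind for the dump: include the date in each interval and flag any interval whose end precedes its start. This is a sketch only; `format_interval` and `find_reversed` are hypothetical names, and it assumes the intervals are available as (start, end) pairs of datetime objects, which may not be how the classifier scripts store them.

```python
from datetime import datetime


def format_interval(start, end):
    """Format an interval with the date included, so day changes are visible."""
    fmt = "%Y-%m-%d %H:%M:%S"
    return "(%s--%s)" % (start.strftime(fmt), end.strftime(fmt))


def find_reversed(intervals):
    """Return the (start, end) pairs whose end precedes their start."""
    return [(s, e) for (s, e) in intervals if e < s]
```

With the date in the dump, an interval such as (08:37:54--03:32:57) would show directly whether the two endpoints were mapped to different days.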

I'm pulling the latest data now (my computer fell asleep last night and paused the syncing), and will run it locally for these odd instances to see what's going on with the boot times, or if something else is being miscalculated in an intermediate step.