FR: allow `convert.times = TRUE` for `read.sav`
Closed this issue · 10 comments
Currently the option `convert.dates = TRUE` drops the time part of what should be date-time data (with `haven::read_sav` I get `POSIXct`). I see that `varmat` is 22 in my dataset.
Hi @iago-pssjd, thanks for the report. A follow-up question:
- Do you mean vectors that only contain hours, minutes and seconds (as in hh:mm:ss), i.e. without day, month and year? Those are not converted because R does not provide a default vector type for them. You can use the hms package for manual conversion, but it relies on tidyverse dependencies and is therefore not a dependency of this package.
- Or do you have datetime vectors (yy-mm-dd hh:mm:ss) that are converted to dates? I'm not sure exactly what you mean. Maybe you could add a screenshot of what you are looking for?
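In the first case, a time-only value is just a duration. Here is a minimal Python sketch, which is not part of this package and assumes a TIME value is stored as a number of seconds. The standard-library `timedelta` plays roughly the role that hms plays in R:

```python
from datetime import timedelta

def spss_time_to_hms(seconds: float) -> str:
    """Format a time-only value, assumed to be seconds, as h:mm:ss."""
    # timedelta renders durations as h:mm:ss, with an "N day(s), " prefix past 24 h
    return str(timedelta(seconds=seconds))

print(spss_time_to_hms(3753))  # 1:02:33
```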
Hi @JanMarvin, thanks for the answer.
I mean the second option, datetime vectors. Indeed, the `str` of the data read with haven is

`POSIXct[1:648], format: "2021-06-10 13:18:47" "2021-06-11 11:51:03" "2021-06-11 16:47:28" "2021-06-11 21:03:23" "2021-06-12 12:10:05" ...`

while for `read.sav` with `convert.dates = TRUE` it is

`Date[1:648], format: "2021-06-10" "2021-06-11" "2021-06-11" "2021-06-11" "2021-06-12" "2021-06-13" "2021-06-14" "2021-06-14" ...`

and with `convert.dates = FALSE` it is

`num [1:648] 1.38e+10 1.38e+10 1.38e+10 1.38e+10 1.38e+10 ...`
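The `num` output fits how SPSS stores these values: seconds since 14 October 1582, the start of the Gregorian calendar. The Python sketch below shows both conversions. The raw value was computed for illustration from the first haven timestamp and does not come from the reporter's file. Timezones are ignored, so the datetimes are naive:

```python
from datetime import date, datetime, timedelta

# SPSS date and datetime values count seconds from 14 October 1582
SPSS_EPOCH = datetime(1582, 10, 14)

def spss_to_datetime(seconds: float) -> datetime:
    """Keep the full value: day plus time of day."""
    return SPSS_EPOCH + timedelta(seconds=seconds)

def spss_to_date(seconds: float) -> date:
    """Keep only the day; the time of day is discarded."""
    return spss_to_datetime(seconds).date()

raw = 13842710327  # same order of magnitude as the 1.38e+10 values above
print(spss_to_datetime(raw))  # 2021-06-10 13:18:47
print(spss_to_date(raw))      # 2021-06-10
```

Truncating to `.date()` reproduces the loss described in the report: the day survives, but 13:18:47 does not.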
Oh sure, that should be added. It should be done with `convert.dates` as well. I will have to look into which varmats indicate datetime. I thought that it was already available, but maybe I was lazy because I never had to deal with datetimes until a few years ago (those were the days ...).
If you want to, please feel free to open a pull request.
FYI the relevant code is here: lines 300 to 320 at commit 17a9244.
Some or all of these varmats can be datetime. If it's impossible to verify which type contains time, additional checks could be added based on whether the variable is of type integer (date) or numeric (datetime). Alternatively, we could provide additional options to always create datetime or date.
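Below is one possible value-based variant of the check described above. It is a hypothetical helper, not code from this package. Instead of looking at the R storage type, it checks whether any value has a time-of-day component. Because the stored values are seconds since the SPSS epoch, a pure date is a whole multiple of 86400:

```python
import math

SECONDS_PER_DAY = 86400

def looks_like_datetime(values) -> bool:
    """Hypothetical heuristic: True if any non-missing value is not a whole day."""
    for v in values:
        if v is None or math.isnan(v):
            continue  # skip missing values
        if v % SECONDS_PER_DAY != 0:
            return True  # a time-of-day component is present
    return False

print(looks_like_datetime([13842710327, None]))          # True
print(looks_like_datetime([13842662400, float("nan")]))  # False
```

A datetime column whose times are all exactly midnight would be classified as a date. This is therefore at best a fallback next to the format codes, not a replacement for them.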
Thanks, @JanMarvin!
Actually, I do not understand the meaning of the possible values of `varmat[,6]`. If I'm not mistaken, you take them from line 237 and line 255 (at commit 17a9244) and then from lines 23 to 32 (same commit), but I cannot understand the values returned by `swap_endian`. Moreover, regarding "If it's impossible to verify which type contains time": how could other checks be done?
Hi @iago-pssjd, no need to look at the Rcpp code (the snippet you've picked simply reads the bytes from the sav binary and converts them to your system's endianness if required). We have to identify which of the formats in `varmat[, 6] %in% c(20, 22, 23, 24, 38, 39)` are datetime formats. SPSS stores each variable with a format code, and these codes indicate how SPSS itself displays the values.
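The PSPP system-file documentation describes the print and write formats in the variable record as a single 32-bit integer. The least significant byte holds the number of decimals, the next byte the field width, and the third byte the format type. The sketch below assumes that `varmat[, 6]` holds that format-type byte, which the thread does not confirm. The `little_endian` parameter stands in for the byte order given in the file header; handling byte order is what `swap_endian` does in the Rcpp code:

```python
import struct

def decode_print_format(raw: bytes, little_endian: bool = True):
    """Split a 4-byte SPSS print/write format into (type, width, decimals)."""
    # Read the 4 bytes as an unsigned 32-bit int in the file's byte order
    (value,) = struct.unpack("<I" if little_endian else ">I", raw)
    fmt_type = (value >> 16) & 0xFF  # third byte: format type code
    width = (value >> 8) & 0xFF      # second byte: field width
    decimals = value & 0xFF          # lowest byte: decimal places
    return fmt_type, width, decimals

# DATETIME23.2 as written by a little-endian machine
raw = struct.pack("<I", (22 << 16) | (23 << 8) | 2)
print(decode_print_format(raw))  # (22, 23, 2)
```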
How I would approach this: start with some initial assumption. Maybe varmat 20 is just the year, 22 is year+month, 23 is year and month as an abbreviation, 24 is year-month-day, and 3x... could be datetime. That's just something we have to try out. You mentioned that 22 looks like datetime; that's a first step. Ideally we would have a file with all the different formats that we can open in SPSS/PSPP to compare their dates to ours. SPSS access is a bit of a problem for me. I'd have to ask some university colleagues if they can run a sample file for me, but PSPP is available. Maybe it's already documented in the PSPP docs.
I guess one other issue was that SPSS had a datetime variable with a ymd format.
PSPP documentation has the following (Variable Record):

| Code | Format |
|---|---|
| 20 | DATE |
| 21 | TIME |
| 22 | DATETIME |
| 23 | ADATE |
| 24 | JDATE |
| 25 | DTIME |
| 26 | WKDAY |
| 27 | MONTH |
| 28 | MOYR |
| 29 | QYR |
| 30 | WKYR |
| 31 | PCT |
| 32 | DOT |
| 33 | CCA |
| 34 | CCB |
| 35 | CCC |
| 36 | CCD |
| 37 | CCE |
| 38 | EDATE |
| 39 | SDATE |
| 40 | MTIME |
| 41 | YMDHMS |
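The Python sketch below groups these codes by what the stored number means. The grouping is our reading of the format names; it is not stated on the PSPP page. DATE-like formats store a day, DATETIME and YMDHMS store a day plus a time, and TIME, DTIME and MTIME store durations. If `varmat[, 6]` holds these codes, then the 22 reported at the top of the thread is DATETIME. The current check `c(20, 22, 23, 24, 38, 39)` would treat it as a plain date, which matches the lost times.

```python
# Format-type codes from the PSPP table, grouped by meaning (our reading)
DATE_CODES = {20: "DATE", 23: "ADATE", 24: "JDATE", 28: "MOYR",
              29: "QYR", 30: "WKYR", 38: "EDATE", 39: "SDATE"}
DATETIME_CODES = {22: "DATETIME", 41: "YMDHMS"}
TIME_CODES = {21: "TIME", 25: "DTIME", 40: "MTIME"}

def classify_format(code: int) -> str:
    """Return 'date', 'datetime', 'time' or 'other' for a format-type code."""
    if code in DATETIME_CODES:
        return "datetime"
    if code in DATE_CODES:
        return "date"
    if code in TIME_CODES:
        return "time"
    # WKDAY, MONTH, PCT, DOT, CCA..CCE and all non-date formats
    return "other"

print([classify_format(c) for c in (20, 22, 23, 24, 38, 39)])
# ['date', 'datetime', 'date', 'date', 'date', 'date']
```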
Hi!
I have the same issue with SPSS. I should actually have SPSS at my workplace, but I couldn't get it to work with the last dataset I had to analyse. I will check these values with examples ASAP.
Thanks!
I came up with the sps file below using this as reference: https://libguides.library.kent.edu/SPSS/DatesTime
data list list /
d1 (DATE9)
d2 (DATE11)
a1 (ADATE8)
a2 (ADATE10)
e1 (EDATE8)
e2 (EDATE10)
j1 (JDATE5)
j2 (JDATE7)
s1 (SDATE8)
s2 (SDATE10)
q1 (QYR6)
q2 (QYR8)
m1 (MOYR6)
m2 (MOYR8)
w1 (WKYR8)
w2 (WKYR10)
dt1 (DATETIME17)
dt2 (DATETIME20)
dt3 (DATETIME23.2)
y1 (YMDHMS16)
y2 (YMDHMS19)
y3 (YMDHMS19) /* 19.2 .
w3 (WKDAY3)
w4 (WKDAY9)
m3 (MONTH3)
m4 (MONTH9).
begin data.
"31-JAN-13", "31-JAN-2013", "01/31/13", "01/31/2013", "31.01.13", "31.01.2013", "13031", "2013031", "13/01/31", "2013/01/31", "1 Q 13", "1 Q 2013", "JAN 13", "JAN 2013", "5 WK 13", "5 WK 2013", "31-JAN-2013 01:02", "31-JAN-2013 01:02:33", "31-JAN-2013 01:02:33.72", "2013-01-31 1:02", "2013-01-31 1:02:33", "2013-01-31 1:02:33.72", "THU", "THURSDAY", "JAN", "JANUARY"
end data.
save outfile = "/tmp/datetimes.sav" .
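To compare a reader's output with this file, it helps to know what the numeric columns should contain. The Python sketch below computes the expected stored values. It assumes the same seconds-since-1582-10-14 encoding as above and that two-digit years such as "13" are read as 2013. Only the unambiguous day and datetime columns are listed:

```python
from datetime import datetime

SPSS_EPOCH = datetime(1582, 10, 14)

def spss_seconds(dt: datetime) -> float:
    """Seconds since the SPSS epoch, i.e. the number stored in the .sav file."""
    return (dt - SPSS_EPOCH).total_seconds()

# Every DATE/ADATE/EDATE/JDATE/SDATE column encodes 31 January 2013
day_cols = ["d1", "d2", "a1", "a2", "e1", "e2", "j1", "j2", "s1", "s2"]
expected = {col: spss_seconds(datetime(2013, 1, 31)) for col in day_cols}

# Datetime columns add the time of day on top of the same date
expected.update({
    "dt1": spss_seconds(datetime(2013, 1, 31, 1, 2)),
    "dt2": spss_seconds(datetime(2013, 1, 31, 1, 2, 33)),
    "dt3": spss_seconds(datetime(2013, 1, 31, 1, 2, 33, 720000)),
    "y1": spss_seconds(datetime(2013, 1, 31, 1, 2)),
    "y2": spss_seconds(datetime(2013, 1, 31, 1, 2, 33)),
})

for col in ("d1", "dt1", "dt3"):
    print(col, expected[col])
```

These values can be compared with `read.sav(..., convert.dates = FALSE)`. A date column should be a whole multiple of 86400, and `dt1` should exceed `d1` by 3720 seconds (1 h 2 min).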
I've just seen your comment and PR. Great!
Thank you!