jgm/zip-archive

Correct way to specify last modified date?

Closed this issue · 13 comments

Hello,

I'm using zip-archive version 0.4.3.

I have this pseudocode:

import "zip-archive" Codec.Archive.Zip qualified as ZArch
import Data.Time.Clock.System (getSystemTime, systemSeconds)

-- Inside a monadic block; the imports above need PackageImports and ImportQualifiedPost.
systime <- liftBase getSystemTime

-- systemSeconds counts seconds since the Unix epoch (UTC).
let lastModified = fromIntegral $ systemSeconds systime
let e = ZArch.toEntry fpath lastModified bscContents

ZArch.fromArchive (ZArch.addEntryToArchive e ZArch.emptyArchive)

So basically this should create a zip file with one entry whose last-modified time is the current local time. However, my archive viewer shows the UTC time instead, displayed as if it were local time.

To be more precise:

st <- getSystemTime
systemToUTCTime st

returns 2024-04-04 10:05:34 UTC currently. The local time is 12:05:34. So the st value is correct. However, the zip file shows 10:05:34 LOCAL time.

I see that the C library differentiates between zip_file_set_mtime and zip_file_set_dostime (https://libzip.org/documentation/zip_file_set_mtime.html):

Following historical practice, the zip_file_set_mtime() function translates the time from the zip archive into the local time zone. If you want to avoid this, use the zip_file_set_dostime() function instead.

Since zip-archive is a native Haskell library, I'm guessing that there are 2 different fields for this? Maybe we set the wrong one?

jgm commented

Here's what the Haddocks say:

               , eLastModified            :: !Integer             -- ^ Modification time (seconds since unix epoch)

We just have this one field. toEntry also takes "seconds since unix epoch" as its parameter.
readEntry uses

  modEpochTime <- (floor . utcTimeToPOSIXSeconds) <$> getModificationTime path

Can you see a problem here?

jgm commented

PS getModificationTime returns a UTC time, according to its documentation.

jgm commented

I guess I'd assumed that "seconds since the unix epoch" means "relative to UTC time." Could it be that it is interpreted relative to local time?

jgm commented

I might have just used UTC because this library exports pure functions (no IO) -- hence no way to get the locale's time zone. We could, however, make it a parameter on one of the exported functions.

https://en.wikipedia.org/wiki/Unix_time

Unix time is currently defined as the number of non-leap seconds which have passed since 00:00:00 UTC on Thursday, 1 January 1970, which is referred to as the Unix epoch.

https://www.epochconverter.com/clock seems consistent with that definition and shows exactly the time returned by getSystemTime.

If we stick with this definition, then a number like 1712319469 should be interpreted as seconds since 1970-01-01 00:00:00 UTC. I guess that putting the seconds directly into eLastModified, as done above, would be correct according to the "unix epoch" definition. Also, if eLastModified weren't relative to UTC, we couldn't reliably send the zip file to another time zone, as eLastModified doesn't encode the local time zone.

So at least in theory it all seems to be done correctly. But I don't understand why Ark or other programs show this as 2 hours behind the current time...

https://www.ghisler.ch/board/viewtopic.php?t=80315

ZIP files store file times as local time, while the NTFS file system stores them as UTC (universal time). So when switching from/to daylight saving time, either the first or the second will change by one hour.

ZIP stores the time as local time only (no timezone).
If you take a photo of sunrise at 6.00 UTC+6 in the morning, pack it as zip and you unpack it at a city at timezone UTC-6 you will still see 6.00 in the morning.
You see:

  1. The sunrise photo is taken in the morning of the origin city.
  2. You have to calculate your own, what time it was at your own city.

So, apparently eLastModified accepts unix epoch time adjusted for local time zone?
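To make that adjustment concrete, here is a minimal sketch of the arithmetic, assuming the value should be the true Unix seconds plus the zone offset. The helper name zoneShiftedSeconds is hypothetical and not part of zip-archive, and the UTC+2 zone is an assumption matching the two-hour gap reported above.

```haskell
import Data.Time.Clock (UTCTime)
import Data.Time.Clock.POSIX (posixSecondsToUTCTime, utcTimeToPOSIXSeconds)
import Data.Time.LocalTime (TimeZone, hoursToTimeZone, timeZoneMinutes)

-- | Hypothetical helper (not part of zip-archive): shift true Unix seconds
-- by the zone offset, so that a viewer which treats the stored value as
-- local wall-clock time displays the local time.
zoneShiftedSeconds :: TimeZone -> UTCTime -> Integer
zoneShiftedSeconds tz t =
  floor (utcTimeToPOSIXSeconds t) + fromIntegral (timeZoneMinutes tz) * 60

main :: IO ()
main = do
  -- 2024-04-04 10:05:34 UTC, the instant quoted earlier in this thread.
  let t  = posixSecondsToUTCTime 1712225134
      tz = hoursToTimeZone 2  -- assumed UTC+2
  print (zoneShiftedSeconds tz t)  -- prints 1712232334, i.e. 7200 s later
```

The function stays pure because the time zone is passed in as a parameter; only the caller needs IO to look it up.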

Anyways, the fact that zipping with zip-archive is a pure function is the reason I used this library in the first place :)

jgm commented

Actually, the line I quoted above comes from readEntry which is in IO.
So here we could get the locale time zone.

jgm commented

OK, I think I've fixed this. Can you test as well?

The fix only affects readEntry, which is the only part of this that actually looks at the modification time of a file.
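For readers following along, this is roughly the kind of adjustment being discussed, written as a standalone sketch rather than the actual commit. The helper name localModSeconds is hypothetical; getModificationTime comes from the directory package, and getTimeZone and timeZoneMinutes from the time package.

```haskell
import Data.Time.Clock.POSIX (utcTimeToPOSIXSeconds)
import Data.Time.LocalTime (getTimeZone, timeZoneMinutes)
import System.Directory (getModificationTime)

-- | Hypothetical helper, not the library's code: a file's modification
-- time as Unix seconds shifted by the local zone offset.
localModSeconds :: FilePath -> IO Integer
localModSeconds path = do
  modTime <- getModificationTime path  -- a UTCTime
  tz      <- getTimeZone modTime       -- zone offset in effect at that instant
  let utcSecs = floor (utcTimeToPOSIXSeconds modTime) :: Integer
  pure (utcSecs + fromIntegral (timeZoneMinutes tz) * 60)

main :: IO ()
main = localModSeconds "." >>= print
```

Using getTimeZone on the modification time, rather than the current zone, means the offset reflects daylight saving as of when the file was modified.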

jgm commented

If it looks okay, we can do a new release.

Yes, this might be a solution. However, I still use the pure toEntry because I don't read files but generate them programmatically. But adding the offset from getTimeZone (minutes * 60) seems to be the way to solve this.

jgm commented

If you use pure toEntry, then you are specifying the modification time yourself. So you just need to be aware that it expects a Unix epoch time adjusted for the local time zone. This is not a problem with the library itself, unless I'm missing something.
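A minimal sketch of what that could look like on the caller's side, assuming (as stated in this thread) that toEntry wants the zone-adjusted value. The file names hello.txt and example.zip are placeholders chosen for illustration.

```haskell
import qualified Codec.Archive.Zip as Zip
import qualified Data.ByteString.Lazy.Char8 as BL
import Data.Time.Clock (getCurrentTime)
import Data.Time.Clock.POSIX (utcTimeToPOSIXSeconds)
import Data.Time.LocalTime (getCurrentTimeZone, timeZoneMinutes)

main :: IO ()
main = do
  now <- getCurrentTime
  tz  <- getCurrentTimeZone
  let utcSecs   = floor (utcTimeToPOSIXSeconds now) :: Integer
      -- Shift by the local offset before handing the value to toEntry.
      localSecs = utcSecs + fromIntegral (timeZoneMinutes tz) * 60
      entry     = Zip.toEntry "hello.txt" localSecs (BL.pack "hello\n")
      archive   = Zip.addEntryToArchive entry Zip.emptyArchive
  -- Building the archive is pure; only the clock lookup and the write need IO.
  BL.writeFile "example.zip" (Zip.fromArchive archive)
```

This keeps the time zone lookup at the edge of the program, so the archive construction itself remains a pure function of its inputs.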