tronovav/geoip2-update

File operations are not atomic

oschwald opened this issue · 2 comments

Currently, the script deletes the directory and then extracts the new directory. If this is done while the application is running, it could lead to errors in the window between the deletion and the point when the files are fully extracted.

The official geoipupdate client avoids this issue by atomically replacing the file. It extracts the new file to a temporary file on the same file system and then moves that file over the old one. On Unix systems, moving a file within a single file system is an atomic operation.
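As a minimal sketch of the write-to-temp-then-rename pattern described above (written in Python purely for illustration; it is not the geoipupdate client's code, and the `atomic_write` helper name is hypothetical):

```python
import os
import tempfile


def atomic_write(target_path: str, data: bytes) -> None:
    """Write data so that readers see either the old file or the new one."""
    target_dir = os.path.dirname(os.path.abspath(target_path))
    # Create the temp file next to the target so the final rename stays
    # on one file system (a cross-device move would not be atomic).
    fd, tmp_path = tempfile.mkstemp(dir=target_dir, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the contents to disk before renaming
        # os.replace renames over an existing file; on POSIX this is rename(2).
        os.replace(tmp_path, target_path)
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

A reader that opens the target path at any moment gets a complete file, because the path is never removed, only repointed to the new contents.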

This is complicated a bit by the fact that this library operates on directories rather than individual files, and you cannot atomically move a directory over an existing, non-empty directory. If the library only handled MMDB files, I would recommend just operating on those individual files. However, the CSV files present a somewhat more difficult case, as you want to replace them all at once. This could be accomplished by using a symlink to the actual directory and atomically replacing that symlink.
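The symlink approach could look roughly like the following sketch. It is written in Python for illustration only, assumes a Unix-like system where creating symlinks needs no special privileges, and uses a hypothetical `switch_symlink` helper and a hypothetical `.tmp` suffix for the temporary link:

```python
import os


def switch_symlink(link_path: str, new_target: str) -> None:
    """Point link_path at new_target without a moment where it is missing."""
    tmp_link = link_path + ".tmp"
    # Remove a stale temporary link left over from an interrupted run.
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    # Build the new symlink under a temporary name first...
    os.symlink(new_target, tmp_link)
    # ...then rename it over the live link. The rename acts on the link
    # itself, not on the directory it points to.
    os.replace(tmp_link, link_path)
```

Each update would be extracted into a fresh directory (for example one named after the release date), and the application would always read through the symlink, so it sees either the complete old set of CSV files or the complete new set.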

This article covers this at a bit more depth. I only glanced at it, but the information seems correct.

In version 2.1.0, we implemented the atomicity of the update.

The atomicity of the update operation is implemented at the level of the database files.

The mmdb and csv databases are structured differently. An mmdb database consists of a single file. Thus, when an mmdb database is updated, the operation is completely atomic: the file is atomically replaced with the new one, so errors caused by the mmdb file being briefly absent during the update are ruled out.

A CSV database consists of multiple files. When a CSV database is updated, each CSV file is also replaced atomically, so no file is ever missing during the update.

The point is that tronovav/geoip2-update is a cross-platform PHP library for updating GeoIP databases. Using a symlink to implement atomicity would therefore prevent developers from updating databases in local development environments and from moving projects to other platforms unchanged.

The official geoipupdate client is tied to one specific server and cannot be moved along with a PHP project to other servers. Installing the geoipupdate client also requires the server user to have elevated privileges.

Undoubtedly, each client has its own pros and cons. The main thing is that there is a choice for different occasions.

At the moment, it seems to us that updating geoip2 from the PHP application itself is the most flexible and universal option.

Of course, we remain open to any solutions that would make the library better to use.