BurntSushi/same-file

Support `ino` and `dev` on Windows

Closed this issue · 5 comments

Both the inode and device numbers are `u64` on Unix-like systems.

On Windows, `GetFileInformationByHandle` fills in a `BY_HANDLE_FILE_INFORMATION` structure, which includes:

  • `dwVolumeSerialNumber`: The serial number of the volume (device) that contains the file.
  • `nFileIndexHigh`: The high-order part of a unique identifier that is associated with a file.
  • `nFileIndexLow`: The low-order part of a unique identifier that is associated with a file.

It would be nice, if possible, for same-file to map:

  • `dwVolumeSerialNumber` to `dev()`
  • `nFileIndexHigh << 32 | nFileIndexLow` to `ino()`
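The `ino()` mapping above packs two 32-bit `DWORD` halves into one 64-bit value. A minimal sketch of just that arithmetic in plain Rust follows; `file_index` is a hypothetical helper name, and actually obtaining the structure from Windows (via `GetFileInformationByHandle` through some binding crate) is not shown.

```rust
/// Hypothetical helper: combines the two 32-bit halves of a Windows file
/// index into one 64-bit value, using the `nFileIndexHigh << 32 | nFileIndexLow`
/// layout. Each half is widened to `u64` first, because shifting a `u32`
/// left by 32 would overflow (and panic in a debug build).
fn file_index(n_file_index_high: u32, n_file_index_low: u32) -> u64 {
    (u64::from(n_file_index_high) << 32) | u64::from(n_file_index_low)
}
```

For example, a high part of `0x1` and a low part of `0x2` give `0x1_0000_0002`.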

This would give you an analogue of inodes on Windows with this library. Alternatively, these could be exposed as their own functions, to make it explicit that they are Windows-only values.

I don't think it makes sense to provide `dev` and `ino` on Windows, since those are Unix names. If anything, we should probably take a page from std and provide platform-specific methods, for example: https://doc.rust-lang.org/std/os/windows/fs/trait.MetadataExt.html. std doesn't expose `dwVolumeSerialNumber` or `nFileIndex{High,Low}`, but perhaps we could invent our own methods for that.
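One way to read "take a page from std" is an extension trait that only exists on one platform, like `std::os::unix::fs::MetadataExt`. The sketch below is Unix-only and the trait name `UnixFileIdExt` is hypothetical, not same-file's API; it reads the values from std's real `MetadataExt::dev()`/`ino()`. A Windows counterpart would be a separate trait with Windows-flavoured names.

```rust
use std::fs::File;
use std::io;
use std::os::unix::fs::MetadataExt;

/// Hypothetical Unix-only extension trait. Because the Unix names `dev` and
/// `ino` live here, they never appear on Windows; a Windows trait could offer
/// something like `volume_serial_number()` and `file_index()` instead.
pub trait UnixFileIdExt {
    /// ID of the device containing the file.
    fn dev(&self) -> io::Result<u64>;
    /// Inode number of the file.
    fn ino(&self) -> io::Result<u64>;
}

impl UnixFileIdExt for File {
    fn dev(&self) -> io::Result<u64> {
        // `File::metadata` queries the open handle (fstat), not a path.
        Ok(self.metadata()?.dev())
    }

    fn ino(&self) -> io::Result<u64> {
        Ok(self.metadata()?.ino())
    }
}
```

Callers then have to opt in with a platform-specific `use`, which keeps code that relies on Unix semantics visibly Unix-only.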

@cetra3 Out of curiosity, what is your use case for this?

I think platform-specific methods may be the more correct path.

The immediate need was that I was looking at possible solutions for my mdcollate app. It collates a directory of Markdown files, but I don't want to add the same file multiple times. If I can have a set of files keyed by inode, I can prevent duplicate portions and circular links. Right now it's using `PathBuf`s, canonicalization and string manipulation, which isn't ideal.
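A set keyed by inode, as described for mdcollate, might look like the Unix-only sketch below, where a file's key is its `(dev, ino)` pair (inode numbers are only unique per device). `dedup_by_inode` is a hypothetical name, and this version does not keep files open, so it ignores the identifier-reuse caveat raised later in the thread.

```rust
use std::collections::HashSet;
use std::io;
use std::os::unix::fs::MetadataExt;
use std::path::PathBuf;

/// Hypothetical helper: returns `paths` with later duplicates removed, where
/// two paths are the same file if they share a (device, inode) pair.
/// `std::fs::metadata` follows symlinks, so symlinks and hard links to an
/// already-seen file are both dropped.
fn dedup_by_inode(paths: &[PathBuf]) -> io::Result<Vec<PathBuf>> {
    let mut seen: HashSet<(u64, u64)> = HashSet::new();
    let mut unique = Vec::new();
    for path in paths {
        let md = std::fs::metadata(path)?;
        // `insert` returns false when the key was already present.
        if seen.insert((md.dev(), md.ino())) {
            unique.push(path.clone());
        }
    }
    Ok(unique)
}
```

The first path seen for each file wins, so input order decides which name is kept.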

Further to this, and more of a future goal, I'd like to be able to easily detect whether a file has moved, even without active event notification. I think the same thing could be accomplished with checksums, but being able to identify files through inodes alone would be a great shortcut. I'm not sure about the permanence of inodes though, and whether certain actions would cause them to change.
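The inode shortcut for move detection could be sketched as comparing two scans by `(dev, ino)`: a key present in both under different paths suggests a rename. This is Unix-only and the names `snapshot` and `probable_moves` are hypothetical. A rename within one filesystem keeps the inode, but a move across filesystems is a copy plus delete and gets a new one; an inode freed between scans can also be reused by an unrelated file, so results are only "probable".

```rust
use std::collections::HashMap;
use std::io;
use std::os::unix::fs::MetadataExt;
use std::path::PathBuf;

/// (device, inode) pair identifying a file on Unix.
type FileKey = (u64, u64);

/// Hypothetical: records the key for each scanned path.
fn snapshot(paths: &[PathBuf]) -> io::Result<HashMap<FileKey, PathBuf>> {
    let mut map = HashMap::new();
    for path in paths {
        let md = std::fs::metadata(path)?;
        map.insert((md.dev(), md.ino()), path.clone());
    }
    Ok(map)
}

/// Hypothetical: reports (old, new) path pairs whose key appears in both
/// snapshots under different paths. A checksum could confirm each pair.
fn probable_moves(
    before: &HashMap<FileKey, PathBuf>,
    after: &HashMap<FileKey, PathBuf>,
) -> Vec<(PathBuf, PathBuf)> {
    let mut moves = Vec::new();
    for (key, old) in before {
        if let Some(new) = after.get(key) {
            if new != old {
                moves.push((old.clone(), new.clone()));
            }
        }
    }
    moves
}
```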

@cetra3 Hmm, the way you're describing your problem has me a little worried! Did you know that the entire reason this crate exists is precisely the impermanence of things like inode identifiers? :-) Basically, the only way to make sure your file comparisons are reasonably accurate is to have two open file handles to the things you're comparing, which ensures that the underlying identifiers (whether it's the inode on Linux or `nFileIndex{High,Low}` on Windows) won't be reused for something else.
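The open-handle idea can be applied to a "seen files" set by storing each file's open `File` alongside its `(dev, ino)` key. On Unix, an inode stays allocated while any descriptor refers to it, even after the file is unlinked, so a later match cannot come from a recycled identifier. This is a Unix-only sketch; `SeenFiles` is a hypothetical name and it only mirrors the idea behind same-file's `Handle`, not its code.

```rust
use std::collections::HashMap;
use std::fs::File;
use std::io;
use std::os::unix::fs::MetadataExt;
use std::path::{Path, PathBuf};

/// Hypothetical: tracks files seen so far and keeps each one open, so its
/// (device, inode) key cannot be freed and reassigned to another file.
struct SeenFiles {
    open: HashMap<(u64, u64), (PathBuf, File)>,
}

impl SeenFiles {
    fn new() -> Self {
        SeenFiles { open: HashMap::new() }
    }

    /// Opens `path`; returns `Ok(true)` if it is a file not seen before.
    /// The key is read from the open handle (fstat), not from the path.
    fn insert(&mut self, path: &Path) -> io::Result<bool> {
        let file = File::open(path)?;
        let md = file.metadata()?;
        let key = (md.dev(), md.ino());
        if self.open.contains_key(&key) {
            // Same file as one already held open; `file` is dropped here.
            return Ok(false);
        }
        self.open.insert(key, (path.to_path_buf(), file));
        Ok(true)
    }
}
```

The trade-off is one open descriptor per distinct file, which can run into the process's file-descriptor limit on large directory trees.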

Of course, you may elect to ignore correctness under the assumption that false positive comparisons (two files comparing equal even though they aren't) are either rare or something you can withstand. You can see an example of that here; note that it is followed up with a "proper" `Handle` comparison.
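The example linked in the comment is not reproduced here, but the two-stage pattern it describes might look like this Unix-only sketch: a cheap comparison of path metadata first, then a comparison made while both files are open. All function names are hypothetical, and `same_with_handles` only approximates what a same-file `Handle` comparison does.

```rust
use std::fs::File;
use std::io;
use std::os::unix::fs::MetadataExt;
use std::path::Path;

/// Cheap first pass: compares (device, inode) from path metadata without
/// holding anything open. It can give a false positive if one file is
/// deleted and its inode reused between the two `metadata` calls.
fn probably_same(a: &Path, b: &Path) -> io::Result<bool> {
    let (ma, mb) = (std::fs::metadata(a)?, std::fs::metadata(b)?);
    Ok(ma.dev() == mb.dev() && ma.ino() == mb.ino())
}

/// "Proper" check: both files are open while their keys are compared, so
/// neither inode can be recycled in the meantime.
fn same_with_handles(a: &Path, b: &Path) -> io::Result<bool> {
    let (fa, fb) = (File::open(a)?, File::open(b)?);
    let (ma, mb) = (fa.metadata()?, fb.metadata()?);
    Ok(ma.dev() == mb.dev() && ma.ino() == mb.ino())
}

/// Runs the cheap check first; `&&` short-circuits, so files are only
/// opened when the first pass says "maybe".
fn is_same_file(a: &Path, b: &Path) -> io::Result<bool> {
    Ok(probably_same(a, b)? && same_with_handles(a, b)?)
}
```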

Note that I've also seen folks argue that path canonicalization is actually the more reliable way to do this, but I haven't gone down that path too far.
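For comparison, the canonicalization approach can be sketched with std's real `std::fs::canonicalize`, which resolves symlinks and `.`/`..` components. `same_by_canonical_path` is a hypothetical name. One visible difference from the inode approach: two hard links to the same file have different canonical paths, so they compare as different here.

```rust
use std::io;
use std::path::Path;

/// Hypothetical: treats two paths as the same file if they canonicalize to
/// the same absolute path. Both paths must exist, or an error is returned.
fn same_by_canonical_path(a: &Path, b: &Path) -> io::Result<bool> {
    Ok(std::fs::canonicalize(a)? == std::fs::canonicalize(b)?)
}
```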

To clarify, the second use case is not related to mdcollate but to another project about synchronisation.

I was thinking more along the lines of a Bloom filter, which you can use as a first pass to possibly save on calculations. I think that's in the same vein as what you linked.
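A Bloom filter used as a first pass over `(dev, ino)` keys can be sketched with only std: it never says "absent" for a key that was inserted, but it can say "maybe present" for one that wasn't, so a "maybe" still needs an exact check such as a checksum or an open-handle comparison. `BloomFilter` here is a minimal, untuned illustration; the sizes in the usage are arbitrary, and deriving positions by seeding `DefaultHasher` is one simple choice among many.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Minimal Bloom filter over (device, inode) keys.
struct BloomFilter {
    bits: Vec<bool>,
    num_hashes: u64,
}

impl BloomFilter {
    fn new(num_bits: usize, num_hashes: u64) -> Self {
        assert!(num_bits > 0 && num_hashes > 0);
        BloomFilter { bits: vec![false; num_bits], num_hashes }
    }

    /// Derives one bit position by hashing the key together with a seed.
    fn index(&self, key: (u64, u64), seed: u64) -> usize {
        let mut h = DefaultHasher::new();
        seed.hash(&mut h);
        key.hash(&mut h);
        (h.finish() % self.bits.len() as u64) as usize
    }

    /// Sets the `num_hashes` bits for `key`.
    fn insert(&mut self, key: (u64, u64)) {
        for seed in 0..self.num_hashes {
            let i = self.index(key, seed);
            self.bits[i] = true;
        }
    }

    /// `false` means definitely not inserted; `true` means "maybe".
    fn might_contain(&self, key: (u64, u64)) -> bool {
        (0..self.num_hashes).all(|seed| self.bits[self.index(key, seed)])
    }
}
```

Usage would be `filter.insert((md.dev(), md.ino()))` while scanning, then running the expensive comparison only for keys where `might_contain` returns `true`.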

It seems this ticket has gone stale and there hasn't otherwise been demand for this. I'd be happy to revisit it if someone wanted to submit a PR.