kyz/libmspack

cabextract: Writing into symlinks

wereii opened this issue ยท 11 comments

The problem
Let's say we have a CAB file that contains hello.dll and a symlink with the same name to some other file in the current directory.
With this when you extract the CAB the symlink is followed and it's target file is overwritten.

Why is it problem
AFAIK when writing files, following file symlinks is generally unsafe (?), at least within context of archive extraction (again, afaik)
EDIT: Tried what other archive tools do

  • zip (Info-ZIP) asks what to do when it finds a symlink (with same name as to-be extracted file)
  • tar -xf ignores the symlink (eg. unlinks and writes extracted file)

I've made a super simple patch in my fork https://github.com/wereii/libmspack/commits/master that adds build option for opting into symlink checks before a file is opened for writing (and unlinking/removing possible link before the fopen happens, so the file is written as is and not into the symlinked file, nothing more).

Though the patch is just a simple thing that works for me and

  1. it's not thorough, I've simply put the un/link check before opening file for writing and excluding special files
  2. there could be bunch of ways to resolve this (fail on symlinks, warn, ask, ... and probably be configurable as cli argument).

Either way this behavior requires some more though so here we are.

kyz commented

Thank you, this is a good issue to bring up.

There are two things to consider here; first is about following symlinks or not, the second is overwriting or not.

cabextract currently has no options for controlling overwrite behaviour; it would great to add some.

I'm not sure whether following symlinks is bad or not. As cabinet files cannot contain symlinks, it's not a security issue but a user choice. Only the user can introduce symlinks to cabextract's extraction path. cabextract follows default unix semantics; overwriting a symlinked file writes to its destination. This might even be desirable to the user.

Looking further into tar and zip:

  • unzip uses stat() to determine if a path already exists, and if it's going to overwrite, it always calls unlink() before fopen(), thus never writing into a symlink
  • tar uses open(, O_NOFOLLOW) to avoid writing to a symlink by default, unless the -h --dereference option is used. It has a separate -U --unlink-first option

tar's approach is simpler but requires a system that conforms to POSIX.2008, zip's approach is more portable.

What should cabextract do? Should it maintain its existing behaviour (is someone relying on its current overwrite behaviour?), or should it move towards being more like unzip/tar behaviour?

My thoughts are to be more like unzip/tar, but the user need only add one flag to get the old behaviour back:

  • stat() each output file to determine if it exists or not. If it exists...
  • if -i --interactive specified, ask the user on stderr/stdin what to do: overwrite or skip. Otherwise...
  • If -n --no-overwrite specified, always skip; otherwise always overwrite
  • unlink() before overwriting, unless -k --keep-symlinks is specified

The option help would be:

-i  --interactive    prompt whether to overwrite existing files
-n  --no-overwrite   don't overwrite existing files 
-k  --keep-symlinks  if files to overwrite are symlinks, follow them

What do you think?

Thanks for raising the issue; if you'd like to, you can continue to develop this feature and raise a pull request, but if you don't want to, I'm also OK taking this as a feature request and developing it myself.

This seems sane to me.
I will do PR but your oversight would be greatly appreciated as I am not really sure with cabextract (and your current build system :P )

To summarize:
The default is to ignore anything with a same name that already exists, ignoring symlinks and overwriting files.

-i - ask y/N if it should overwrite
-n - never overwrites, though does it silently skip the file or stop/error out if it finds one? I think it should skip and be verbose about it, unless --quiet
-k - Only thing, should it be verbose about smylink being followed? something along following symlink ..., unless --quiet ?

kyz commented

Your summary is correct.

For -n, it should print " skipping %s" instead of " extracting %s", and continue to the next file.

For -k, I don't see a reason to write code specifically to discover whether a symlink is being followed. The user has to put them there and choose the -k option explicitly, so they already know. If following symlinks were bad, printing a warning message as you do it is too late.

My thoughts would be to add a function call before extracting, e.g. can_write(), which returns 1 if extraction should occur (either writing or overwriting the path), or 0 if extraction should be skipped, e.g.

    else if (can_write(name)) {
      /* extracting to a regular file */
      if (!args.quiet) printf("  extracting %s\n", name);
      ...

can_write() would have logic like this:

    stat(name)
    if stat() fails:
        return 1 (overwrite)

    if "-i" option set, ask user to choose overwrite/skip
    else if "-n" option set, choose to skip
    else choose to overwrite

    if chose to overwrite and "-k" option NOT set:
       unlink(name)
       if unlink() fails, print error and return 0 (skip)

    return choice of overwrite (1) / skip (0)

Is there any reason for not merging this #43 PR? This issue breaks installation of any CAB-based package in winetricks/protontricks, ie. most packages.

kyz commented

The reason it's not merged is that it's incomplete; there is nothing wrong with what's written so far, but it only goes so far as to add a test and the command-line parsing. I'll take a look into finishing it

Thank you, much appreciated. The newer versions of Proton create only symlinks in SYSTEM32/SYSWOW64, which breaks attempts to fix compatibility issues.

The reason it's not merged is that it's incomplete; there is nothing wrong with what's written so far, but it only goes so far as to add a test and the command-line parsing. I'll take a look into finishing it

Any updates on getting this flushed out and finished?

kyz commented

I've implemented the remaining parts, and would appreciate testing of the new functionality, if you have time.

Hi @kyz, I see this has made it into a release. A couple questions:

  1. I see that there is now a --keep-symlinks option, which is great. Is there a way to tell cabextract that you don't want to respect the symlink, and that it should be removed before extracting the target file instead?

  2. When this was first added, it seems there was a regression (introduced by 89ac9df, fixed by cc09dd3). Since it was fixed, cabextract seems okay, but I just wanted to confirm that if a target file is NOT a symlink, it's not expected to need --keep-symlinks?

kyz commented

Hi @austin987

  1. It is now cabextract's default behaviour to do that, no command line option needed
  2. That's correct. cabextract 1.10 had a bug where it would not create directories required for file extraction. There is now an official release of cabextract 1.11 with the fix

Also, to clarify further: if you run cabextract -d /my/own/path mycabfile.cab and the file being extracted is called a/b/c/d.txt, then the file is ultimately extracted to /my/own/path/a/b/c/d.txt; cabextract will NOT replace symlinks in /my/own/path, because you specified that on the command line, but WILL replace any symlinks in the a/b/c/d part, so if any of these were symlinks, they'd be overwritten:

  • /my/own/path/a
  • /my/own/path/a/b
  • /my/own/path/a/b/c
  • /my/own/path/a/b/c/d.txt

Perfect, thanks!