jgm/zip-archive

UnsafePath exception is raised when a file is encountered that starts with .

hanjoosten opened this issue · 18 comments

I use zip-archive as a library in ampersand. We need to extract the file https://github.com/AmpersandTarski/Prototype/archive/v1.1.1.zip programatticaly. Since we upgraded to zip-archive v0.4, we get an error message:

Generating prototype...
Error encountered during deployment of prototype framework:
  Failed to extract the archive found at https://github.com/AmpersandTarski/Prototype/archive/v1.1.1.zip
UnsafePath "C:\\Bitnami\\wampstack-7.1.26-0\\apache2\\htdocs\\AmpersandPrototypes\\Arbeidsduur.proto\\.bowerrc"

I think it is totally valid to extract a zipfile that contains only files and directories that are in scope, such as .bowerrc

I expected that the extraction of the archive would go without any problem. Has something gone wrong with the implementation of Issue #50 ?

jgm commented

This could be Windows-specific. I just tried using zip-archive to extract this zip on my (macos) machine, and had no difficulties. Unfortunately, it isn't easy for me to test on Windows. Maybe you can do some poking around and see why this exception is being raised?
The relevant code is just

  let path = case [d | OptDestination d <- opts] of
                  (x:_) -> x </> eRelativePath entry
                  _     -> eRelativePath entry
  absPath <- makeAbsolute path
  curDir <- getCurrentDirectory
  let isUnsafePath = ".." `isInfixOf` absPath ||
                     not (curDir `isPrefixOf` absPath)
  when isUnsafePath $ E.throwIO $ UnsafePath path

So you could do some tracing to see what absPath and curDir are in your case.

Ah. This is where the problem is. The problem is in the use of curDir.

In my usecase, I have a destination that has no relation with curDir. So I get isUnsafePath == True for all my entries.

I believe the test to see in an entry has an unsafe path should be done on the eRelativePath of the entry before the destination is prepended to it. If that path is an absolute path OR that path contains "..", then it could be mallicious. I do not see the relation with curDir.

jgm commented

Ah yes, curDir is definitely wrong here! Sorry about that.

The test you suggest is the one I had originally suggested in #50. @blender suggested instead

Make the path absolute (resolving symlinks as well) and check that it does not leave the root directory of the archive.

and this was my attempt to do that.

Questions:

  1. Should we really worry about resolving symlinks? To me this seems unnecessary.
  2. Should we raise the exception for all absolute paths? Or should we allow absolute paths that stay within the working directory (as currently)?

@blender I'd be interested in your view on this too, since you raised the original issue.

Well, the good news is this isn't windows specific. Thanks for looking into this!

@jgm my memory is a little rusty but I think the path to check is this one:
https://github.com/jgm/zip-archive/blob/master/src/Codec/Archive/Zip.hs#L145

This path (once resolved) must not leave the root of the archive.

If that is sanitized then there shouldn't be any reason to worry about symlinks that anyways cannot leave the root: https://github.com/jgm/zip-archive/blob/master/src/Codec/Archive/Zip.hs#L404

All paths should be resolved relative to the root for the archive when recorded and when extracted.
Even if one zips /usr/local/bin this should be unzipped to <output-destination>/usr/local/bin.

So:

  1. Probably no need to worry if eRelativePath is sanitized
  2. I would raise an exception for any absolute path encountered. I wound't let absolute path get by normalizePath
jgm commented

So how about if we just check

not (".." `isInfixOf` eRelativePath entry || isAbsolute (eRelativePath entry))

@jgm I think that would work but unfortunately Hello..ciao is a valid directory name so the first check would give a false positive.

how about splitting eRelativePath entry on / and checking that the list doens't contain .. ? it's till naive as more elaborate paths might actually not leave the root of the archive once properly resolved.

I have looked at canonicalizePathbut I am not sure about this bit:

Canonicalization is performed on the longest prefix of the path that points to an existing file or directory. The remaining portion of the path that does not point to an existing file or directory will still be normalized, but case canonicalization and indirection removal are skipped as they are impossible to do on a nonexistent path.

You could use splitDirectories to split the paths in the archives. Then you should be fine a name like Hello..ciao.

If you need someone to test your solution on a windows machine, just let me know.

-- Copied from MissingH
subIndex :: Eq a => [a] -> [a] -> Maybe Int
subIndex substr str = findIndex (isPrefixOf substr) (tails str)

-- Actual resolve
resolve path = case subIndex [".."] path of
                    Just 0 -> undefined -- escapes!
                    Just index -> resolve $ (take (index-1) path) ++ (drop (index+1) path) 
                    Nothing -> path
Prelude Data.List> resolve ["1", "2", "3", ".."]
["1","2"]
Prelude Data.List> resolve ["1", "2", "3", "..", ".."]
["1"]
Prelude Data.List> resolve ["1", "2", "3", "..", "..", ".."]
[]
Prelude Data.List> resolve ["1", "2", "3", "..", "..", "..", ".."]
*** Exception: Prelude.undefined
CallStack (from HasCallStack):
  error, called at libraries/base/GHC/Err.hs:79:14 in base:GHC.Err
  undefined, called at <interactive>:205:31 in interactive:Ghci34
Prelude Data.List>

This however doens't cover stuff like ["1", "2", "3", "..", "..", "..", "..", "1"] which doesn't actually escape

I'd argue that ["1", "2", "3", "..", "..", "..", "..", "1"] does actually escape. It seems valid to me to throw an error here.

jgm commented

Yes, I think splitDirectories and check for a .. component is the way to go here.

jgm commented

Have a look at the change I just pushed.

An alternative approach would be to always append a destination path to the relative path, and if no destination path is specified, use the working directory. This should have the same effect in practice.

Great! Thanks for the fix.

@jgm I think it should work until someone complaints that A/../A is a valid path in an archive :)

jgm commented

Hm, yes. Well, I wouldn't object to adding some logic to simplify A/../A to A etc.
Seems like something simple like this could work:

simplifyPath :: FilePath -> IO FilePath
simplifyPath fp = maybe (throwIO (UnsafePath fp)) return (resolve (splitDirectories fp))

resolve :: [String] -> Maybe [String]
resolve [] = Just []
resolve ("..":_) = Nothing
resolve (".":xs) = resolve xs
resolve (x:"..":xs) = resolve xs
resolve (x:xs) = (x:) <$> resolve xs

This wouldn't allow

"foo" </> "bar" </> ".." </> ".." </> "baz" 

while it should.

Maybe not very elegant, but something like this would do the job:

simplifyPath :: FilePath -> IO FilePath
simplifyPath fp = maybe (throwIO (UnsafePath fp)) $ return (resolve  (splitDirectories fp))

resolve :: [String] -> Maybe [String]
resolve str = resolve' (elemIndices ".." str') str'
  where str' = filter (== ".") str 
        resolve' [] rest = Just rest
        resolve' (0:_) rest = Nothing
        resolve' (n:ns) rest = (foo <>) <$> resolve' (elemIndices ".." rest') rest'
                                where (foo,bar) = splitAt (n-1) rest
                                      rest' = drop 2 bar 
jgm commented

I like this one because it's a bit more obvious how it works:

import System.FilePath
import Control.Monad
import Codec.Archive.Zip
import Control.Exception

simplifyPath :: FilePath -> IO FilePath
simplifyPath fp =
  maybe (throwIO (UnsafePath fp)) (return . joinPath)
    (resolve . splitDirectories $ fp)

resolve :: Monad m => [String] -> m [String]
resolve =
  fmap reverse . foldl go (return [])
  where
  go acc x = do
    xs <- acc
    case x of
      "."  -> return xs
      ".." -> case xs of
                []     -> fail "outside of root path"
                (_:ys) -> return ys
      _    -> return (x:xs)
jgm commented

I've pushed a change, further testing welcome, and we really need to add some automated tests to ensure that the exception is raised when it should be.