aonez/Keka

AppleArchive support

Closed this issue · 22 comments

It seems that Apple has included a command line utility that can create .yaa archives since High Sierra... but there don't seem to be a lot of articles about it online... I tried it on High Sierra and it seems to work well... it's multithreaded, it can use different compression formats (Apple's LZFSE is default, but it can also use LZMA, LZ4, ZLIB or LZVN... or not compress) and it can optionally include extended attributes also.

There doesn't seem to be a way to set a compression level, but you can set a larger block size for better compression.

Usage: yaa command <options>

Commands:
    extract                     extract files from an archive
    list                        list the contents of an archive
    convert                     convert an archive into another
    archive                     archive the contents of a directory
    info                        show archive info
    manifest                    alias for 'archive -manifest'
    index                       alias for 'convert -index'
    manifest-to-bom             convert a YAA manifest to BOM
    verify                      verify dir contents against YAA manifest

Options:
   -v                           increase verbosity level to stderr. Default is silent operation
   -h                           show usage and quit
   -t nWorkerThreads            number of threads to run for compression/decompression, default is 8 on this machine
   -wt nWriterThreads           number of writer threads to run, default is nWorkerThreads ('extract' only)

   -d targetDir                 target directory, default is current directory

   -o outFile                   output file, default is stdout
   -a <algorithm>               compression algorithm for archive payload
   -b <size>                    compression block size for archive payload
                                  'archive' default compression: lzfse 4m
                                  'manifest','index','combine' default compression: lzfse 1m
   -i inFile                    input file, default is stdin
   -ioffset inOffset            bytes to skip at the beginning of inFile, default is 0
   -isize inSize                bytes to read from inFile, default -1 = read the entire file
   -iindex inIndex              input index file, must be a YAA index of the input archive

   -ignore-eperm                ignore EPERM errors when setting files attributes
   -include-path-list listFile  file containing a list of paths to include (one per line, empty lines ignored); this option can be specified multiple times
   -exclude-path-list listFile  file containing a list of paths to exclude (one per line, empty lines ignored); this option can be specified multiple times
   -include-path path           include entry prefix PATH; this option can be specified multiple times
   -exclude-path path           exclude entry prefix PATH; this option can be specified multiple times
   -exclude-name name           exclude entry paths matching NAME (exact match of a path component); this option can be specified multiple times
   -include-type <types>        include entries matching <types> and exclude all other types
   -exclude-type <types>        exclude entries matching <types> and include all other types
                                  default is to include all entry types
   -include-field <fields>      exclude <fields> from archive entries; this option can be specified multiple times
   -exclude-field <fields>      include <fields> from archive entries; this option can be specified multiple times
                                  'archive': typ,pat,lnk,dev are always included; default is to include typ,pat,lnk,dev,uid,gid,mod,flg,mtm,dat,duz
                                  'convert': typ,pat are always included; default is to include typ,pat,lnk,dev,uid,gid,mod,flg,mtm,dat,xat,acl,duz
                                  'list': default is to include all fields
   -manifest                    alias for the following options:
                                  -exclude-field dat
                                  -include-field sh1,cks,siz
                                  -a lzfse -b 1m
   -index                       alias for the following options:
                                  -exclude-field all
                                  -include-field idx
                                  -a lzfse -b 1m


Special option values:
<fields>                        comma separated list of fields, from the following list:
                                  typ     type
                                  pat     entry path
                                  lnk     link path
                                  dev     device id
                                  uid     user id
                                  gid     group id
                                  mod     access mode
                                  flg     flags
                                  mtm     modification time
                                  dat     file data
                                  siz     file size
                                  cks     file data POSIX 1003.2-1992 32 bit CRC
                                  sh1     file data SHA1 digest
                                  sh2     file data SHA256 digest
                                  xat     extended attributes
                                  acl     access control list
                                  duz     file disk usage
                                  idx     entry index in input archive
                                  yaf     stored fields (metadata entry)
                                  attr    alias for uid,gid,mod,flg,mtm fields
                                  all     alias for all fields
<types>                         one or more characters representing entry types, from the following list:
                                  b       block special
                                  c       character special
                                  d       directory
                                  f       regular file
                                  h       hard link
                                  l       symbolic link
                                  m       metadata entry (not a filesystem object)
                                  p       fifo
                                  s       socket
<algorithm>                     compression algorithm, from the following list:
                                  lzma    LZMA preset 6 (xz)
                                  lzfse   LZFSE
                                  lz4     raw LZ4 default level
                                  raw     no compression
                                  zlib    raw ZLIB level 5
                                  lzvn    LZVN
<size>                          size in bytes, optional suffixes b,k,m,g are accepted
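As a rough illustration of the block-size point: going by the help text above, the algorithm and block size are chosen with -a and -b. The wrapper below is only a sketch. `archive_dir` is a hypothetical helper name, its defaults simply mirror the 'archive' defaults printed by the tool (lzfse, 4m), and whether a larger block actually improves the ratio will depend on the data.

```shell
# Hypothetical helper: archive a directory with an explicit algorithm and block size.
# Usage: archive_dir <dir> <out.yaa> [algorithm] [block-size]
archive_dir() {
    src_dir=$1
    out_file=$2
    algorithm=${3:-lzfse}   # same as the tool's default for 'archive'
    block_size=${4:-4m}     # same as the tool's default for 'archive'
    # Only flags listed in the help output are used here.
    yaa archive -d "$src_dir" -o "$out_file" -a "$algorithm" -b "$block_size"
}
```

For example, `archive_dir ./project project.yaa lzma 16m` would request LZMA with a 16m block size.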

I created an app that manages this, it runs headless and uses the progress circle on the filename. I sent it to @aonez privately some years ago now for consideration for inclusion in Keka. Even before 2019. I called it Quarkive.

Since then he has implemented the progress circle on the filename and somewhat headless operation, but there is still no support for the .yaa archive format.

I think .yaa performance/power/battery benefits will be even more pronounced on ARM/M1, as it seems yaa was written for that first (iPhone) and backported to Intel.

This feature could be a huge PR opportunity for Keka, and a way for it to forge a new position at the forefront of archiving, with this "new" archive format for all to use. Set a new standard.

So, my point is, I'd like this too.

One other thing I wanted to look into, but never did, was the possibility of using yaa in zip mode to create zips and then replace the yaa header with a zip compatible header. So yaa could be used to make zips faster than any zip alternative. Just an idea.

Any plans @aonez ?

aonez commented

Let me check that one again @gingerbeardman

aonez commented

I'm doing a quick test and it can definitely be implemented. I'm unable to archive/extract metadata, though, not even when specifying this:

-include-field xat -include-field attr -include-field yaf

Were any of you able to do this?

So yaa has been renamed aa, at least in the latest Big Sur, and the man page clarifies that the name stands for Apple Archive.

For the metadata you list, I just did this:

aa archive -include-field xat,attr,yaf -d temp -o test.aa
and
aa extract -i test.aa

xattred.app shows the same metadata before and after:

Screen shot 2021-10-05 at 21 55 11

aonez commented

> So yaa has been renamed aa, at least in the latest Big Sur, and the man page clarifies that the name stands for Apple Archive.

Yes, they kept a yaa alias. Also, there's some documentation now and APIs to use the format:
https://developer.apple.com/documentation/applearchive

I've looked at the extensions: it used to be yaa and now it should be aar, standing for Apple ARchive, I imagine. In Big Sur both extensions are recognized and the bundled archiver extracts them.

> aa archive -include-field xat,attr,yaf -d temp -o test.aa

This still does not work for me...

aone@aONe-M1 ~ % xattr aaa/test.png 
com.apple.FinderInfo
com.apple.lastuseddate#PS
com.apple.macl
com.apple.metadata:_kMDItemUserTags
com.apple.metadata:kMDItemIsScreenCapture
com.apple.metadata:kMDItemScreenCaptureGlobalRect
com.apple.metadata:kMDItemScreenCaptureType
aone@aONe-M1 ~ % xattr aaa_/test.png 
com.apple.macl
aone@aONe-M1 ~ % 
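A round trip like this can also be checked without comparing the two listings by eye. The function below is a minimal sketch: `missing_xattrs` is a hypothetical name, and it assumes that `xattr <file>` prints one attribute name per line, as in the transcript above.

```shell
# Hypothetical helper: print the extended-attribute names that exist on the
# original file but are absent from the extracted copy.
# Usage: missing_xattrs <original> <copy>
missing_xattrs() {
    copy_names=$(xattr "$2")
    xattr "$1" | while IFS= read -r name; do
        # -F fixed string, -x whole-line match, -q quiet
        printf '%s\n' "$copy_names" | grep -Fqx -- "$name" || printf '%s\n' "$name"
    done
}
```

For the two listings above, `missing_xattrs aaa/test.png aaa_/test.png` should print every name except com.apple.macl.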
aonez commented

Here a test release:

https://github.com/aonez/Keka/releases/download/dev-test-builds/Keka-1.2.19-dev.r4698.7z

It has full extraction support (except for encrypted files). Compression does not have any options yet, but it works too, with the default settings.

I've seen there's encryption support in Monterey, will investigate more on that one.

Problem with my first try.

Screen shot 2021-10-25 at 13 06 51

Screen shot 2021-10-25 at 13 05 55

OS: Version 11.6 (Build 20G165)
Keka: v1.2.19-r4698 (WEB) (Sandboxed) (en-GB)
Binary used: keka7z
Arguments: (
    a,
    "-t7z",
    "-mx5",
    "-ms=on",
    "-xr!.DS_Store",
    "-xr!.localized",
    "-xr!._*",
    "-xr!.FBC*",
    "-xr!.Spotlight-V100",
    "-xr!.Trash",
    "-xr!.Trashes",
    "-xr!.background",
    "-xr!.TemporaryItems",
    "-xr!.fseventsd",
    "-xr!.com.apple.timemachine.*",
    "-xr!.VolumeIcon.icns",
    "/Volumes/External/Users/matt/Downloads/2021-10-25/VOL6.7z",
    "VOL6.cue",
    "VOL6.bin"
)
Error code 335
aonez commented

You had the AAR format selected and it tried to create a 7Z?

Oh, I used the Service Context Menu to "Compress with Keka". I thought it compressed with current settings?

Now I see that I was only viewing AAR settings, but default was still set to 7z.

I have no idea why 7z failed.

aonez commented

Matt, could it be that you're mixing the stable version and the 4698 build? I have no issue setting that on the dev build.

Yes, you're right! Need more coffee. ☕️

So the only real issue is that after installing the beta my p7zip is broken. I needed to dequarantine the app like in: #208 (comment)

Screen shot 2021-10-26 at 11 05 25

Do you still have the shell script that I sent you ages ago?

I have a workaround for this issue.

aonez commented

> Do you still have the shell script that I sent you ages ago?

I recall seeing a screencast of the script running but no code. Something with aliases or similar?

In Monterey there's an append option so that can be used. But anyway maybe the best option will be to create a custom compressor, given that we now have an API.

Kind of, I started with aliases but there were issues.

I eventually settled on cloning the files (shares the same data but has new filesystem entry) using cp -ac into a temp folder and then archiving and removing that. I guess cloning is only supported on APFS? So only OS X version 10.12 and newer.
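The original script is not shown in this thread, so the following is only a guess at its general shape. `archive_via_clones` is a hypothetical name, the staging folder comes from `mktemp -d`, and the `aa archive` flags are the ones used earlier in the thread; `cp -ac` is the macOS form mentioned above.

```shell
# Hypothetical sketch: stage clones of several files in a temporary folder,
# archive that folder, then remove it again.
# Usage: archive_via_clones <out-file> <file>...
archive_via_clones() {
    out_file=$1
    shift
    staging=$(mktemp -d) || return 1
    # macOS cp: -a preserves attributes, -c requests a clone instead of a copy
    cp -ac "$@" "$staging"/ || { rm -rf "$staging"; return 1; }
    aa archive -d "$staging" -o "$out_file"
    status=$?
    rm -rf "$staging"   # the clones were only needed for the archive step
    return $status
}
```

With the files from the earlier report this would be called as `archive_via_clones VOL6.aar VOL6.cue VOL6.bin`.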

But yeah, I think you have the best idea to use the API!

aonez commented

See #687

What is remaining to do for Apple Archive?

aonez commented

I will add support for AAR on iOS next, and that will be the foundation of the enhanced support for macOS (compressing multiple files). I need to create a custom compressor so as not to rely on aa/yaa.

The idea is to make it compatible with pipes so the contents can be tarballed.

VaslD commented
> Screen shot 2021-10-26 at 11 05 25
>
> Do you still have the shell script that I sent you ages ago?
>
> I have a workaround for this issue.

According to Apple's documentation, this is no longer the case when using the AppleArchive framework. See Compressing and Saving a String to the File System. It differs from Compressing Single Files in that it wraps the compression stream in an encoder stream, just like when compressing a directory, but with a manual header to provide filesystem information instead of putting raw LZFSE in the archive.

The example only shows the case where the input is Data (some ContiguousBytes). For a file on disk, you can call ArchiveHeader.init(keySet:directory:path:flags:) (https://developer.apple.com/documentation/applearchive/archiveheader/3589205-init) to copy the original filesystem information and make the process less manual.

I made a snippet to do what aa does with AppleArchive's Swift API: https://gist.github.com/VaslD/1eaee19546112b04052cfda22a3cb05d. Single file is compressed as-is without additional parent folder.

P.S.: If I wasn't being clear (the snippet doesn't exactly show this), you can repeatedly push new headers into the archive to achieve what didn't work in the screenshot.

aonez commented

Just released a beta version that supports compressing multiple files (macOS 12+): Keka v1.2.62-beta.1

It also supports encryption and decryption using a password, again macOS 12+.