README for pdfsizeopt
^^^^^^^^^^^^^^^^^^^^^

pdfsizeopt is a program for converting large PDF files to small ones. More
specifically, pdfsizeopt is a free, cross-platform command-line application
(for Linux, macOS, Windows and Unix) and a collection of best practices to
optimize the size of PDF files, with focus on PDFs created from TeX and LaTeX
documents. pdfsizeopt is written in Python, so it is a bit slow, but it
offloads some of the heavy work to its faster (C, C++ and Java) dependencies.
pdfsizeopt was developed on a Linux system, and it depends on existing tools
such as Python 2.4, Ghostscript 8.50, jbig2enc (optional), sam2p, pngtopnm,
pngout (optional), and the Multivalent PDF compressor (optional) written in
Java.

Doesn't pdfsizeopt work with your PDF? Report the issue here:
https://github.com/pts/pdfsizeopt/issues

Send donations to the author of pdfsizeopt:
https://flattr.com/submit/auto?user_id=pts&url=https://github.com/pts/pdfsizeopt

Installation instructions and usage on Linux
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

There is no installer, you need to run some commands in the command line to
download and install. pdfsizeopt is a command-line only application, there is
no GUI.

To install pdfsizeopt on a Linux system (with architecture i386 or amd64),
open a terminal window and run these commands (without the leading `$'):

  $ mkdir ~/pdfsizeopt
  $ cd ~/pdfsizeopt
  $ wget -O pdfsizeopt_libexec_linux.tar.gz https://github.com/pts/pdfsizeopt/releases/download/2017-01-24/pdfsizeopt_libexec_linux-v3.tar.gz
  $ tar xzvf pdfsizeopt_libexec_linux.tar.gz
  $ rm -f pdfsizeopt_libexec_linux.tar.gz
  $ wget -O pdfsizeopt.single https://raw.githubusercontent.com/pts/pdfsizeopt/master/pdfsizeopt.single
  $ chmod +x pdfsizeopt.single
  $ ln -s pdfsizeopt.single pdfsizeopt

To optimize a PDF, run the following command:

  ~/pdfsizeopt/pdfsizeopt input.pdf output.pdf

If the input PDF has many images or large images, pdfsizeopt can be very
slow.
You can speed it up by disabling pngout, the slowest image optimization
method, like this:

  ~/pdfsizeopt/pdfsizeopt --use-pngout=no input.pdf output.pdf

pdfsizeopt creates lots of temporary files (psotmp.*) in the output directory,
but it also cleans up after itself.

It's possible to optimize a PDF outside the current directory. To do that,
specify the pathname (including the directory name) in the command-line.

Please note that the commands above download all dependencies (including
Python and Ghostscript) as well. It's possible to install some of the
dependencies with your package manager, but these steps are considered
alternative and more complicated, and thus are not covered here.

Please note that pdfsizeopt works perfectly on any x86 and amd64 Linux system.
There is no restriction on the libc, Linux distribution etc., because
pdfsizeopt uses only its statically linked x86 executables, and it doesn't use
any external commands on the system (other than pdfsizeopt, pdfsizeopt.single
and pdfsizeopt_libexec/*).

pdfsizeopt also works perfectly on x86 FreeBSD systems with the Linux
emulation layer enabled.

To avoid typing ~/pdfsizeopt/pdfsizeopt, add "$HOME/pdfsizeopt" to your PATH
(probably in your ~/.bashrc), open a new terminal window, and the command
pdfsizeopt will work from any directory. You can also put pdfsizeopt into a
directory other than ~/pdfsizeopt, as you like.
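If you call pdfsizeopt from your own scripts, a small Python 3 wrapper can
look the command up on the PATH (falling back to ~/pdfsizeopt/pdfsizeopt, the
location used above) and pass --use-pngout=no when speed matters. This is a
sketch only: the helper names find_pdfsizeopt, build_command and optimize are
hypothetical and not part of pdfsizeopt.

```python
import os
import shutil
import subprocess


def find_pdfsizeopt():
    """Return the pdfsizeopt command: first from the PATH, then ~/pdfsizeopt."""
    found = shutil.which("pdfsizeopt")
    if found:
        return found
    # Fall back to the install location used in the Linux instructions.
    fallback = os.path.expanduser("~/pdfsizeopt/pdfsizeopt")
    if os.path.isfile(fallback) and os.access(fallback, os.X_OK):
        return fallback
    return None


def build_command(program, input_pdf, output_pdf, use_pngout=True):
    """Build the argument list; --use-pngout=no skips the slowest optimizer."""
    cmd = [program]
    if not use_pngout:
        cmd.append("--use-pngout=no")
    cmd += [input_pdf, output_pdf]
    return cmd


def optimize(input_pdf, output_pdf, use_pngout=True):
    """Run pdfsizeopt on one file (hypothetical helper)."""
    program = find_pdfsizeopt()
    if program is None:
        raise FileNotFoundError("pdfsizeopt not found on PATH or in ~/pdfsizeopt")
    # check=True raises CalledProcessError if pdfsizeopt exits with an error.
    subprocess.run(build_command(program, input_pdf, output_pdf, use_pngout),
                   check=True)
```

For example, optimize("input.pdf", "output.pdf", use_pngout=False) runs the
same command as the --use-pngout=no example above.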
Additionally, you can install some extra image optimizers (see more in the
``Image optimizers'' section below):

  $ cd ~/pdfsizeopt
  $ wget -O pdfsizeopt_libexec_extraimgopt_linux-v3.tar.gz https://github.com/pts/pdfsizeopt/releases/download/2017-01-24/pdfsizeopt_libexec_extraimgopt_linux-v3.tar.gz
  $ tar xzvf pdfsizeopt_libexec_extraimgopt_linux-v3.tar.gz
  $ rm -f pdfsizeopt_libexec_extraimgopt_linux-v3.tar.gz

Installation instructions and usage with Docker on Linux and macOS
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

There is no installer, you need to run some commands in the command line to
download and install. pdfsizeopt is a command-line only application, there is
no GUI.

To optimize a PDF, install Docker, and then run this command:

  docker run -v "$PWD:/workdir" -u "$(id -u):$(id -g)" --rm -it ptspts/pdfsizeopt pdfsizeopt input.pdf output.pdf

If the input PDF has many images or large images, pdfsizeopt can be very
slow. You can speed it up by disabling pngout, the slowest image optimization
method, like this:

  docker run -v "$PWD:/workdir" -u "$(id -u):$(id -g)" --rm -it ptspts/pdfsizeopt pdfsizeopt --use-pngout=no input.pdf output.pdf

pdfsizeopt creates lots of temporary files (psotmp.*) in the output directory,
but it also cleans up after itself.

Only the current directory is mounted into the container (as /workdir), so
the input and output PDFs must be in the current directory or in one of its
subdirectories. To optimize a PDF in a subdirectory, specify its relative
pathname (including the directory name) in the command-line.

To avoid typing a long command, create a wrapper script by running:

  (echo '#! /bin/sh'; echo 'exec docker run -v "$PWD:/workdir" -u "$(id -u):$(id -g)" --rm -it ptspts/pdfsizeopt pdfsizeopt "$@"') >pdfsizeopt && chmod 755 pdfsizeopt

Then copy the pdfsizeopt script to a directory on your PATH, open a new
terminal window, and this command will also work to optimize a PDF:

  pdfsizeopt input.pdf output.pdf

Please note that the ptspts/pdfsizeopt Docker image is updated very rarely.
To use a more up-to-date version, run these commands to download it (without
the leading `$'):

  $ wget -O pdfsizeopt.single https://raw.githubusercontent.com/pts/pdfsizeopt/master/pdfsizeopt.single
  $ chmod +x pdfsizeopt.single

Then run this command to optimize a PDF:

  docker run -v "$PWD:/workdir" -u "$(id -u):$(id -g)" --rm -it ptspts/pdfsizeopt ./pdfsizeopt.single --use-pngout=no input.pdf output.pdf

If you want to have extra image optimizers included, use
ptspts/pdfsizeopt-with-extraimgopt instead of ptspts/pdfsizeopt in the
commands above. Example:

  docker run -v "$PWD:/workdir" -u "$(id -u):$(id -g)" --rm -it ptspts/pdfsizeopt-with-extraimgopt pdfsizeopt --use-image-optimizer=sam2p,jbig2,pngout,zopflipng,optipng,advpng,ECT input.pdf output.pdf

Installation instructions and usage on Windows
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

There is no installer, you need to run some commands in the command line
(black Command Prompt window) to download and install. pdfsizeopt is a
command-line only application, there is no GUI.

Create the folder C:\pdfsizeopt, download
https://github.com/pts/pdfsizeopt/releases/download/2017-09-02w/pdfsizeopt_win32exec-v6.zip
, and extract its contents to the folder C:\pdfsizeopt, so that the file
C:\pdfsizeopt\pdfsizeopt.exe exists.

Download https://raw.githubusercontent.com/pts/pdfsizeopt/master/pdfsizeopt.single
and save it to C:\pdfsizeopt, as C:\pdfsizeopt\pdfsizeopt.single .

To optimize a PDF, run the following command in the command line (the black
Command Prompt window, which you can start by Start menu / Run / cmd.exe, or
by finding Command Prompt in the Start menu):

  C:\pdfsizeopt\pdfsizeopt input.pdf output.pdf

(Press Tab to get filename completion while typing.) Since you have to type
the input filename as a full pathname, it's recommended to create a directory
with a short name (e.g. C:\pdfs), and copy the input PDF there first.

If the input PDF has many images or large images, pdfsizeopt can be very
slow.
You can speed it up by disabling pngout, the slowest image optimization
method, like this:

  C:\pdfsizeopt\pdfsizeopt --use-pngout=no input.pdf output.pdf

To avoid typing C:\pdfsizeopt\pdfsizeopt, add C:\pdfsizeopt to (the end of)
the system PATH, open a new Command Prompt window, and the command
`pdfsizeopt' will work from any directory.

Depending on your environment, filenames with accented characters may not
work in the Windows version of pdfsizeopt. To play it safe, make sure your
input and output files have names containing only letters, numbers,
underscore (_), dash (-), dot (.) and plus (+). The backslash (\) and the
slash (/) are both OK as the directory separator. Spaces in filenames and
pathnames should work, but you need to put double quotes (") around the name.
Filenames with some punctuation characters (such as double quote ("), question
mark (?) and asterisk (*)) and nonprintable characters (such as newline) will
not work on Windows. This is because Windows doesn't support these characters
([\x00-\x1f"*:<>?|\x7f]) in filenames at all, and it uses / and \ as
directory separators.

You can also put pdfsizeopt into a directory other than C:\pdfsizeopt, but it
won't work if there is whitespace or there are accented characters in any of
the folder names.

Please note that pdfsizeopt works perfectly in Wine (tested with wine-1.2 on
Ubuntu Lucid and wine-1.6.2 on Ubuntu Trusty), but it's a bit slower than
running it natively (as a Linux or Unix program).

Installation instructions and usage on macOS
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

There is no installer, you need to run some commands in the command line to
download and install. pdfsizeopt is a command-line only application, there is
no GUI.
To install pdfsizeopt on a macOS system (with architecture i386 or amd64),
open a terminal window and run these commands (without the leading `$'):

  $ mkdir ~/pdfsizeopt
  $ cd ~/pdfsizeopt
  $ curl -L -o pdfsizeopt_libexec_darwin.tar.gz https://github.com/pts/pdfsizeopt/releases/download/2017-09-03d/pdfsizeopt_libexec_darwin-v1.tar.gz
  $ tar xzvf pdfsizeopt_libexec_darwin.tar.gz
  $ rm -f pdfsizeopt_libexec_darwin.tar.gz
  $ curl -L -o pdfsizeopt.single https://raw.githubusercontent.com/pts/pdfsizeopt/master/pdfsizeopt.single
  $ chmod +x pdfsizeopt.single
  $ ln -s pdfsizeopt.single pdfsizeopt

Do a test optimization run, which exercises all dependencies of pdfsizeopt:

  $ curl -L -o deptest.pdf https://github.com/pts/pdfsizeopt/raw/master/deptest/deptest.pdf
  $ ~/pdfsizeopt/pdfsizeopt deptest.pdf

Then open (view) both deptest.pdf and the corresponding optimized
deptest.pso.pdf, and compare them.

To optimize a PDF, run the following command:

  ~/pdfsizeopt/pdfsizeopt input.pdf output.pdf

If the input PDF has many images or large images, pdfsizeopt can be very
slow. You can speed it up by disabling pngout, the slowest image optimization
method, like this:

  ~/pdfsizeopt/pdfsizeopt --use-pngout=no input.pdf output.pdf

Also, if you have a 32-bit Mac, then the pngout bundled with pdfsizeopt won't
work (because it needs a 64-bit Mac), so you have to specify --use-pngout=no .
See the section ``Image optimizers'' for alternatives to pngout.

pdfsizeopt creates lots of temporary files (psotmp.*) in the output directory,
but it also cleans up after itself.

It's possible to optimize a PDF outside the current directory. To do that,
specify the pathname (including the directory name) in the command-line.

Please note that the commands above download most dependencies (including
Ghostscript, but excluding Python) as well. Everything should work as
instructed above, out of the box. If you are experiencing problems, please
report an issue on https://github.com/pts/pdfsizeopt/issues .
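To see how much a run such as the deptest.pdf test saved, you can compare the
file sizes. The following Python 3 sketch does that; the helper names are
hypothetical, and default_output_name only assumes, from the deptest.pdf ->
deptest.pso.pdf example, that a trailing .pdf is replaced by .pso.pdf when no
output name is given.

```python
import os


def default_output_name(input_pdf):
    """Guess the output name used when only the input PDF is given.

    Assumption: inferred from the deptest.pdf -> deptest.pso.pdf example;
    only names ending in .pdf are handled.
    """
    if not input_pdf.endswith(".pdf"):
        raise ValueError("expected a filename ending in .pdf: %r" % input_pdf)
    return input_pdf[:-len(".pdf")] + ".pso.pdf"


def size_report(input_pdf, output_pdf=None):
    """Return (input size in bytes, output size in bytes, percent saved)."""
    if output_pdf is None:
        output_pdf = default_output_name(input_pdf)
    before = os.path.getsize(input_pdf)
    after = os.path.getsize(output_pdf)
    saved = 100.0 * (before - after) / before if before else 0.0
    return before, after, saved
```

For example, after the test run, size_report("deptest.pdf") compares
deptest.pdf with deptest.pso.pdf in the current directory.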
To avoid typing ~/pdfsizeopt/pdfsizeopt, add "$HOME/pdfsizeopt" to your PATH
(probably in your ~/.bashrc), open a new terminal window, and the command
pdfsizeopt will work from any directory. You can also put pdfsizeopt into a
directory other than ~/pdfsizeopt, as you like.

Installation instructions and usage on FreeBSD
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

There is no installer, you need to run some commands in the command line to
download and install. pdfsizeopt is a command-line only application, there is
no GUI.

pdfsizeopt works perfectly on x86 FreeBSD systems with the Linux emulation
layer enabled. So, enable the Linux emulation layer on your FreeBSD system,
and then follow the ``Installation instructions and usage on Linux''.

Alternatively, you can follow the ``Installation instructions and usage on
generic Unix'', but that needs much more work on your part (and it's
inconvenient and error-prone), because you need to install many dependencies
separately, possibly compiling some of them from source.

Installation instructions and usage on generic Unix
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

There is no installer, you need to run some commands in the command line to
download and install. pdfsizeopt is a command-line only application, there is
no GUI.

pdfsizeopt is a Python script. It works with Python 2.4, 2.5, 2.6 and 2.7
(but it doesn't work with Python 3.x). So please install Python first.

Create a new directory named pdfsizeopt, and download this file there:
https://raw.githubusercontent.com/pts/pdfsizeopt/master/pdfsizeopt.single

Rename it to pdfsizeopt and make it executable by running the following
commands (without the leading `$'):

  $ cd pdfsizeopt
  $ mv pdfsizeopt.single pdfsizeopt
  $ chmod +x pdfsizeopt

If your Python executable is not /usr/bin/python, then edit the first line
(starting with `#!') in the pdfsizeopt script accordingly.
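To find out which interpreter to use, you can probe the interpreters on your
PATH and pick the first one that reports version 2.4 to 2.7. The following is
a Python 3 sketch, not part of pdfsizeopt: the names in CANDIDATES are
assumptions about typical installations, and the helper names are
hypothetical.

```python
import shutil
import subprocess

# Assumed interpreter names; which of these exist depends on your system.
CANDIDATES = ("python2.7", "python2.6", "python2.5", "python2.4",
              "python2", "python")

# Valid in both Python 2 (print with one parenthesized argument) and Python 3.
VERSION_PROBE = 'import sys; print("%d.%d" % sys.version_info[:2])'


def is_supported(version):
    """pdfsizeopt works with Python 2.4 to 2.7, but not with 3.x."""
    return (2, 4) <= version < (3, 0)


def parse_version(text):
    """Turn the probe output such as "2.7" into the tuple (2, 7)."""
    major, minor = text.strip().split(".")
    return int(major), int(minor)


def find_python2():
    """Return the full path of the first supported interpreter, or None."""
    for name in CANDIDATES:
        path = shutil.which(name)
        if path is None:
            continue
        try:
            out = subprocess.run([path, "-c", VERSION_PROBE],
                                 capture_output=True, text=True,
                                 check=True).stdout
            version = parse_version(out)
        except (OSError, subprocess.CalledProcessError, ValueError):
            continue  # Not runnable, or unexpected output: try the next one.
        if is_supported(version):
            return path
    return None
```

If find_python2() returns a path, that is the interpreter to put on the
`#!' line; if it returns None, install Python 2 first.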
Try it with:

  $ ./pdfsizeopt --version
  info: This is pdfsizeopt ZIP rUNKNOWN size=105366.

pdfsizeopt has many dependencies. For full functionality, you need all of
them. Install all of them and add them to the PATH. Dependencies:

* Python (command: python): Version 2.4, 2.5, 2.6 and 2.7 work (3.x doesn't
  work).
* Ghostscript (command: gs): Version 9 or newer should work.
* jbig2 (command: jbig2): Install from source:
  https://github.com/pts/pdfsizeopt-jbig2
  If you are unable to install it, use pdfsizeopt --use-jbig2=no .
* pngout (command: pngout): Download binaries from here:
  http://www.jonof.id.au/kenutils
  Source code is not available. If you are unable to install it, use
  pdfsizeopt --use-pngout=no .
* png22pnm (command: png22pnm): Install from source:
  https://github.com/pts/tif22pnm
  This is required by sam2p to open PNG files. Please note that the bundled
  tif22pnm command is not needed.
* sam2p (command: sam2p): Install from source: https://github.com/pts/sam2p
  If you are unable to install it, use pdfsizeopt --do-optimize-images=no .
  Some Linux distributions have sam2p binaries, but they tend to be too old.
  Please use version >=0.49.3.

After installation, use pdfsizeopt as:

  $ ./pdfsizeopt input.pdf output.pdf

You can add the directory containing pdfsizeopt to the PATH, so the command
`pdfsizeopt' will work from any directory.

Image optimizers
~~~~~~~~~~~~~~~~

pdfsizeopt can use the following external tools to make the images embedded
in PDF files smaller:

* sam2p (used by default, cannot be disabled)
* jbig2 (used by default, disable with --use-jbig2=no)
* pngout (used by default, disable with --use-pngout=no)
* zopflipng (not enabled by default)
* optipng (not enabled by default)
* advpng (not enabled by default)
* ECT (not enabled by default)

To enable or disable any image optimizer, list all the image optimizers you
want to be enabled, like this: --use-image-optimizer=optipng,jbig2 . This
will also disable the default pngout.
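Since pdfsizeopt fails at startup if a requested image optimizer is missing,
it can help to check first which of the tools listed above are installed, and
to build the --use-image-optimizer= value from those. A minimal Python 3
sketch with hypothetical helper names; it assumes that each optimizer is
installed as a command with exactly the name shown in the list, which may not
hold on every system (e.g. for ECT).

```python
import shutil

# The optimizers listed above. Assumption: each one is installed as a
# command with exactly this name.
OPTIMIZERS = ("sam2p", "jbig2", "pngout", "zopflipng", "optipng", "advpng",
              "ECT")


def installed_optimizers(names=OPTIMIZERS, path=None):
    """Return the names whose command is found on `path` (default: PATH)."""
    return [name for name in names
            if shutil.which(name, path=path) is not None]


def image_optimizer_flag(names):
    """Build a flag that enables exactly the given optimizers, in order."""
    return "--use-image-optimizer=" + ",".join(names)
```

For example, print(image_optimizer_flag(installed_optimizers())) prints a
flag listing only the optimizers found on your PATH.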
You can also specify custom image optimizer command patterns by specifying
separate, additional --use-image-optimizer= flags, like this:

  --use-image-optimizer="optipng %(sourcefnq)s -o6 -fix -force %(optipng_gray_flags)s-out %(targetfnq)s"

You always have to specify %(targetfnq)s in the command pattern.

Specify --do-debug-image-optimizers=yes to see which image optimizers are
enabled (and their full command-line) for the current run.

At startup, pdfsizeopt checks that the requested image optimizers are
available (as program files), and fails if some of them are missing. To
ignore those which are missing, specify --do-require-image-optimizers=no .

It's your (the user's) responsibility to install the image optimizers and add
them to the PATH. If you follow the installation instructions for Windows and
Linux above, the default image optimizers (sam2p, jbig2 and pngout) will be
installed for you. For Linux, there are also installation instructions above
for extra image optimizers (zopflipng, optipng, advpng and ECT).

Troubleshooting
~~~~~~~~~~~~~~~

1. pdfsizeopt fails for some fonts.
"""""""""""""""""""""""""""""""""""

Specify --do-unify-fonts=no and --do-regenerate-all-fonts=no . If it still
fails, specify --do-optimize-fonts=no . In either case, please report it on
https://github.com/pts/pdfsizeopt/issues

2. pdfsizeopt fails for some images.
""""""""""""""""""""""""""""""""""""

Specify --do-optimize-images=no . Please report it on
https://github.com/pts/pdfsizeopt/issues

3. pdfsizeopt is too slow processing images.
""""""""""""""""""""""""""""""""""""""""""""

Specify --use-pngout=no . This disables pngout, which is the slowest
optimization step for images.

4. pdfsizeopt fails without creating the output PDF.
""""""""""""""""""""""""""""""""""""""""""""""""""""

Please report it on https://github.com/pts/pdfsizeopt/issues , attaching the
input PDF file and the console output of pdfsizeopt. Your report is very much
appreciated.
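To collect the console output for such a report, you can save it to a file
while pdfsizeopt runs. A minimal Python 3 sketch; run_and_log is a
hypothetical helper, not part of pdfsizeopt.

```python
import subprocess


def run_and_log(cmd, log_path):
    """Run cmd, saving its stdout and stderr to log_path.

    Returns the exit status, so a failed run still leaves a complete log.
    """
    with open(log_path, "w") as log:
        # stderr=STDOUT sends both output streams to the same log file.
        result = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
    return result.returncode
```

For example, run_and_log(["pdfsizeopt", "input.pdf", "output.pdf"],
"pdfsizeopt.log") returns a nonzero status if pdfsizeopt fails; attach
pdfsizeopt.log to the issue together with the input PDF.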
If pdfsizeopt exits with an uncaught exception, it may leave some temporary
files (psotmp.*) behind in the current directory. You can remove these files.

Please note that pdfsizeopt is not resilient in processing corrupt PDF files
(i.e. those which are not compliant with the PDF standard). So if pdfsizeopt
fails, then the reason may be a bug in pdfsizeopt or a corrupt PDF input
file. Nevertheless, please report an issue (see above).

5. The output PDF of pdfsizeopt doesn't look the same as the input PDF.
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""

Please report it on https://github.com/pts/pdfsizeopt/issues , attaching the
input PDF file, the output PDF file (.pso.pdf) and the console output of
pdfsizeopt. Your report is very much appreciated.

6. pdfsizeopt is unable to find some input files on Windows.
""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""

This may happen if the filename or the full pathname contains any character
other than the ASCII letters (a-z and A-Z), digits (0-9), underscore (_),
ASCII dash (-), plus (+), dot (.), backslash (\) or slash (/). Typically these
characters don't work:

* spaces and tabs: This is easy to fix, just wrap the filename in double
  quotes ("), the usual way.
* double quotes ("): This can't happen, filenames on Windows are not allowed
  to contain double quotes. If you need to pass a non-filename argument with
  a double quote in it to pdfsizeopt, do this: wrap the argument in double
  quotes ("), replace each double quote (") inside it with \", and double
  any backslashes (\) immediately preceding such a double quote. Also double
  any backslashes at the end of the argument, because otherwise they would
  escape the closing double quote. This sounds complicated, but this is the
  usual way for other programs as well, see
  https://stackoverflow.com/a/4094897/97248 .
* newlines and other non-space whitespace: This won't work, the Windows
  Command Prompt (cmd.exe) doesn't allow these characters in command-line
  arguments. Also Windows doesn't allow them in filenames.
* accented characters (such as á and ő): These characters won't work (or
  only some of them will work, depending on the active code page) in the PDF
  filename specified in the command-line, or in the full pathname of
  pdfsizeopt (so don't install pdfsizeopt to C:\bőr, it won't work).
  Accented characters outside the active code page will not work in the full
  pathname of pdfsizeopt (such as C:\bőr\pdfsizeopt.exe). That's because
  Python 2 is unable to call external programs (os.system, os.popen,
  os.spawnl and subprocess.call) with accented characters in their pathname,
  because it uses the single-byte API.
* anything outside the printable ASCII range (codes 33 to 126, inclusive): If
  not covered above, this may not work. See the description of accented
  characters.

If some filenames still don't work, the workarounds are:

* renaming or copying the file (and folders) in Windows Explorer, and passing
  the renamed file to pdfsizeopt
* using pdfsizeopt on a Unix system (e.g. Linux, FreeBSD, macOS) instead

Accented characters in the PDF filename could be made to work the following
way (as a future improvement to pdfsizeopt):

* pdfsizeopt.exe should call the 16-bit (wide) API (GetCommandLineW) instead
  of the single-byte API (GetCommandLineA) to get the arguments.
* pdfsizeopt.exe should escape the non-ASCII characters in the arguments
  (e.g. as U+12AB).
* pdfsizeopt.exe should run pdfsizeopt.single like this:
  .../pdfsizeopt_win32exec/pdfsizeopt_python.exe .../pdfsizeopt.single --args-u+ ...
* pdfsizeopt Python code should recognize --args-u+, and when finding the
  filename, it should convert it to unicode (by keeping the ASCII characters
  and decoding the U+12AB escapes), and it should pass that unicode-typed
  value to open(...). Such an open(...) works in Python 2.6 on Windows.
* When displaying filenames, pdfsizeopt Python code should still display
  them as ASCII with the U+12AB escaping. This way the win32console module is
  not needed, and filenames will be displayed legibly but not exactly (not
  copy-pasteably) in the Command Prompt window.
* No escaping is needed in command lines of helper programs (e.g. gs,
  sam2p), because they are all ASCII: filenames are autogenerated temporary
  file names, which are all ASCII, and the path to pdfsizeopt itself is
  required to be ASCII.

Accented characters in the pathname of pdfsizeopt.single can be made to work
this way (as a future improvement to pdfsizeopt):

* Implement the support for accented characters in the PDF filename (above)
  first.
* pdfsizeopt.exe should use wgetcwd to get the current directory.
* pdfsizeopt.exe should use wchdir to change to the directory of
  pdfsizeopt.single .
* pdfsizeopt.exe should prepend the directories pdfsizeopt_win32exec and
  pdfsizeopt_win32exec/pdfsizeopt_gswin to the PATH, using wputenv.
* pdfsizeopt.exe should run pdfsizeopt.single like this:
  pdfsizeopt_python.exe pdfsizeopt.single --args-u+ --cwd=... ...
  , where the value of --cwd= is the escaped (U+12AB) version of the result
  of wgetcwd.
* pdfsizeopt Python code should prepend the value of --cwd=... to the input
  filename if it's relative.
* pdfsizeopt Python code shouldn't modify the PATH if --cwd=... is present.
  (Does this environment variable propagation work in Python 2.6? Let's
  try!)
* It's still true that no escaping is needed in command lines of external
  programs (e.g. gs, sam2p), because they are all ASCII: temporary file
  names are all ASCII, and the path to pdfsizeopt itself is required to be
  ASCII. Escaping is needed if the pathname of the temporary directory (TEMP
  variable) needs escaping.

7. Error on Windows: The application failed to initialize properly (0xc0000034). Click on OK to terminate the application.
""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""

This error has happened on a Windows XP system. The solution: download
msvcr90.dll (or find it somewhere already on your system), and copy it into
pdfsizeopt_win32exec (next to python26.dll). Any version of msvcr90.dll will
work, for example:

* msvcr90.dll 9.0.21022.8 (655872 bytes)
* msvcr90.dll 9.0.30729.6161 (653136 bytes)
* msvcr90.dll 9.0.30729.9247 (653968 bytes)

8. Error on Windows: The system cannot execute the specified command.
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""

This error has happened on a Windows XP system when the file
Microsoft.VC90.CRT.manifest was missing from the pdfsizeopt_win32exec
directory. The solution: reinstall pdfsizeopt, because the directory
pdfsizeopt_win32exec in the newest version has that file.

More documentation
~~~~~~~~~~~~~~~~~~

* https://github.com/pts/pdfsizeopt/releases/download/docs-v1/pts_pdfsizeopt2009.psom.pdf
  White paper on EuroTeX 2009.
* https://github.com/pts/pdfsizeopt/releases/download/docs-v1/pts_pdfsizeopt2009_talk.psom.pdf
  Conference talk slides on EuroTeX 2009.

__END__