matplotlib/mplcairo

The sizes of saved vector outputs (PDF, EPS) are too large.

WinDerek opened this issue · 3 comments

Version Information

>>> import mplcairo
>>> mplcairo.get_versions()
{'python': '3.9.5 (default, Jun  4 2021, 12:28:51) \n[GCC 7.5.0]', 'mplcairo': '0.4', 'matplotlib': '3.5.1', 'cairo': '1.16.0', 'freetype': '2.10.1', 'pybind11': '2.6.2', 'raqm': None, 'hb': None}

Problem Specification

mplcairo Backend

Code:

from pathlib import Path

import numpy as np
import matplotlib
print("matplotlib.__version__:", matplotlib.__version__)
print('Default backend:', matplotlib.get_backend())
matplotlib.use("module://mplcairo.base")
# matplotlib.use("cairo")
print('Backend is now:', matplotlib.get_backend())
import matplotlib.pyplot as plt
matplotlib.rcParams['pdf.fonttype'] = 42
matplotlib.rcParams['ps.fonttype'] = 42


def format_size(num, suffix="B"):
    """Reference: https://stackoverflow.com/a/1094933
    """
    for unit in ["", "K", "M", "G", "T", "P", "E", "Z"]:
        if abs(num) < 1024.0:
            return f"{num:3.1f}{unit}{suffix}"
        num /= 1024.0
    return f"{num:.1f}Y{suffix}"


# Plot and save figures
fig, ax = plt.subplots(figsize=(8,6), dpi=300)
for i in range(5):
    ax.plot(range(100000), np.random.rand(100000), linewidth=2.0)
fig.savefig('./mplcairo_file_size_test.pdf', format='pdf', bbox_inches='tight')
fig.savefig('./mplcairo_file_size_test.eps', format='eps', bbox_inches='tight')
fig.savefig('./mplcairo_file_size_test.png', format='png', bbox_inches='tight')
print("Figures saved!")


# Display the sizes
pathlist = [
    Path("./mplcairo_file_size_test.pdf"),
    Path("./mplcairo_file_size_test.eps"),
    Path("./mplcairo_file_size_test.png"),
]
for path in sorted(pathlist):
    print("{:s}: {:s}".format(path.name, format_size(path.stat().st_size)))

Output:

matplotlib.__version__: 3.5.1
Default backend: module://matplotlib_inline.backend_inline
Backend is now: module://mplcairo.base
Figures saved!
mplcairo_file_size_test.eps: 8.4MB
mplcairo_file_size_test.pdf: 8.4MB
mplcairo_file_size_test.png: 63.2KB

Default Backend

The code is identical to the mplcairo case, except that the matplotlib.use("module://mplcairo.base") line is commented out so that the default backend is used.

Output:

matplotlib.__version__: 3.5.1
Default backend: module://matplotlib_inline.backend_inline
Backend is now: module://matplotlib_inline.backend_inline
Figures saved!
mplcairo_file_size_test.eps: 1.2MB
mplcairo_file_size_test.pdf: 535.4KB
mplcairo_file_size_test.png: 76.8KB

Observation

The vector files (PDF, EPS) produced by mplcairo are much larger than those produced by the default backend: 8.4MB vs. 1.2MB for EPS, and 8.4MB vs. 535.4KB for PDF. The PNG sizes, by contrast, are comparable (63.2KB vs. 76.8KB). The problem gets worse as more lines (Artists) are added to the figure.
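To check how the output size scales with the number of lines without writing files to disk, something like the following sketch could be used. It saves to an in-memory buffer with Matplotlib's built-in PDF writer (not mplcairo); the helper name pdf_size and the choice of 5000 points per line are illustrative assumptions, not part of the original report.

```python
import io

import numpy as np
from matplotlib.figure import Figure


def pdf_size(n_lines, n_points=5000, seed=0):
    """Return the size in bytes of a PDF containing n_lines random lines.

    Hypothetical helper for illustration; uses Matplotlib's built-in
    PDF output, not mplcairo.
    """
    rng = np.random.default_rng(seed)
    fig = Figure(figsize=(8, 6))
    ax = fig.subplots()
    for _ in range(n_lines):
        ax.plot(np.arange(n_points), rng.random(n_points), linewidth=2.0)
    buf = io.BytesIO()
    fig.savefig(buf, format="pdf")
    return buf.getbuffer().nbytes


# Measure the PDF size for an increasing number of lines.
sizes = {n: pdf_size(n) for n in (1, 3, 5)}
for n, size in sizes.items():
    print(f"{n} line(s): {size} bytes")
```

Running the same measurement after switching backends with matplotlib.use(...) should give a like-for-like comparison, assuming the backend in question supports saving to a file-like object.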

I can reproduce the problem. I suspect the fix is simply a matter of calling path.cleaned(...) in the right places and adjusting the downstream code (for example, because cleaned already applies the transform in order to perform the simplification), but I currently don't have the bandwidth to look into it in more depth.
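For readers unfamiliar with path.cleaned, here is a minimal sketch of what simplification does to one line of the test data. It only exercises Matplotlib's public Path.cleaned method and is not the actual mplcairo change; the transform (scaling the data onto roughly 576 x 432 output units, i.e. an 8 x 6 inch page at 72 units per inch) is an assumption made for illustration.

```python
import numpy as np
from matplotlib.path import Path
from matplotlib.transforms import Affine2D

# Same shape of data as one line in the report: 100000 random samples.
rng = np.random.default_rng(0)
n_points = 100_000
path = Path(np.column_stack([np.arange(n_points), rng.random(n_points)]))

# Assumed data-to-output transform: squeeze the data into roughly
# 576 x 432 units (an 8 x 6 inch page at 72 units per inch).
to_page = Affine2D().scale(576 / n_points, 432)

# cleaned() applies the transform first and then simplifies, so the
# returned vertices are already in output coordinates.
simplified = path.cleaned(transform=to_page, simplify=True)

n_before = len(path.vertices)
n_after = len(simplified.vertices)
print(f"vertices before: {n_before}, after: {n_after}")
```

Because the returned path is already transformed, any code consuming it must not apply the transform a second time, which is the kind of downstream adjustment mentioned above.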

This should now be fixed as of master, please give it a try.

I just tried the latest master code, and the sizes are much smaller. Thanks for your fix!