Kozea/tinycss2

How to minify css?

brupelo opened this issue · 4 comments

@liZe Hello, nice to meet you! First of all, let me tell you I think your project is pretty amazing... Today I've been researching CSS parsers written in Python, and after quickly testing both tinycss and cssutils I wasn't sold on either of them... but this one? This one just worked out of the box and I like how the code is written, so... :)

Anyway, I'd like to ask how I could write a simple minifier that strips out both comments and whitespace. So far I've come up with this:

from io import StringIO

import tinycss2
from tinycss2 import ast


def minify1(rules):
    blacklist = (ast.WhitespaceToken, ast.Comment)
    f = StringIO()
    for v in rules:
        if isinstance(v, blacklist):
            continue
        f.write(v.serialize())
    return f.getvalue()


def minify2(rules):
    blacklist = (ast.WhitespaceToken, ast.Comment)
    f = StringIO()
    for v in rules:
        if isinstance(v, blacklist):
            continue
        if v.content:
            for vv in v.content:
                if isinstance(vv, blacklist):
                    continue
                else:
                    f.write(vv.serialize())
        else:
            f.write(v.serialize())

    return f.getvalue()


if __name__ == "__main__":
    rules = tinycss2.parse_stylesheet(
        """
        @charset "UTF-8";
        /* Body */
        body {
                    font-family: Cambria, "Hoefler Text", "Liberation Serif", Times, "Times New Roman", serif;
            background-color: #FFFFFF;
            margin: 0;
        }
        /* Container */
        .container {
            width: 90%;
            margin-left: auto;
            margin-right: auto;
            background-color: #FFFFFF;
        }
        /* Header */
        header {
            width: 100%;
            height: 8%;
            background-color: #5D5E5D;
            border-bottom: 1px solid #353635;
        }
        .logo {
            color: #fff;
            font-weight: bold;
            margin-left: auto;
            letter-spacing: 4px;
            margin-right: auto;
            text-align: center;
            padding-top: 15px;
            line-height: 2em;
            font-size: 22px;
        }
        .hero_header {
            color: #FFFFFF;
            text-align: center;
            margin: 0;
            letter-spacing: 4px;
        }
    """
    )

    print(minify1(rules))
    print("-")
    print(minify2(rules))

Unfortunately, the minify2 function is broken and spits out invalid CSS, i.e.:

@charset "UTF-8";font-family:Cambria,"Hoefler Text","Liberation Serif",Times,"Times New Roman",serif;background-color:#FFFFFF;margin:0;width:90%;margin-left:auto;margin-right:auto;background-color:#FFFFFF;width:100%;height:8%;background-color:#5D5E5D;border-bottom:1pxsolid#353635;color:#fff;font-weight:bold;margin-left:auto;letter-spacing:4px;margin-right:auto;text-align:center;padding-top:15px;line-height:2em;font-size:22px;color:#FFFFFF;text-align:center;margin:0;letter-spacing:4px;

So, could you please advise me on how to fix it? Also, I guess I'll need to convert it to a recursive form somehow (right now the idea was to deal with just 2 levels of depth), but I'm still getting familiar with both the CSS grammar and the tinycss2 code...

Thanks in advance

liZe commented

Hello!

It’s not easy to write a good minifier. Skipping ast classes is hard, because you can’t just strip spaces and comments: some of them can be removed, but some of them have to be replaced by at least a space. It depends on the context.
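To make the point about context concrete, here is a small sketch (not part of the original reply) that drops every whitespace and comment token from a few inputs. It relies only on tinycss2.parse_component_value_list; the helper name strip_all_whitespace is made up for illustration.

```python
import tinycss2


def strip_all_whitespace(css):
    """Naively serialize `css` with every whitespace and comment token removed."""
    tokens = tinycss2.parse_component_value_list(css)
    return ''.join(
        token.serialize()
        for token in tokens
        if token.type not in ('whitespace', 'comment')
    )


# The spaces between the parts of a border value are separators:
# removing them glues three tokens into something else entirely.
print(strip_all_whitespace('1px solid #353635'))  # 1pxsolid#353635

# In a selector, a space is the descendant combinator.
print(strip_all_whitespace('div p'))  # divp

# Around a colon, on the other hand, the whitespace can safely go.
print(strip_all_whitespace('margin : 0'))  # margin:0
```

The first result is the same "1pxsolid#353635" that shows up in the broken output of minify2.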

Here’s some code that works with your example, but won’t always work:

from io import StringIO

from tinycss2 import ast, parse_stylesheet


def write_item(item, f):
    if isinstance(item, (ast.WhitespaceToken, ast.Comment)):
        f.write(' ')
    else:
        f.write(item.serialize())


def minify(rules):
    f = StringIO()
    for v in rules:
        if v.content:
            for vv in v.prelude:
                write_item(vv, f)
            f.write('{')
            for vv in v.content:
                write_item(vv, f)
            f.write('}')
        else:
            f.write(v.serialize())
    return f.getvalue()


if __name__ == "__main__":
    rules = parse_stylesheet(
        """
        @charset "UTF-8";
        /* Body */
        body {
                    font-family: Cambria, "Hoefler Text", "Liberation Serif", Times, "Times New Roman", serif;
            background-color: #FFFFFF;
            margin: 0;
        }
        /* Container */
        .container {
            width: 90%;
            margin-left: auto;
            margin-right: auto;
            background-color: #FFFFFF;
        }
        /* Header */
        header {
            width: 100%;
            height: 8%;
            background-color: #5D5E5D;
            border-bottom: 1px solid #353635;
        }
        .logo {
            color: #fff;
            font-weight: bold;
            margin-left: auto;
            letter-spacing: 4px;
            margin-right: auto;
            text-align: center;
            padding-top: 15px;
            line-height: 2em;
            font-size: 22px;
        }
        .hero_header {
            color: #FFFFFF;
            text-align: center;
            margin: 0;
            letter-spacing: 4px;
        }
    """, skip_comments=True, skip_whitespace=True)
    print(minify(rules))

You can also:

  • Handle rules with more nested levels (@page for example, that’s what you say in your comment).
  • Skip more useless spaces (before/after {/}/: for example).
  • Skip other useless tokens (the ; after the last declaration in a rule, for example).
  • Shorten values that are longer than necessary (#fff instead of #ffffff, for example).
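As a rough illustration of the last three bullets, the sketch below parses each rule's content into declarations with tinycss2.parse_declaration_list, collapses whitespace inside values, joins declarations with ; so none trails the last one, and shortens 6-digit hex colors. The helper names (shorten_hex_color, minify_value, minify_qualified_rule, minify_stylesheet) are hypothetical, at-rules are passed through untouched, and treating every hash token in a value as a color is an assumption. Take it as a starting point under those assumptions rather than a complete minifier.

```python
import string

import tinycss2

HEX_DIGITS = set(string.hexdigits)


def shorten_hex_color(text):
    """Turn '#ffffff' into '#fff' when each pair of digits repeats."""
    digits = text[1:]
    if (text.startswith('#') and len(digits) == 6
            and set(digits) <= HEX_DIGITS
            and digits[0::2] == digits[1::2]):
        return '#' + digits[0::2]
    return text


def minify_value(tokens):
    """Serialize a declaration value with each whitespace run collapsed."""
    parts = []
    for token in tokens:
        if token.type in ('whitespace', 'comment'):
            # Keep one space as a separator: "1px solid" still needs it.
            if parts and parts[-1] != ' ':
                parts.append(' ')
            continue
        text = token.serialize()
        if token.type == 'hash':
            # Assumption: a hash token inside a value is a color.
            text = shorten_hex_color(text)
        parts.append(text)
    return ''.join(parts).strip()


def minify_qualified_rule(rule):
    """Minify one `selector { declarations }` rule."""
    selector = tinycss2.serialize(rule.prelude).strip()
    declarations = []
    for node in tinycss2.parse_declaration_list(
            rule.content, skip_comments=True, skip_whitespace=True):
        if node.type == 'declaration':
            value = minify_value(node.value)
            if node.important:
                value += '!important'
            name = tinycss2.serialize_identifier(node.name)
            declarations.append(name + ':' + value)
        elif node.type == 'at-rule':
            # Nested at-rules are kept verbatim in this sketch.
            declarations.append(node.serialize())
        # Anything else is a parse error and is dropped.
    # Joining with ';' leaves no separator after the last declaration.
    return selector + '{' + ';'.join(declarations) + '}'


def minify_stylesheet(css):
    rules = tinycss2.parse_stylesheet(
        css, skip_comments=True, skip_whitespace=True)
    out = []
    for rule in rules:
        if rule.type == 'qualified-rule':
            out.append(minify_qualified_rule(rule))
        elif rule.type == 'at-rule':
            # At-rules (@charset, @media, ...) are kept as written.
            out.append(rule.serialize())
    return ''.join(out)


EXAMPLE = """
/* Header */
header {
    width : 100% ;
    background-color: #FFFFFF;
    border-bottom: 1px   solid #353635;
}
"""
print(minify_stylesheet(EXAMPLE))
```

Selectors are only trimmed at both ends here, because whitespace inside a selector can be a descendant combinator.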

It’s not easy, but it’s definitely a cool project 😄 and it’s a good way to learn more about the CSS syntax.

Have fun!

brupelo commented

Awesome, those are definitely some serious improvements you've made there :)

A few hours ago I was reading the tinycss2 code (especially parser.py) and I noticed you're already dealing with redundant nodes somehow (e.g. in parse_rule_list). I guess a possible solution for this particular problem could be to add multiple optimization passes so the AST becomes more and more optimal... because the current parser produces a tree, right?

Anyway, before going any further... I'd like to stick to your snippet. You said it "won’t always work" and that bothers me: in which cases would it fail? I mean, I don't care if the result isn't the most optimal, but the main requirement is to emit valid code on each pass.

Thanks!

P.S. I guess this particular use case is out of the scope of tinycss2, right?

liZe commented

> A few hours ago I was reading the tinycss2 code (especially parser.py) and I noticed you're already dealing with redundant nodes somehow (e.g. in parse_rule_list). I guess a possible solution for this particular problem could be to add multiple optimization passes so the AST becomes more and more optimal... because the current parser produces a tree, right?

It’s true, you can find many functions that can help you finely optimize different parts of the CSS tree. The current parser does return a kind of tree, but it’s really more like nested lists of objects, with some of these objects having "children" (sometimes more than one list of children: rules, for instance, have both a prelude and a content).
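A short sketch of what those nested lists look like in practice (the describe helper and the sample stylesheet are made up for illustration): each top-level rule exposes a prelude list and a content list, and the content of an at-rule such as @media is a flat list of component values that has to be parsed again, for example with tinycss2.parse_rule_list, to reach the rules nested inside it.

```python
import tinycss2

CSS = '@charset "UTF-8"; body { margin: 0 } @media print { a { color: red } }'

rules = tinycss2.parse_stylesheet(
    CSS, skip_comments=True, skip_whitespace=True)


def describe(rule):
    """Summarize a rule as (type, prelude text, whether it has a {} block)."""
    prelude = tinycss2.serialize(rule.prelude).strip()
    return rule.type, prelude, rule.content is not None


summary = [describe(rule) for rule in rules]
for entry in summary:
    print(entry)

# The content of the @media rule is not parsed into rules yet:
# it is a list of raw component values that needs a second pass.
media = rules[2]
nested = tinycss2.parse_rule_list(
    media.content, skip_comments=True, skip_whitespace=True)
print([rule.type for rule in nested])
```

A recursive minifier would repeat this second pass for every at-rule whose block contains rules, and parse declarations for blocks that contain declarations.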

> Anyway, before going any further... I'd like to stick to your snippet. You said it "won’t always work" and that bothers me: in which cases would it fail? I mean, I don't care if the result isn't the most optimal, but the main requirement is to emit valid code on each pass.

What I mean is: I’m not sure that it can’t crash, I’m not sure that the output is always valid, and I’m sure that some parts are far from optimal. But it’s definitely a good start :).
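One concrete case where the snippet above appears to go wrong (this is a reading of the code, not something stated in the thread) is an at-rule that has a block, such as @media. The at-keyword is stored on the rule itself rather than in its prelude, so writing only the prelude and the content loses it. The sketch below repeats the two functions from the snippet unchanged and feeds them such a rule.

```python
from io import StringIO

from tinycss2 import ast, parse_stylesheet


def write_item(item, f):
    # Same helper as in the snippet above.
    if isinstance(item, (ast.WhitespaceToken, ast.Comment)):
        f.write(' ')
    else:
        f.write(item.serialize())


def minify(rules):
    # Same function as in the snippet above.
    f = StringIO()
    for v in rules:
        if v.content:
            for vv in v.prelude:
                write_item(vv, f)
            f.write('{')
            for vv in v.content:
                write_item(vv, f)
            f.write('}')
        else:
            f.write(v.serialize())
    return f.getvalue()


rules = parse_stylesheet(
    '@media print { a { color: red } }',
    skip_comments=True, skip_whitespace=True)
result = minify(rules)

# The "@media" keyword is missing from the output, so "print" ends up
# looking like the start of a selector instead of a media query.
print(repr(result))
```

Checking the rule type and writing '@' plus the at-keyword before the prelude would be one way to address this particular case.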

> P.S. I guess this particular use case is out of the scope of tinycss2, right?

The scope of tinycss2 is what you’ll be able to do with it, right? 😉

liZe commented

@brupelo Is there anything more we can do for you?