MattDMo/PythonImproved

Add support for new-style string formatting

sprt opened this issue · 4 comments

sprt commented
Add support for new-style string formatting

Can you please elaborate? What sort of support would you like to see?

sprt commented

See https://docs.python.org/2/library/string.html#formatstrings

For example, the {} in '{}'.format(foo) isn't highlighted properly, but the %s in '%s' % foo is.
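To make the difference concrete, here is a minimal sketch of the two styles side by side. The value of `foo` is made up for illustration; only the placeholders matter for highlighting.

```python
foo = 'bar'  # hypothetical value, for illustration only

old_style = '%s' % foo        # printf-style placeholder
new_style = '{}'.format(foo)  # new-style replacement field

# A replacement field can also carry an index, a conversion and a format spec.
padded = '{0!r:>10}'.format(foo)

print(old_style)
print(new_style)
print(padded)
```

Both of the first two calls produce the same string; the third shows the longer field form that a highlighter would also need to recognize.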

OK, I'll see what I can do. Development is kind of at a low ebb at the moment, as I'm pretty busy with my real job, but I'll definitely put this on the list. If you have any contributions to make, feel free to submit a PR.

sprt commented

Okay, I came up with this regex:

(?<![^\{]\{)\{(?:(?:(?:[A-Za-z_]\w*|(?:[1-9]\d*|0|0[Oo]?[0-7]+|0[Xx][0-9A-Fa-f]+|0[Bb][01]+)))?(?:\.[A-Za-z_]\w*|\[(?:(?:[1-9]\d*|0|0[Oo]?[0-7]+|0[Xx][0-9A-Fa-f]+|0[Bb][01]+)|[^\]\}\{]+)\])*)?(?:\![rsa])?(?:\:(?:(?:.?[<>=\^])?[\x20+-]?\#?0?(?:[1-9]\d*|0|0[Oo]?[0-7]+|0[Xx][0-9A-Fa-f]+|0[Bb][01]+)?,?(?:\.(?:[1-9]\d*|0|0[Oo]?[0-7]+|0[Xx][0-9A-Fa-f]+|0[Bb][01]+))?[bcdEeFfGgnosXx%]?))?\}(?!\}[^\}])

This is the script I used to build it:

import re

def merge(*args):
    # Join the alternatives into a single non-capturing group.
    return r'(?:{})'.format(r'|'.join(args))

# integer
bininteger = r'0[Bb][01]+'
hexinteger = r'0[Xx][0-9A-Fa-f]+'  # hex digits, not just 0-7
octinteger = r'0[Oo]?[0-7]+'
decimalinteger = r'[1-9]\d*|0'
integer = merge(decimalinteger, octinteger, hexinteger, bininteger)

# format_spec
type_ = r'[bcdEeFfGgnosXx%]'
precision = integer
width = integer
sign = r'[\x20+-]'
align = r'[<>=\^]'
fill = r'.'
format_spec = r'(?:(?:{fill}?{align})?{sign}?\#?0?{width}?,?(?:\.{precision})?{type_}?)'.format(**locals())

conversion = r'[rsa]'
# Only ever inserted as a value, never used as a template, so the braces are not doubled.
index_string = r'[^\]\}\{]+'
identifier = r'[A-Za-z_]\w*'
element_index = merge(integer, index_string)
attribute_name = identifier
arg_name = r'(?:{})?'.format(merge(identifier, integer))
field_name = r'(?:{arg_name}(?:\.{attribute_name}|\[{element_index}\])*)'.format(**locals())
# Braces that must survive .format() are doubled here; the lookarounds skip escaped {{ and }}.
replacement_field = r'(?<![^\{{]\{{)\{{{field_name}?(?:\!{conversion})?(?:\:{format_spec})?\}}(?!\}}[^\}}])'.format(**locals())

re.compile(replacement_field)  # raises re.error if the generated pattern is malformed
print(replacement_field)

You can see here that it works pretty well. I just copy-pasted the examples from the Python docs.
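One way to cross-check a pattern like this without relying on the regex itself is to ask the standard library's own parser which replacement fields a string contains. The sketch below uses `string.Formatter().parse()`; the helper name `replacement_fields` and the sample strings are mine, picked for illustration.

```python
from string import Formatter

def replacement_fields(fmt):
    """Return (field_name, conversion, format_spec) for each replacement field in fmt."""
    return [
        (name, conversion, spec)
        for _literal, name, spec, conversion in Formatter().parse(fmt)
        if name is not None  # None means the chunk was literal text only
    ]

# Hypothetical samples: plain fields, escaped braces, and a nested format spec.
samples = ['{0}, {1!r}, {name:>10}', '{{}}', '{0:{width}{base}}']
for sample in samples:
    print(repr(sample), replacement_fields(sample))
```

Escaped braces yield no fields at all, and a nested spec comes back unexpanded as part of the outer field's format spec, so the output can serve as a reference when deciding what the highlighter should match.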

These are the only strings it can't parse:

'{:%Y-%m-%d %H:%M:%S}'
'{0:{fill}{align}16}'
'{0:{width}{base}}'

but, to be honest, they don't look valid according to the grammar. There seem to be some discrepancies between the docs and the implementation: I had to edit the index_string regex according to this.
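For comparison, here is what the implementation does with those three strings. As far as I can tell, the format spec is handed to the object's `__format__` method (which is how the datetime directives get through), and nested replacement fields inside a format spec are expanded first. The argument values below are made up for illustration.

```python
from datetime import datetime

# Hypothetical arguments, chosen only to exercise the three format strings.
moment = datetime(2010, 7, 4, 12, 15, 58)

stamp = '{:%Y-%m-%d %H:%M:%S}'.format(moment)
centered = '{0:{fill}{align}16}'.format('text', fill='*', align='^')
hexed = '{0:{width}{base}}'.format(255, width=5, base='x')

print(stamp)
print(centered)
print(hexed)
```

So str.format() accepts all three at runtime, even though the regex built from the grammar does not match them.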