PST is a format for encoding structured text similar to Bourne shell formatting
and JSON. PST supports strings, numbers (integers and floating-point), booleans,
missing values (none), arrays, objects (key-value pairs),
single-character flags (-x), and string flags (--abc).
Relative to JSON, PST is simpler, while supporting many of its features.
PST aims to be human and machine readable, and suitable for command-line
argument formatting, standard input/output and configuration file
formatting. PST is similar to YAML, but supports one-line expressions
(indentation does not matter).
Implementations of PST as a command-line program and a Python 3 function are available.
This example is adapted from Wikipedia and is licensed under the CC BY-SA 3.0 license.
PST:
firstName: John
lastName: Smith
isAlive: true
age: 27
address: {{
streetAddress: "21 2nd Street"
city: "New York"
state: NY
postalCode: 10021-3100
}}
phoneNumbers: {
{{ type: home number: "212 555-1234" }}
{{ type: office number: "646 555-4567" }}
{{ type: mobile number: "123 456-7890" }}
}
children: { }
spouse: none
JSON:
{
"firstName": "John",
"lastName": "Smith",
"isAlive": true,
"age": 27,
"address": {
"streetAddress": "21 2nd Street",
"city": "New York",
"state": "NY",
"postalCode": "10021-3100"
},
"phoneNumbers": [
{
"type": "home",
"number": "212 555-1234"
},
{
"type": "office",
"number": "646 555-4567"
},
{
"type": "mobile",
"number": "123 456-7890"
}
],
"children": [],
"spouse": null
}
The same PST could be supplied as command-line arguments (albeit very long):
pst firstName: John lastName: Smith isAlive: true age: 27 address: {{ \
streetAddress: "21 2nd Street" city: "New York" state: NY postalCode: \
10021-3100 }} phoneNumbers: { {{ type: home number: "212 555-1234" }} \
{{ type: office number: "646 555-4567" }} {{ type: mobile number: \
"123 456-7890" }} } children: { } spouse: none
This would output the following JSON:
{"children": [], "phoneNumbers": [{"number": "212 555-1234", "type": "home"}, {"number": "646 555-4567", "type": "office"}, {"number": "123 456-7890", "type": "mobile"}], "firstName": "John", "isAlive": true, "spouse": null, "age": 27, "lastName": "Smith", "address": {"state": "NY", "streetAddress": "21 2nd Street", "city": "New York", "postalCode": "10021-3100"}}
PST is composed of a sequence of words, which encode elementary types such as strings, integers, floating-point numbers or arbitrarily nested complex types such as arrays (list) and objects (dict).
Strings do not need to be quoted unless they contain white space or special characters, or could be interpreted as a number or bracket. Words composed of digits are implicitly converted to numbers unless quoted.
Curly brackets enclose arrays. Double curly brackets enclose explicit objects. Objects are composed of key-value pairs, which can be located inline (implicit objects) or inside double curly brackets (explicit objects). Unlike implicit objects, explicit objects can be used as the value of a key-value pair. Flags beginning with a dash or a double dash are converted to key-value pairs.
Any amount of white space or indentation is equivalent to a single space.
Separation between words, brackets and special characters, such as : in the key of
a key-value pair, matters.
8-bit ASCII-compatible character encoding is assumed. Strings can contain any binary data by using escape characters. Conversion from UTF-8 character encoding to Unicode is supported by the Python PST API.
PST is designed to be compatible with JSON, while also being suitable for
command-line argument passing. For example, special characters which would
clash with other uses are not used: ( and ) have a special interpretation
in Bash, and [ and ] are commonly used in documentation of command-line programs
to denote optional arguments. Implicit objects make it easy to denote named
command-line arguments. Flags ensure that established syntax can be used to express
command-line arguments. Arrays and objects enable complex
command-line arguments. The absence of commas and of mandatory quoting for
common strings makes PST easier to write than JSON.
# Empty
PST:
JSON: null
# Single string
PST: a
JSON: ["a"]
# Quoted string
PST: "a b"
JSON: "a b"
# Partially-quoted string
PST: a"b c"
JSON: "ab c"
# Two strings
PST: a b
JSON: ["a", "b"]
# Two strings separated by a newline
PST:
a
b
JSON: ["a", "b"]
# Key-value pair
PST: a: 1
JSON: {"a": 1}
# Sequence of key-value pairs
PST: a: 1 b: 2
JSON: {"a": 1, "b": 2}
# Sequence of key-value pairs and a string
PST: a: 1 b: 2 c
JSON: [{"a": 1, "b": 2}, "c"]
# Empty array
PST: { }
JSON: []
# String and an empty array
PST: a { }
JSON: ["a", []]
# Empty object
PST: {{ }}
JSON: {}
# String and an empty object
PST: a {{ }}
JSON: ["a", {}]
# String and an array
PST: a { b c }
JSON: ["a", ["b", "c"]]
# An array as value followed by a string
PST: a: { b c } d
JSON: [{"a": ["b", "c"]}, "d"]
# Literals
PST: true false none
JSON: [true, false, null]
# Single-character flags
PST: -ab
JSON: {"a": true, "b": true}
# String flag
PST: --ab
JSON: {"ab": true}
pst <pst>...
Convert PST-formatted arguments to JSON. Prints JSON to the standard output.
pstf < input.pst
Convert PST-formatted standard input to JSON. Prints JSON to the standard output.
import pst
pst.decode(s, as_unicode=False)
Decode PST. s is PST (binary string) or a list of PST. If as_unicode (bool)
is True, convert strings to Unicode on output by assuming the UTF-8 encoding.
Invalid UTF-8 bytes are mapped to the U+DCxx Unicode range using the
"surrogateescape" error handler.
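The effect of "surrogateescape" can be seen with the Python standard library alone. This snippet does not call pst; it only illustrates how an invalid UTF-8 byte ends up in the U+DCxx range and how the original bytes can be recovered. The variable names are illustrative.

```python
# Valid UTF-8 for "café" followed by a byte (0xFF) which is never valid in UTF-8.
raw = b"caf\xc3\xa9 \xff"

# The invalid byte 0xFF is mapped to the lone surrogate U+DCFF.
text = raw.decode("utf-8", "surrogateescape")

# Encoding with the same error handler restores the original bytes.
restored = text.encode("utf-8", "surrogateescape")
```

Because the mapping is reversible, binary data embedded in PST strings should survive a round trip through Unicode strings as long as the same error handler is used in both directions.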
pst.decode_argv(argv, delim=True, **kwargs)
Decode PST and split the resulting list into positional and named arguments.
argv is a list such as sys.argv and kwargs are keyword arguments passed
to pst.decode. Returns a tuple (args, opts), where args are positional
arguments and opts are named arguments. If delim is True, interpret a
standalone double-dash argument (--) in argv as an end of options delimiter,
after which all arguments are treated as literal string arguments.
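The two steps described above can be sketched in plain Python. This is not the pst implementation: the helper names split_at_delimiter and split_positional_named are hypothetical, and the sketch assumes that named arguments are the objects (dict) in the decoded list and that everything else is positional.

```python
def split_at_delimiter(argv):
    """Split argv at the first standalone "--"; return (before, after)."""
    if "--" in argv:
        i = argv.index("--")
        return argv[:i], argv[i + 1:]
    return argv, []

def split_positional_named(items):
    """Separate a decoded list into positional items and merged named items."""
    args, opts = [], {}
    for item in items:
        if isinstance(item, dict):
            opts.update(item)   # assumption: objects carry the named arguments
        else:
            args.append(item)
    return args, opts

# Arguments after "--" are kept as literal strings, even if they look like flags.
before, literal = split_at_delimiter(["prog", "-v", "--", "--not-a-flag"])
args, opts = split_positional_named(["in.txt", {"v": True}, {"n": 3}])
```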
pst.encode(x, encoder=None, indent=False, indent_len='tab', flags=False, short_flags=False, long_flags=False, escape=False)
Encode Python structure x consisting of list, tuple, dict, bytes, str, int and
float as PST (either as scalars or nested). Returns bytes. encoder is a
user-defined function to transform individual elements of the structure to one
of the above types before they are read by the encoder. If indent is true,
output indentation is applied. indent_len is the number of space characters
used for indentation, or 'tab' for indentation with the tab character. If
flags is true, key-value pairs with a value of true are encoded as flags. If
short_flags is true, key-value pairs with a value of true and a
single-character key are encoded as single-character flags. If long_flags is
true, key-value pairs with a value of true and a multiple-character key are
encoded as string flags. If escape is true, non-printable ASCII characters in
strings are encoded as escape sequences.
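The flag options can be pictured with a small stand-alone function. This is only a sketch of the rule described above, not the code used by pst.encode: the name flag_word is hypothetical, and it assumes that flags enables both the short and the long form.

```python
def flag_word(key, value, flags=False, short_flags=False, long_flags=False):
    """Return the flag word for a key-value pair, or None if it is not a flag."""
    if value is not True:
        return None                 # only pairs with a value of true qualify
    if len(key) == 1 and (flags or short_flags):
        return "-" + key            # single-character flag
    if len(key) > 1 and (flags or long_flags):
        return "--" + key           # string flag
    return None                     # encoded as a regular key-value pair
```

For example, flag_word("v", True, short_flags=True) gives "-v", while the same pair with only long_flags set gives None and would be written as v: true.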
- Install the required system packages. On Debian-derived distributions (Ubuntu, Devuan, ...):
apt install python3-full python3-pip pipx
On Fedora:
sudo yum install python3 python3-pip pipx
- Install PST. If you intend to only use the command-line interface, you can install PST with pipx:
pipx install pst-format
You might have to add $HOME/.local/bin to the PATH environment variable if not present already in order to access the pst and pstf commands. This can be done with pipx ensurepath.
If you intend to use the Python interface, you can install in the home directory with pip3:
pip3 install pst-format
Replace pip3 with pip if pip3 is not available. Add --break-system-packages if your distribution does not allow installing into the home directory but you want to anyway.
Alternatively, install into a Python virtual environment with:
python3 -m venv venv
. venv/bin/activate
pip3 install pst-format
You can then use the PST Python interface from within the virtual environment. Deactivate the environment with deactivate.
You should now be able to run the commands pst and pstf.
- Install Python. In the installer, tick Add python.exe to PATH.
- Open the Command Prompt from the Start menu. Install PST with:
pip3 install pst-format
You should now be able to run the commands pst and pstf.
Important: On macOS the pst command should be used with the command line shell bash, not the default zsh, which is not compatible with the argument syntax.
Open the Terminal. Install PST with:
python3 -m pip install pst-format
Make sure that /Users/<user>/Library/Python/<version>/bin is included in the
PATH environment variable, where <user> is your system
user name and <version> is the Python version. This path should be printed
by the above command. If it is not included, add the following line to the file
.zprofile in your home directory and restart the Terminal:
PATH="$PATH:/Users/<user>/Library/Python/<version>/bin"
You should now be able to run the commands pst and pstf.
To uninstall if installed with pipx:
pipx uninstall pst-format
To uninstall if installed with pip3 or pip:
pip3 uninstall pst-format
Replace pip3 with pip if pip3 is not available.
mkdir example
cd example
mkdir a b
pst *
["a", "b"]
touch a/1 a/2 b/3 b/4
pst a: { a/* } b: { b/* }
[{"a": ["a/1", "a/2"], "b": ["b/3", "b/4"]}]
# Better
pst a: { $(ls a/* --quoting-style c) } b: { $(ls b/* --quoting-style c) }
[{"a": ["1", "2"], "b": ["3", "4"]}]
PST is a sequence of words separated by white space, encoded in 8-bit ASCII.
White space characters are space ( ), form-feed (\f), newline (\n),
carriage return (\r), horizontal tab (\t), and vertical tab (\v).
White space is a sequence of white space characters.
A word is a sequence of non-white space characters, and white space
characters if they are inside a quoted part. A quoted part of a word is a part
of a word enclosed in double quotes ("). A character inside a word preceded by
backslash (\) is escaped, and is treated literally (loses its special meaning),
unless it is one of the ANSI C quotes, in which case it is translated to the
corresponding 8-bit ASCII character:
- \a: alert/bell (7)
- \b: backspace (8)
- \e: escape (27)
- \f: form feed (12)
- \n: newline (10)
- \r: carriage return (13)
- \t: horizontal tab (9)
- \v: vertical tab (11)
- \nnn: octal value nnn, one to three digits
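The word, quoting and escape rules above can be sketched as a small splitter operating on bytes. This is an illustrative reading of the rules, not the reference implementation: the name split_words is hypothetical, and the sketch discards the information about which characters were quoted or escaped, which a full decoder needs in order to tell keys, numbers and brackets from strings.

```python
WHITESPACE = b" \f\n\r\t\v"
OCTAL = b"01234567"
ESCAPES = {
    ord("a"): 7, ord("b"): 8, ord("e"): 27, ord("f"): 12,
    ord("n"): 10, ord("r"): 13, ord("t"): 9, ord("v"): 11,
}

def split_words(data):
    """Split PST bytes into words, resolving quotes and escape sequences."""
    words, word = [], bytearray()
    in_word = False   # True once a word has started (so that "" yields an empty word)
    quoted = False    # True while inside a double-quoted part
    i, n = 0, len(data)
    while i < n:
        c = data[i]
        if c == 0x5C and i + 1 < n:            # backslash followed by a character
            nxt = data[i + 1]
            if nxt in OCTAL:                   # \nnn, one to three octal digits
                j = i + 1
                while j < n and j < i + 4 and data[j] in OCTAL:
                    j += 1
                word.append(int(data[i + 1:j], 8) & 0xFF)
                i = j
            else:                              # named escape, or a literal character
                word.append(ESCAPES.get(nxt, nxt))
                i += 2
            in_word = True
        elif c == 0x22:                        # double quote toggles the quoted part
            quoted = not quoted
            in_word = True
            i += 1
        elif c in WHITESPACE and not quoted:   # unquoted white space ends a word
            if in_word:
                words.append(bytes(word))
                word.clear()
                in_word = False
            i += 1
        else:
            word.append(c)
            in_word = True
            i += 1
    if in_word:
        words.append(bytes(word))
    return words
```

With this sketch, split_words(b'a"b c" d') gives [b"ab c", b"d"], which corresponds to the partially-quoted string example.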
Non-quoted words true, false, none are literals, and are interpreted as
true, false, null (respectively).
An integer is a word composed of non-quoted digits.
A floating-point number is a word composed of non-quoted digits and a
non-quoted dot (.), beginning with a digit.
A number is an integer or a floating-point number.
A bracket is a word which is a non-quoted opening or closing curly
bracket ({, }).
A double bracket is a word which is a non-quoted opening or closing double
curly bracket ({{, }}).
A word ending with a non-quoted colon (:) is a key.
A string is a word which is not a key, literal, number, bracket, double bracket, single-character flag or a string flag.
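The word types defined so far can be summarised as a classification function. This is a sketch under a few assumptions rather than the reference behaviour: the word is taken to be entirely unquoted, a floating-point number is assumed to contain exactly one dot, and the order of the checks is a choice made for this example. The name classify is hypothetical.

```python
import re

def classify(word):
    """Return the PST type name of an unquoted word (given as str)."""
    if word in ("true", "false", "none"):
        return "literal"
    if re.fullmatch(r"[0-9]+", word):
        return "integer"
    if re.fullmatch(r"[0-9]+\.[0-9]*", word):   # assumption: exactly one dot
        return "floating-point number"
    if word in ("{", "}"):
        return "bracket"
    if word in ("{{", "}}"):
        return "double bracket"
    if word.endswith(":"):
        return "key"
    if word.startswith("--") and len(word) > 2:
        return "string flag"
    if word.startswith("-") and len(word) > 1:
        return "single-character flags"
    return "string"
```

Note that a word such as 10021-3100 is classified as a string, because it contains a character other than digits and a dot.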
An array is a PST enclosed in brackets.
A value is a string, literal, number, array or explicit object following a key.
A key followed by a value is a key-value pair.
An implicit object is a sequence of one or more key-value pairs not enclosed in brackets. An implicit object cannot be the value in a key-value pair.
An explicit object is a sequence of zero or more key-value pairs enclosed in double brackets. An explicit object can be the value in a key-value pair. Words inside the brackets which are not key-value pairs are ignored.
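The definitions of arrays, key-value pairs and objects can be put together in a short recursive parser. The following is a minimal sketch, not the reference implementation: it assumes the input contains no quoting, escapes or flags (words are obtained with str.split), it always returns a list at the top level and does not try to reproduce how the pst tools present a single top-level item, and all function names are hypothetical.

```python
import re

LITERALS = {"true": True, "false": False, "none": None}

def scalar(word):
    """Convert a word which is not a key or bracket to a literal, number or string."""
    if word in LITERALS:
        return LITERALS[word]
    if re.fullmatch(r"[0-9]+", word):
        return int(word)
    if re.fullmatch(r"[0-9]+\.[0-9]*", word):
        return float(word)
    return word

def parse_value(words, pos):
    """Parse one value at words[pos]; return (value, next position)."""
    if words[pos] == "{":
        return parse_sequence(words, pos + 1, "}")
    if words[pos] == "{{":
        return parse_object(words, pos + 1)
    return scalar(words[pos]), pos + 1

def parse_sequence(words, pos=0, closer=None):
    """Parse words up to closer (or the end of input) into a list."""
    items = []
    implicit = None  # implicit object currently being extended, if any
    while pos < len(words) and words[pos] != closer:
        word = words[pos]
        if word.endswith(":"):
            value, pos = parse_value(words, pos + 1)
            if implicit is None:
                implicit = {}
                items.append(implicit)
            implicit[word[:-1]] = value
        else:
            value, pos = parse_value(words, pos)
            items.append(value)
            implicit = None  # a word outside a key-value pair ends the implicit object
    return items, pos + 1    # skip the closing bracket

def parse_object(words, pos):
    """Parse key-value pairs up to the closing double bracket into a dict."""
    obj = {}
    while pos < len(words) and words[pos] != "}}":
        word = words[pos]
        if word.endswith(":"):
            value, pos = parse_value(words, pos + 1)
            obj[word[:-1]] = value
        else:
            pos += 1  # words which are not key-value pairs are ignored
    return obj, pos + 1

def parse(text):
    """Parse unquoted PST text into a list of top-level items."""
    items, _ = parse_sequence(text.split())
    return items
```

For instance, parse("a: { b c } d") returns [{"a": ["b", "c"]}, "d"]. The sketch does not validate its input, so a key without a value or an unbalanced bracket is not reported as an error in any useful way.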
Single-character flags are characters in a word beginning with a non-quoted
dash (-). A single-character flag is interpreted as an implicit object
{c: True}, where c is the character.
A string flag is a string in a word beginning with a non-quoted double-dash
(--). A string flag is interpreted as an implicit object {s: True}, where s
is the string.
- Improved installation.
- Added a double-dash (--) delimiter option to decode_argv and this is now the default (potentially breaks compatibility).
- Removed obsolete Python 2.7 code.
- Fixed Unicode encoding.
- Fixed indentation of empty objects.
- Fixed application of encoder.
- Dropped support for Python 2.7.
- Added encode function.
- Fixed parsing of empty strings.
- Fixed closing of implicit object inside list.
- Improved documentation.
- Dropped support for Python 2.
- Added pstf.
- Support for explicit objects.
Initial release.
Public domain. See LICENSE.md.