modernish is an ambitious, as-yet experimental, cross-platform POSIX shell feature detection and language extension library. It adds extensive feature testing and language enhancements, using the power of aliases and functions to extend the shell language from within the shell language itself.
The name is a pun on Modernizr, the JavaScript feature testing library, -sh, the common suffix for UNIX shell names, and -ish, still not quite a modern programming language but perhaps a little closer. jQuery is another source of general inspiration; like it, modernish adds a considerable feature set by using the power of the language it's implemented in to extend/transcend that same language.
That said, the aim of modernish is to build a better shell language, and not to make the shell language into something it's not. Its feature set is aimed at solving specific and commonly experienced deficits and annoyances of the shell language, and not at adding/faking things that are foreign to it, such as object orientation or functional programming. (However, since modernish is modular, nothing stops anyone from adding a module attempting to implement these things.)
The library builds on pure POSIX 2013 Edition (including full C-style shell arithmetics with assignment, comparison and conditional expressions), so it should run on any POSIX-compliant shell and operating system. But it does not shy away from using non-standard extensions where available to enhance performance or robustness.
Some example programs are in share/doc/modernish/examples.
Modernish also comes with a suite of regression tests to detect bugs in modernish itself. See Appendix B.
- Getting started
- Two basic forms of a modernish program
- Interactive use
- Non-interactive command line use
- Internal namespace
- Shell feature testing
- Modernish system constants
- Legibility aliases
- Enhanced exit
- Reliable emergency halt
- Low-level shell utilities
- Feature testing
- Working with variables
- Quoting strings for subsequent parsing by the shell
- The stack
- Hardening: emergency halt on error
- Simple tracing of commands
- External commands without full path
- Outputting strings
- Enhanced dot scripts
- Testing numbers, strings and files
- Basic string operations
- Basic system utilities
- Modules
- Appendix A
- Appendix B
Run install.sh and follow instructions, choosing your preferred shell and install location. After successful installation you can run modernish shell scripts and write your own. Run uninstall.sh to remove modernish.
Both the install and uninstall scripts are interactive by default, but support fully automated (non-interactive) operation as well. Command line options are as follows:
install.sh [ -n ] [ -s shell ] [ -f ] [ -d installroot ] [ -D prefix ]
- -n: non-interactive operation
- -s: specify default shell to execute modernish
- -f: force unconditional installation on specified shell
- -d: specify root directory for installation
- -D: extra destination directory prefix (for packagers)
uninstall.sh [ -n ] [ -f ] [ -d installroot ]
- -n: non-interactive operation
- -f: delete */modernish directories even if files left
- -d: specify root directory of modernish installation to uninstall
The simplest way to write a modernish program is to source modernish as a dot script. For example, if you write for bash:
#! /bin/bash
. modernish
use safe
use sys/base
...your program starts here...
The modernish 'use' command loads modules with optional functionality. safe is a special module that introduces a new and safer way of shell programming, with field splitting (word splitting) and pathname expansion (globbing) disabled by default. The sys/base module contains modernish versions of certain basic but non-standardised utilities (e.g. readlink, mktemp, which), guaranteeing that modernish programs all have a known version at their disposal. There are many other modules as well. See below for more information.
The above method makes the program dependent on one particular shell (in this case, bash). So it is okay to mix and match functionality specific to that particular shell with modernish functionality.
The most portable way to write a modernish program is to use the special generic hashbang path for modernish programs. For example:
#! /usr/bin/env modernish
#! use safe
#! use sys/base
...your program begins here...
A program in this form is executed by whatever shell the user who installed modernish on the local system chose as the default shell. Since you as the programmer can't know what shell this is (other than the fact that it passed some rigorous POSIX compliance testing executed by modernish), a program in this form must be strictly POSIX compliant -- except, of course, that it should also make full use of the rich functionality offered by modernish.
Note that modules are loaded in a different way: the use commands are part of the hashbang comment (each line starting with #! like the initial hashbang path). Only such lines that immediately follow the initial hashbang path are evaluated; even an empty line in between causes the rest to be ignored.
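The scanning rule just described can be illustrated in plain POSIX shell. This is only a sketch of the rule, not modernish's actual loader: the helper name list_hashbang_uses is hypothetical, and it assumes every such line has exactly the form "#! use modulename" shown in the example above.

```shell
# Hypothetical helper, not part of modernish: print the module named on each
# '#! use' line that directly follows the first (hashbang path) line of a file.
list_hashbang_uses() {
	_prefix='#! use '
	{
		IFS= read -r _line || return 1	# skip the initial hashbang path
		while IFS= read -r _line; do
			case $_line in
			"$_prefix"*)	printf '%s\n' "${_line#"$_prefix"}" ;;
			*)		break ;;	# anything else, even an empty line, ends the scan
			esac
		done
	} < "$1"
}
```

Calling list_hashbang_uses on the portable-form example would print safe and sys/base, one per line; a use line placed after an empty line would not be listed.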
- modernish, like most shells, fully supports two locales: POSIX (a.k.a.
C, a.k.a. ASCII) and Unicode's UTF-8. It will work in others, but things
like converting to upper/lower case, and matching single characters in
patterns, are not guaranteed.
Caveat: some shells or operating systems have bugs that prevent (or lack features required for) full locale support. If portability is a concern, check for thisshellhas BUG_MULTIBYTE or thisshellhas BUG_NOCHCLASS where needed. See Appendix A under Bugs.
- Scripts/programs should not change the locale (LC_* or LANG) after initialising modernish. Doing this might break various functions, as modernish sets specific versions depending on your OS, shell and locale. (Temporarily changing the locale is fine as long as you don't use modernish features that depend on it -- for example, setting a specific locale just for an external command. However, if you use harden(), see the important note in its documentation below!)
Modernish is primarily designed to enhance shell programs/scripts, but also
offers features for use in interactive shells. For instance, the new with
loop construct from the loop/with
module can be quite practical to repeat
an action x times, and the safe
module on interactive shells provides
convenience functions for manipulating, saving and restoring the state of
field splitting and globbing.
To use modernish on your favourite interactive shell, you have to add it to your .profile, .bashrc or similar init file.
Important: Upon initialising, modernish adapts itself to other settings, such as the locale. So you have to organise your .profile or similar file in the following order:
- first, define general system settings (PATH, locale, etc.);
- then, . modernish and use any modules you want;
- then define anything that may depend on modernish.
After installation, the modernish
command can be invoked as if it were a
shell, with the standard command line options from other shells (such as
-c
to specify a command or script directly on the command line), plus some
enhancements. The effect is that the shell chosen at installation time will
be run enhanced with modernish functionality. It is not possible to use
modernish as an interactive shell in this way.
Usage:
- modernish [ --use=module | option ... ] [ scriptfile ] [ arguments ]
- modernish [ --use=module | option ... ] -c [ script [ me-name [ arguments ] ] ]
- modernish --test
- modernish --version
In the first form, the --use long-form option preloads any given modernish modules, any given short or long-form shell options are set or unset (the syntax is identical to that of POSIX shells and the shell options supported depend on the shell executing modernish), and then scriptfile is loaded and executed with any arguments assigned to the positional parameters.
The module argument to each specified --use option is split using standard shell field splitting. The first field is the module name and any further fields become arguments to that module's initialisation routine.
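As a rough illustration of what "standard shell field splitting" does to such an option value, here is a plain POSIX sketch. The value some/module first second is a made-up placeholder, not a real module, and the code is not taken from modernish itself.

```shell
# Assumption: 'some/module' and its two arguments are invented placeholders.
use_arg='some/module first second'

set -f			# no globbing while the value is split
set -- $use_arg		# unquoted on purpose: default IFS field splitting
set +f

module=$1		# first field: the module name
shift
nargs=$#		# remaining fields: arguments for the module
first_arg=$1
printf 'module=%s, %d argument(s): %s\n' "$module" "$nargs" "$*"
```

On the command line, a value containing several fields would presumably need to be quoted as a whole so that it reaches modernish as a single --use= argument.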
Using the shell option -e or -o errexit is an error, because modernish does not support it and would break. If the shell option -x or -o xtrace is given, modernish sets the PS4 prompt to a useful value that traces the line number and exit status, as well as the current file and function names if the shell is capable of this.
In the second form, after pre-loading any modules and setting any shell options as in the first form, -c executes the specified modernish script, optionally with the me-name assigned to $ME and the arguments assigned to the positional parameters. This is identical to the -c option on POSIX shells, except that the me-name is assigned to $ME and not $0 (because POSIX shells do not allow changing $0).
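For comparison, this is how the -c option behaves on a plain POSIX shell, where the operand after the command string ends up in $0. The names my-name, one and two are arbitrary illustration values.

```shell
# Plain POSIX behaviour: the first operand after the command string becomes
# $0, and the remaining operands become $1, $2, ...
out=$(sh -c 'printf "%s|%s|%s\n" "$0" "$1" "$2"' my-name one two)
printf '%s\n' "$out"
```

With modernish -c, the same operand would be found in $ME instead, as described above.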
The --test option runs the regression test suite and exits. This verifies that the modernish installation is functioning correctly. See Appendix B for more information.
The --version option outputs the version of modernish and exits.
- Count to 10 using loop/with:
modernish --use=loop/with -c 'with i=1 to 10; do putln "$i"; done'
- Run a portable-form modernish program using zsh and enhanced-prompt xtrace:
zsh /usr/local/bin/modernish -o xtrace /path/to/program.sh
Function-local variables are not supported by the standard POSIX shell; only
global variables are provided for. Modernish needs a way to store its
internal state without interfering with the program using it. So most of the
modernish functionality uses an internal namespace _Msh_*
for variables,
functions and aliases. All these names may change at any time without
notice. Any names starting with _Msh_
should be considered sacrosanct and
untouchable; modernish programs should never directly use them in any way.
Of course this is not enforceable, but names starting with _Msh_
should be
uncommon enough that no unintentional conflict is likely to occur.
Modernish includes a battery of shell bug, quirk and feature tests, each of which is given a special ID. These are easy to query using the thisshellhas function, e.g. if thisshellhas LOCAL; then ... That same function also tests if 'thisshellhas' a particular reserved word, builtin command or shell option.
To reduce start up time, the main bin/modernish script only includes the bug/quirk/feature tests that are essential to the functioning of it; these are considered built-in tests. The rest, considered external tests, are included as small test scripts in libexec/modernish/cap/*.t which are sourced on demand.
Feature testing is used by library functions to conveniently work around bugs or take advantage of special features not all shells have. For instance, ematch will use [[ var =~ regex ]] if available and fall back to invoking awk to use its builtin match() function otherwise. But the use of feature testing is not restricted to modernish itself; any script using the library can do this in the same way.
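The awk fallback mentioned above can be sketched in a few lines. This is not modernish's ematch; the helper name ere_match is hypothetical, and the sketch only shows how awk's match() function can stand in for a shell-native regex test.

```shell
# Hypothetical helper, not modernish's ematch: succeed if string $1 matches
# the extended regular expression $2, using awk's builtin match().
# Caveat: awk interprets backslash escapes in values assigned with -v.
ere_match() {
	awk -v s="$1" -v re="$2" 'BEGIN { exit !match(s, re) }'
}

if ere_match 'release-42' '^[a-z]+-[0-9]+$'; then
	echo 'matches'
fi
```

A library function would choose between this and a shell-native test at initialisation time, based on a feature test.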
The thisshellhas
function is an essential component of feature testing in
modernish. There is no standard way of testing for the presence of a shell
built-in or reserved word, so different shells need different methods; the
library tests for this and loads the correct version of this function.
See Appendix A below for a list of capabilities and bugs currently tested for.
Modernish provides certain constants (read-only variables) to make life easier. These include:
- $MSH_VERSION: The version of modernish.
- $MSH_PREFIX: Installation prefix for this modernish installation (e.g. /usr/local).
- $ME: Path to the current program. Replacement for $0. This is necessary if the hashbang path #!/usr/bin/env modernish is used, or if the program is launched like sh /path/to/bin/modernish /path/to/script.sh, as these set $0 to the path to bin/modernish and not your program's path.
- $MSH_SHELL: Path to the default shell for this modernish installation, chosen at install time (e.g. /bin/sh). This is a shell that is known to have passed all the modernish tests for fatal bugs. Cross-platform scripts should use it instead of hard-coding /bin/sh, because on some operating systems (NetBSD, OpenBSD, Solaris) /bin/sh is not POSIX compliant.
- $SIGPIPESTATUS: The exit status of a command killed by SIGPIPE (a broken pipe). For instance, if you use grep something somefile.txt | more and you quit more before grep is finished, grep is killed by SIGPIPE and exits with that particular status. Some modernish functions, such as harden and traverse, need to handle such a SIGPIPE exit specially to avoid unduly killing the program. The exact value of this exit status is shell-specific, so modernish runs a quick test to determine it at initialisation time.
If SIGPIPE was set to ignore by the process that invoked the current shell, SIGPIPESTATUS can't be detected and is set to the special value 99999. See also the description of the WRN_NOSIGPIPE ID for thisshellhas.
- $DEFPATH: The default system path guaranteed to find compliant POSIX utilities, as given by getconf PATH.
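To make the SIGPIPE exit status less abstract, the following plain POSIX sketch shows one way such a status could be observed. It is not the test modernish itself runs, and the variable name sigpipe_status is made up for this example.

```shell
# Run a child shell that sends SIGPIPE to itself, then look at how the
# current shell reports that death. If SIGPIPE is ignored, the child
# survives the signal and exits normally with status 0.
if sh -c 'kill -s PIPE "$$"; exit 0'; then
	sigpipe_status=0
else
	sigpipe_status=$?
fi

if [ "$sigpipe_status" -eq 0 ]; then
	echo 'SIGPIPE appears to be ignored; the status cannot be detected'
else
	echo "a command killed by SIGPIPE exits with status $sigpipe_status"
fi
```

Many shells report 128 plus the signal number here, but as noted above the exact value depends on the shell, which is why it is detected rather than hard-coded.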
POSIX does not provide for the quoted C-style escape codes commonly used in bash, ksh and zsh (such as $'\n' to represent a newline character), leaving the standard shell without a convenient way to refer to control characters. Modernish provides control character constants (read-only variables) with hexadecimal suffixes $CC01 .. $CC1F and $CC7F, as well as $CCe, $CCa, $CCb, $CCf, $CCn, $CCr, $CCt, $CCv (corresponding with printf backslash escape codes). This makes it easy to insert control characters in double-quoted strings.
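For readers wondering how such constants can exist without $'...' quoting, here is a minimal plain POSIX sketch. The names my_tab, my_esc and my_nl are invented for this example so as not to suggest these are modernish's own definitions.

```shell
# Made-up names; modernish's real constants are $CCt, $CCe, $CCn and so on.
my_tab=$(printf '\t')
my_esc=$(printf '\033')
# Command substitution strips trailing newlines, so protect the newline
# with a sentinel character and remove the sentinel afterwards.
my_nl=$(printf '\nX')
my_nl=${my_nl%X}

printf '%s\n' "col1${my_tab}col2${my_nl}second line"
```

With modernish loaded, the last line would simply use ${CCt} and ${CCn} inside the double-quoted string.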
More convenience constants, handy for use in bracket glob patterns for use with case or modernish match:
- $CONTROLCHARS: All the control characters.
- $WHITESPACE: All whitespace characters.
- $ASCIIUPPER: The ASCII uppercase letters A to Z.
- $ASCIILOWER: The ASCII lowercase letters a to z.
- $ASCIIALNUM: The ASCII alphanumeric characters 0-9, A-Z and a-z.
- $SHELLSAFECHARS: Safelist for shell-quoting.
- $ASCIICHARS: The complete set of ASCII characters (minus NUL).
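The sketch below shows the general pattern of using a character-set variable inside a bracket glob pattern with case. It uses a stand-in variable my_upper and a hypothetical function starts_with_upper so that it runs without modernish; with the library loaded, $ASCIIUPPER would take the place of $my_upper.

```shell
# Hypothetical stand-in for a constant such as $ASCIIUPPER.
my_upper=ABCDEFGHIJKLMNOPQRSTUVWXYZ

# Succeed if the first character of $1 is an ASCII uppercase letter.
starts_with_upper() {
	case $1 in
	[$my_upper]*)	return 0 ;;
	*)		return 1 ;;
	esac
}

if starts_with_upper 'Hello'; then
	echo 'starts with an uppercase letter'
fi
```

Listing the characters explicitly avoids relying on ranges such as A-Z, whose meaning can vary with the locale.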
A few aliases that seem to make the shell language look slightly friendlier:
alias not='! ' # more legible synonym for '!'
alias so='[ "$?" -eq 0 ]' # test preceding command's success with
# 'if so;' or 'if not so;'
alias forever='while :;' # indefinite loops: forever do <stuff>; done
exit: extended usage: exit [ -u ] [ status [ message ] ]
If the -u option is given, the function showusage() is called, which has a simple default but can be redefined by the script.
die: reliably halt program execution, even from within subshells, optionally printing an error message. Note that die is meant for an emergency program halt only, i.e. in situations where continuing would mean the program is in an inconsistent or undefined state. Shell scripts running in an inconsistent or undefined state may wreak all sorts of havoc. They are also notoriously difficult to terminate correctly, especially if the fatal error occurs within a subshell: exit won't work then. That's why die is optimised for killing all the program's processes (including subshells and external commands launched by it) as quickly as possible. It should never be used for exiting the program normally.
On interactive shells, die
behaves differently. It does not kill or exit your
shell; instead, it issues SIGINT
to the shell to abort the execution of your
running command(s), which is equivalent to pressing Ctrl+C.
Usage: die [ message ]
A special DIE pseudosignal can be trapped (using plain old trap or pushtrap) to perform emergency cleanup commands upon invoking die. On interactive shells, DIE traps are never executed (though they can be set and printed). On non-interactive shells, in order to kill the malfunctioning program as quickly as possible (hopefully before it has a chance to delete all your data), die doesn't wait for those traps to complete before killing the program. Instead, it executes each DIE trap simultaneously as a background job, then gathers the process IDs of the main shell and all its subprocesses, sending SIGKILL to all of them except any DIE trap processes.
(One case where die
is limited is when the main shell program has exited,
but several runaway background processes that it forked are still going. If
die
is called by one of those background processes, then it will kill that
background process and its subshells, but not the others. This is due to an
inherent limitation in the design of POSIX operating systems. When the main
shell exits, its surviving background processes are detached from the
process hierarchy and become independent from one another, with no way to
determine that they once belonged to the same program.)
insubshell: easily check if you're currently running in a subshell. This function takes no arguments. It returns success (0) if it was called from within a subshell and non-success (1) if not. In either case, the process ID (PID) of the current subshell or main shell is stored in REPLY. (Note that on AT&T ksh93, which does not fork a new process for non-background subshells, that PID is the same as the main shell's except for background jobs.)
setstatus
: manually set the exit status $?
to the desired value. The
function exits with the status indicated. This is useful in conditional
constructs if you want to prepare a particular exit status for a subsequent
'exit' or 'return' command to inherit under certain circumstances.
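The idea of preparing a status for a later bare return can be shown with a plain POSIX stand-in. The names my_setstatus and cleanup_and_report are hypothetical, and the real setstatus may differ in detail, for example in how it validates its argument.

```shell
# Hypothetical plain-POSIX stand-in for setstatus.
my_setstatus() {
	return "$1"
}

cleanup_and_report() {
	my_setstatus 3		# prepare a status...
	return			# ...that a bare 'return' then inherits
}

rc=0
cleanup_and_report || rc=$?
printf 'cleanup_and_report returned %d\n' "$rc"
```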
thisshellhas
is the central function of the modernish feature testing
framework. It tests if one or more shell built-in commands, shell reserved
words (a.k.a. keywords), shell options, or shell capabilities/quirks/bugs are
present on the current shell.
This function is designed to be cheap to call, so there is little need to avoid calling it for the sake of performance. Where appropriate, test results are cached in an internal variable after the first test, so repeated checks using thisshellhas are efficient.
Usage: thisshellhas [ --cache | --show ] item [ item ... ]
- If item contains only ASCII capital letters A-Z, digits 0-9 or _, return the result status of the associated modernish feature, quirk or bug test.
- If item is an ASCII all-lowercase word, check if it's a shell reserved word or built-in command on the current shell.
- If item starts with --rw= or --kw=, check if the identifier immediately following these characters is a shell reserved word (a.k.a. shell keyword).
- If item starts with --bi=, similarly check for a shell built-in command.
- If item starts with --sig=, check if the shell knows about a signal (usable by kill, trap, etc.) by the name or number following the =. If a number > 128 is given, the remainder of its division by 128 is checked. If the signal is found, its canonicalised signal name is left in the REPLY variable, otherwise REPLY is unset. (If multiple --sig= items are given and all are found, REPLY contains only the last one.)
- If item is -o followed by a separate word, check if this shell has a long-form shell option by that name.
- If item is any other letter or digit preceded by a single -, check if this shell has a short-form shell option by that character.
- The --cache option runs all external modernish bug/quirk/feature tests that have not yet been run, causing the cache to be complete.
- The --show option performs a --cache and then outputs all the IDs of positive results, one per line.
thisshellhas
continues to process items until one of them produces a
negative result or is found invalid, at which point any further items are
ignored. So the function only returns successfully if all the items
specified were found on the current shell. (To check if either one item or
another is present, use separate thisshellhas
invocations separated by the
||
shell operator.)
Note that the tests for the presence of reserved words, built-in commands, shell options, and signals only check if an item by that name exists on this shell. No attempt is made to verify that it does the same thing as on another shell.
Exit status: 0 if this shell has all the items in question; 1 if not; 2 if an item was encountered that is not recognised as a valid identifier.
isvarname: Check if the argument is a valid portable identifier in the shell, that is, a portable variable name, shell function name or long-form shell option name. (Modernish requires portable names everywhere; for example, accented or non-Latin characters in variable names are not supported.)
isset: check if a variable, shell function or option is set. Usage:
- isset varname: Check if a variable is set.
- isset -v varname: Same as isset varname.
- isset -x varname: Check if variable is exported.
- isset -r varname: Check if variable is read-only.
- isset -f funcname: Check if a shell function is set.
- isset -optionletter (e.g. isset -C): Check if shell option is set.
- isset -o optionname: Check if shell option is set by long name.
Exit status: 0 if the item is set; 1 if not; 2 if the argument is not recognised as a syntactically valid identifier.
When checking a shell option, a nonexistent shell option is not an error, but returns the same result as an unset shell option. (To check if a shell option exists, use thisshellhas.)
Note: just isset -f checks if shell option -f (a.k.a. -o noglob) is set, but with an extra argument, it checks if a shell function is set. Similarly, isset -x checks if shell option -x (a.k.a. -o xtrace) is set, but isset -x varname checks if a variable is exported. If you use unquoted variable expansions here, make sure they're not empty, or the shell's empty removal mechanism will cause the wrong thing to be checked (even in use safe mode).
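The empty removal mechanism mentioned in this note is ordinary POSIX shell behaviour and can be demonstrated without modernish. In the sketch below, set -- merely stands in for any command receiving arguments, and the variable names are arbitrary.

```shell
empty=

set -- -x $empty	# unquoted: the empty expansion is removed entirely
unquoted_count=$#

set -- -x "$empty"	# quoted: an empty argument is preserved
quoted_count=$#

printf 'unquoted: %d argument(s), quoted: %d argument(s)\n' \
	"$unquoted_count" "$quoted_count"
```

In the unquoted case a command such as isset -x $var would receive only -x, so it would test the xtrace shell option instead of a variable.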
unexport
: the opposite of export
. Unexport a variable while preserving
its value, or (while working under set -a
) don't export it at all.
Usage is like export
, with the caveat that variable assignment arguments
containing non-shellsafe characters or expansions must be quoted as
appropriate, unlike in some specific shell implementations of export
.
(To get rid of that headache, use safe
.)
shellquote
: Quote the values of specified variables in such a way that the
values are suitable for parsing by the shell as string literals. This is
essential for the safe use of eval
or any other context where the shell
must parse untrusted input. shellquote
only uses quoting mechanisms
specified by POSIX, so the quoted values it produces are safe to parse
in any POSIX shell. They are also safe to parse using
xargs
(1).
Usage: shellquote [ -f|+f ] varname [ [ -f|+f ] varname ... ]
The values of the variables specified by name are shell-quoted and stored back into those variables. By default, a value is only quoted if it contains characters not present in $SHELLSAFECHARS. An -f argument forces unconditional quoting for subsequent variables; a +f argument restores default behaviour. shellquote returns success (0) if all variables were processed successfully, and non-success (1) if any undefined (unset) variables were encountered. In the latter case, any set variables still get their values quoted.
shellquoteparams
: shell-quote the current shell's positional parameters
in-place.
storeparams: store the positional parameters, or a sub-range of them, in a variable, in a shellquoted form suitable for restoration using eval "set -- $varname". For instance: storeparams -f2 -t6 VAR quotes and stores $2 to $6 in VAR.
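To show why shell-quoted storage makes eval "set -- $varname" safe, here is a minimal plain POSIX sketch of the technique. The helper sq is hypothetical and is not modernish's shellquote; it always uses single quotes and, unlike a careful implementation, loses trailing newlines to command substitution.

```shell
# Hypothetical helper: wrap a value in single quotes, turning each
# embedded ' into '\'' so the result parses as one string literal.
sq() {
	printf '%s\n' "$1" | sed "s/'/'\\\\''/g;1s/^/'/;\$s/\$/'/"
}

set -- 'two words' "it's" '*.txt'

# Store all positional parameters in quoted form.
saved=
for arg do
	saved="$saved $(sq "$arg")"
done

set -- discarded
eval "set -- $saved"	# every value is parsed back as a quoted literal
restored_count=$#
restored_first=$1
restored_second=$2
restored_third=$3
```

After the eval, the three original parameters are back unchanged, including the embedded space, the apostrophe and the unexpanded glob pattern.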
push & pop: every variable and shell option gets its own stack. For variables, both the value and the set/unset state are (re)stored. Usage:
- push [ --key=value ] item [ item ... ]
- pop [ --keepstatus ] [ --key=value ] item [ item ... ]
where item is a valid portable variable name, a short-form shell option
(dash plus letter), or a long-form shell option (-o
followed by an option
name, as two arguments). The precise shell options supported (other than the
ones guaranteed by POSIX) depend on the shell modernish is running on. For
cross-shell compatibility, nonexistent shell options are treated as unset.
Before pushing or popping anything, both functions check if all the given
arguments are valid and pop
checks all items have a non-empty stack. This
allows pushing and popping groups of items with a check for the integrity of
the entire group. pop
exits with status 0 if all items were popped
successfully, and with status 1 if one or more of the given items could not
be popped (and no action was taken at all).
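A per-variable stack can be built in plain POSIX shell with numbered helper variables and eval. The following is a deliberately simplified sketch with hypothetical names (stk_push, stk_pop, and the _stk_ variable prefix); it handles one variable per call, stores values only (not the set/unset state), and has no keys or shell option support, so it should not be read as modernish's implementation.

```shell
# Hypothetical mini stack: save the current value of the variable named $1.
stk_push() {
	eval "_stk_n=\${_stk_${1}_n:-0}"
	_stk_n=$((_stk_n + 1))
	eval "_stk_${1}_${_stk_n}=\$$1; _stk_${1}_n=$_stk_n"
}

# Restore the last saved value of the variable named $1.
# Returns 1, changing nothing, if its stack is empty.
stk_pop() {
	eval "_stk_n=\${_stk_${1}_n:-0}"
	[ "$_stk_n" -gt 0 ] || return 1
	eval "$1=\$_stk_${1}_${_stk_n}; unset _stk_${1}_${_stk_n}; _stk_${1}_n=$((_stk_n - 1))"
}

greeting='hello'
stk_push greeting
greeting='temporary value'
stk_pop greeting
printf '%s\n' "$greeting"
```

Because the variable name is passed to eval, a real implementation has to validate it first, which is one reason both functions check all their arguments before acting.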
The --key= option is an advanced feature that can help different modules
or functions to use the same variable stack safely. If a key is given to
push
, then for each item, the given key value is stored along with the
variable's value for that position in the stack. Subsequently, restoring
that value with pop
will only succeed if the key option with the same key
value is given to the pop
invocation. Similarly, popping a keyless value
only succeeds if no key is given to pop
. If there is any key mismatch, no
changes are made and pop returns status 2. For instance, if a function
pushes all its values with something like --key=myfunction
, it can do a
loop like while pop --key=myfunction var; do ...
even if var
already has
other items on its stack that shouldn't be tampered with. Note that this is
a robustness/convenience feature, not a security feature; the keys are not
hidden in any way. (The var/setlocal
module, which provides stack-based local variables, internally makes use of
this feature.)
If the --keepstatus
option is given, pop
will exit with the
exit status of the command executed immediately prior to calling pop
. This
can avoid the need for awkward workarounds when restoring variables or shell
options at the end of a function. However, note that this makes failure to pop
(stack empty or key mismatch) a fatal error that kills the program, as pop
no longer has a way to communicate this through its exit status.
The shell options stack allows saving and restoring the state of any shell
option available to the set
builtin using push
and pop
commands with
a syntax similar to that of set
.
Long-form shell options are matched to their equivalent short-form shell
options, if they exist. For instance, on all POSIX shells, -f
is
equivalent to -o noglob
, and push -o noglob
followed by pop -f
works
correctly. (This works even for shell-specific short & long option
equivalents; modernish internally does a check to find any equivalent.)
On shells with a dynamic no
option name prefix, that is on ksh, zsh and
yash (where, for example, noglob
is the opposite of glob
), the no
prefix is ignored, so something like push -o glob
followed by pop -o noglob
does the right thing. But this depends on the shell and should never
be used in cross-shell scripts.
pushtrap
and poptrap
: traps are now also stack-based, so that each
program component or library module can set its own trap commands
without interfering with others.
Note an important difference between the trap stack and stacks for variables
and shell options: pushing traps does not save them for restoring later, but
adds them alongside other traps on the same signal. All pushed traps are
active at the same time and are executed from last-pushed to first-pushed
when the respective signal is triggered. Traps cannot be pushed and popped
using push
and pop
but use dedicated commands as follows.
Usage:
- pushtrap [ --key=value ] [ -- ] command sigspec [ sigspec ... ]
- poptrap [ --key=value ] [ -- ] sigspec [ sigspec ... ]
pushtrap works like regular trap, with the following exceptions:
- Adds traps for a signal without overwriting previous ones. (However, any traps set prior to initialising modernish, or by bypassing the modernish 'trap' alias to access the system command directly, will be overwritten by a pushtrap for the same signal. To remedy this, you can issue a simple trap command; as modernish prints the traps, it will quietly detect ones it doesn't yet know about and make them work nicely with the trap stack.)
- Unlike regular traps, a stack-based trap does not cause a signal to be ignored. Setting one will cause it to be executed upon the shell receiving that signal, but after the stack traps complete execution, modernish re-sends the signal to the main shell, causing it to behave as if no trap were set (unless a regular POSIX trap is also active).
- Stack-based traps are only executed if pushed in the main shell. Using pushtrap within a subshell has no effect (except adding dummy traps for printing with a trap command without arguments).
- Each stack trap is executed in a new subshell to keep it from interfering with others. This means a stack trap cannot change variables except within its own environment, and 'exit' will only exit the trap and not the program.
- pushtrap stores current $IFS (field splitting) and $- (shell options) along with the pushed trap. Within the subshell executing each stack trap, modernish restores IFS and the shell options f (noglob), u (nounset) and C (noclobber) to the values in effect during the corresponding pushtrap. This is to avoid unexpected effects in case a trap is triggered while temporary settings are in effect.
- The --key option applies the keying functionality inherited from plain push to the trap stack. It works the same way, so the description is not repeated here.
poptrap takes just signal names or numbers as arguments. It takes the last-pushed trap for each signal off the stack, storing the command that was set for each of those signals into the REPLY variable, in a format suitable for re-entry into the shell. Again, the --key option works as in plain pop.
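The basic idea of several traps coexisting on one signal, executed from last-pushed to first-pushed, can be sketched in plain POSIX shell for the EXIT pseudosignal. The function my_pushtrap_exit and the variable _exit_traps are hypothetical. Unlike the behaviour described above, this sketch runs the commands in the current shell rather than in separate subshells, and it offers no popping, keys or signal re-sending.

```shell
# Hypothetical mini trap stack for EXIT only.
_exit_traps=
my_pushtrap_exit() {
	# Prepend, so that the last-pushed command runs first.
	_exit_traps="$1
$_exit_traps"
	trap 'eval "$_exit_traps"' EXIT
}

# Demonstrate inside a subshell, so the EXIT trap fires when it exits.
out=$(
	my_pushtrap_exit 'echo "pushed first"'
	my_pushtrap_exit 'echo "pushed second"'
	exit 0
)
printf '%s\n' "$out"
```

Both commands run when the subshell exits, with the one pushed second running first.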
Modernish tries hard to avoid incompatibilities with existing trap practice. To that end, it intercepts the regular POSIX 'trap' command using an alias, reimplementing and interfacing it with the shell's builtin trap facility so that plain old regular traps play nicely with the trap stack. You should not notice any changes in the POSIX 'trap' command's behaviour, except for the following:
- The regular 'trap' command does not overwrite stack traps (but does overwrite previous regular traps).
- The 'trap' command with no arguments, which prints the traps that are set in a format suitable for re-entry into the shell, now also prints the stack traps as 'pushtrap' commands. (bash users might notice the SIG prefix is not included in the signal names written.)
- When setting traps, signal name arguments may now have the SIG prefix on all shells; that prefix is quietly accepted and discarded.
- Saving the traps to a variable using command substitution (as in: var=$(trap)) now works on every shell supported by modernish, including (d)ash, mksh and zsh.
- To reset (unset) a trap, the modernish 'trap' command accepts both valid POSIX syntax and legacy bash/(d)ash/zsh syntax, like trap INT to unset a SIGINT trap (which only works if the 'trap' command is given exactly one argument). Note that this is for compatibility with existing scripts only.
POSIX traps for each signal are always executed after that signal's stack-based traps; this means they should not rely on modernish modules that use the trap stack to clean up after themselves on exit, as those cleanups would already have been done.
Modernish introduces a new DIE
(-1) pseudosignal whose traps are
executed upon invoking die
in scripts. This is analogous to the
EXIT
(0) pseudosignal that is built in to all POSIX shells. All
trap-related commands in modernish support this new pseudosignal. Note
that DIE
traps are never executed on interactive shells.
See the die
description for
more information.
On interactive shells, INT
traps (both POSIX and stack) are cleared out
after executing them once. This is because die
uses SIGINT for cleanup and command interruption on interactive shells.
pushparams
and popparams
: push and pop the complete set of positional
parameters. No arguments are supported.
For the four functions below, item can be:
- a valid portable variable name
- a short-form shell option: dash plus letter
- a long-form shell option: -o followed by an option name (two arguments)
- @ to refer to the positional parameters stack
- --trap=SIGNAME to refer to the trap stack for the indicated signal
stackempty
[ --key=
value ] [ --force
] item: Tests if the stack
for an item is empty. Returns status 0 if it is, 1 if it is not. The key
feature works as in pop
: by default, a key
mismatch is considered equivalent to an empty stack. If --force
is given,
this function ignores keys altogether.
stacksize
[ --silent
| --quiet
] item: Leaves the size of a stack in
the REPLY
variable and, if option --silent
or --quiet
is not given,
writes it to standard output.
The size of the complete stack is returned, even if some values are keyed.
printstack
[ --quote
] item: Outputs a stack's content.
Option --quote
shell-quotes each stack value before printing it, allowing
for parsing multi-line or otherwise complicated values.
Column 1 to 7 of the output contain the number of the item (down to 0).
If the item is set, column 8 and 9 contain a colon and a space, and
if the value is non-empty or quoted, column 10 and up contain the value.
Sets of values that were pushed with a key are started with a special
line containing --- key:
value. A subsequent set pushed with no key is
started with a line containing --- (key off)
.
Returns status 0 on success, 1 if that stack is empty.
clearstack
[ --key=
value ] [ --force
] item [ item ... ]:
Clears one or more stacks, discarding all items on it.
If (part of) the stack is keyed or a --key
is given, only clears until a
key mismatch is encountered. The --force
option overrides this and always
clears the entire stack (be careful, e.g. don't use within
setlocal
... endlocal
).
Returns status 0 on success, 1 if that stack was already empty, 2 if
there was nothing to clear due to a key mismatch.
harden
: modernish's replacement for set -e
a.k.a. set -o errexit
(which is
fundamentally
flawed,
not supported and will break the library).
harden
installs a shell function that hardens a particular command by
checking its exit status against values indicating error or system failure.
Exactly what exit statuses signify an error or failure depends on the
command in question; this should be looked up in the
POSIX specification
(under "Utilities") or in the command's man
page or other documentation.
If the command fails, the function installed by harden
calls die
, so it
will reliably halt program execution, even if the failure occurred within a
subshell (for instance, in a pipe construct or command substitution).
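To make the mechanism concrete, here is a hand-written approximation of what a hardening function does for grep, in plain POSIX shell. It is only a sketch: the name hard_grep and the use of exit are assumptions made for illustration, and this is not the code that harden generates. In particular, a plain exit only leaves the current (sub)shell, which is the gap that die is meant to close.

```shell
# Hypothetical hand-rolled approximation of: harden -f hard_grep -e '>1' grep
# (illustration only; modernish installs its own function and calls 'die').
hard_grep() {
    command grep "$@"
    _hg_status=$?
    if [ "$_hg_status" -gt 1 ]; then
        printf 'hard_grep: grep failed with status %d\n' "$_hg_status" >&2
        exit "$_hg_status"   # only exits the current (sub)shell, unlike 'die'
    fi
    return "$_hg_status"
}

# Status 0 (match) and 1 (no match) are passed through to the caller.
if printf 'alpha\nbeta\n' | hard_grep -q beta; then
    found=yes
else
    found=no
fi
printf 'found: %s\n' "$found"
```

Statuses 0 and 1 reach the caller unchanged, so the function can still be used in if and while conditions; only statuses above 1 are treated as fatal.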
harden
(along with use safe
) is an essential feature for robust shell
programming that current shells lack. In shell programs without modernish,
proper error checking is too inconvenient and therefore rarely done. It's often
recommended to use set -e
a.k.a. set -o errexit
, but that is broken in
various strange ways (see links above) and the idea is often abandoned. So,
all too often, shell programs simply continue in an inconsistent state after a
critical error occurs, occasionally wreaking serious havoc on the system.
Modernish harden
was designed to help solve that problem properly.
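One of the better-known errexit surprises can be reproduced in a few lines of plain POSIX shell. The sketch below is an illustration only (the function name build is made up): because POSIX specifies that errexit is ignored for any command list tested by if, a failing command inside a function called from such a test does not stop the function.

```shell
# Run the demonstration in a subshell so that 'set -e' does not leak out.
errexit_demo=$(
    set -e
    build() {
        false                                # fails, yet does not abort...
        echo "still running after failure"   # ...because of the 'if' below
    }
    # errexit is ignored for the whole command list tested by 'if'.
    if build; then
        echo "build reported success"
    fi
)
printf '%s\n' "$errexit_demo"
```

Both lines are printed: the failure of false is silently lost and the function reports success.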
Usage:
harden
[ -f
funcname ] [ -[cpXtPE]
] [ -e
testexpr ]
[ var=
value ... ] [ -u
var ... ] command_name_or_path
[ command_argument ... ]
The -f
option hardens the command as the shell function funcname instead
of defaulting to command_name_or_path as the function name. (If the latter
is a path, that's always an invalid function name, so the use of -f
is
mandatory.)
The -c
option causes command_name_or_path to be hardened and run
immediately instead of setting a shell function for later use. This option
is meant for commands that run once; it is not efficient for repeated use.
It cannot be used together with the -f
option.
The -e
option, which defaults to >0
, indicates the exit statuses
corresponding to a fatal error. It depends on the command what these are;
consult the POSIX spec and the manual pages.
The status test expression testexpr, argument
to the -e
option, is like a shell arithmetic
expression, with the binary operators ==
!=
<=
>=
<
>
turned
into unary operators referring to the exit status of the command in
question. Assignment operators are disallowed. Everything else is the same,
including &&
(logical and) and ||
(logical or) and parentheses.
Note that the expression needs to be quoted as the characters used in it
clash with shell grammar tokens.
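One way to read such an expression is to imagine the exit status inserted as the missing left-hand operand of every comparison. The following plain POSIX shell sketch spells out '==1 || >2' that way; the helper name status_is_fatal is hypothetical and this is not how modernish itself parses the expression.

```shell
# Hypothetical helper: '==1 || >2' with the exit status ($1) supplied as the
# left-hand operand of each comparison.
status_is_fatal() {
    [ "$(( ($1) == 1 || ($1) > 2 ))" -ne 0 ]
}

verdicts=
for s in 0 1 2 3; do
    if status_is_fatal "$s"; then
        verdicts="$verdicts $s=fatal"
    else
        verdicts="$verdicts $s=ok"
    fi
done
printf '%s\n' "${verdicts# }"
```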
The -X
option causes harden
to always search for and harden an external
command, even if a built-in command by that name exists.
The -E
option causes the hardening function to consider it a fatal error
if the hardened command writes anything to the standard error stream. This
option allows hardening commands (such as
bc
)
where you can't rely on the exit status to detect an error. The text written
to standard error is passed on as part of the error message printed by
die
. Note that:
- Intercepting standard error necessitates that the command be executed from a subshell. This means any builtins or shell functions hardened with -E cannot influence the calling shell (e.g. harden -E cd renders cd ineffective).
- -E does not disable exit status checks; by default, any exit status greater than zero is still considered a fatal error as well. If your command does not even reliably return a 0 status upon success, then you may want to add -e '>125', limiting the exit status check to reserved values indicating errors launching the command and signals caught.
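The subshell requirement follows from how standard error is usually captured in shell. The sketch below uses plain POSIX shell and a hypothetical helper named stderr_is_fatal; it is an assumption about the general technique, not modernish's implementation (which calls die rather than returning 1).

```shell
# Hypothetical helper: treat any output on standard error as a failure.
stderr_is_fatal() {
    # FD 3 keeps a copy of the real standard output, so only standard error
    # ends up in the command substitution (which runs "$@" in a subshell).
    { _sf_err=$("$@" 2>&1 >&3 3>&-); } 3>&1
    _sf_status=$?
    if [ -n "$_sf_err" ]; then
        printf 'fatal: %s wrote to standard error: %s\n' "$1" "$_sf_err" >&2
        return 1
    fi
    return "$_sf_status"
}

# The command ran in a subshell, so this 'cd' has no lasting effect.
dir_before=$PWD
stderr_is_fatal cd /
dir_after=$PWD
```

After this runs, dir_after equals dir_before, which mirrors the first caveat in the list above.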
The -p
option causes harden
to search for commands using the
system default path (as obtained with getconf PATH
) as opposed to the
current $PATH
. This ensures that you're using a known-good external
command that came with your operating system. By default, the system-default
PATH search only applies to the command itself, and not to any commands that
the command may search for in turn. But if the -p
option is specified at
least twice, or if the command is a shell function (hardened under another name
using -f
), the command is run in a subshell with PATH
exported as the
default path, which is equivalent to adding a PATH=$DEFPATH
assignment
argument (see below).
Examples:
harden make # simple check for status > 0
harden -f tar '/usr/local/bin/gnutar' # id.; be sure to use this 'tar' version
harden -e '> 1' grep # for grep, status > 1 means error
harden -e '==1 || >2' gzip # 1 and >2 are errors, but 2 isn't (see manual)
As far as the shell is concerned, hardened commands are shell functions and not external or builtin commands. This essentially changes one behaviour of the shell: variable assignments preceding the command will not be local to the command as usual, but will persist after the command completes. (POSIX technically makes that behaviour optional but all current shells behave the same in POSIX mode.)
For example, this means that something like
harden -e '>1' grep
# [...]
LC_ALL=C grep regex some_ascii_file.txt
should never be done, because the meant-to-be-temporary LC_ALL
locale
assignment will persist and is likely to cause problems further on.
To solve this problem, harden
supports adding these assignments as
part of the hardening command, so instead of the above you do:
harden -e '>1' LC_ALL=C grep
# [...]
grep regex some_ascii_file.txt
With the -u
option, harden
also supports unsetting variables for the
duration of a command, e.g.:
harden -e '>1' -u LC_ALL grep
Pitfall alert: if the -u
option is used, this causes the hardened command to
run in a subshell with those variables unset, because using a subshell is the
only way to avoid altering those variables' state in the main shell. This is
usually fine, but note that a builtin command hardened with use of -u
cannot
influence the calling shell. For instance, something like harden -u LC_ALL cd
renders cd
ineffective: the working directory is only changed within the
subshell which is then immediately left.
The same happens if you harden a shell function under another name using
-f
while adding environment variable assignments (or using the -p
option, which effectively adds PATH=$DEFPATH
). The hardened function
will not be able to influence the main shell. Also note that the hardening
function will export the assigned environment variables for the duration of
that subshell, so those variables will be inherited by any external command
run from the hardened function. (While hardening shell functions using
harden
is possible, it's not really recommended and it's better to call
die
directly in your shell function upon detecting an error.)
If you're piping a command's output into another command that may close
the pipe before the first command is finished, you can use the -P
option
to allow for this:
harden -e '==1 || >2' -P gzip # also tolerate gzip being killed by SIGPIPE
gzip -dc file.txt.gz | head -n 10 # show first 10 lines of decompressed file
head
will close the pipe of gzip
input after ten lines; the operating
system kernel then kills gzip
with the PIPE signal before it's finished,
causing a particular exit status that is greater than 128. This exit status
would normally make harden
kill your entire program, which in the example
above is clearly not the desired behaviour. If the exit status caused by a
broken pipe were known, you could specifically allow for that exit status in
the status expression. The trouble is that this exit status varies depending
on the shell and the operating system. The -P
option was made to solve
this problem: it automatically detects and whitelists the correct exit
status corresponding to SIGPIPE termination on the current system.
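For illustration, the exit status in question can be observed with plain POSIX shell by letting a child shell kill itself with SIGPIPE. This is a sketch of the idea only; it is an assumption, not the detection code modernish uses, and the variable name sigpipe_status is made up.

```shell
# A child shell sends SIGPIPE to itself; the parent records the exit status.
sigpipe_status=0
sh -c 'kill -s PIPE "$$"' 2>/dev/null || sigpipe_status=$?

if [ "$sigpipe_status" -eq 0 ]; then
    # The signal had no effect, e.g. because SIGPIPE is being ignored.
    echo "SIGPIPE appears to be ignored in this environment"
else
    echo "a process killed by SIGPIPE reports exit status $sigpipe_status"
fi
```

The printed number depends on the shell and operating system, which is why hard-coding it in a status expression is not portable.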
Tolerating SIGPIPE is an option and not the default, because in many contexts it may be entirely unexpected and a symptom of a severe error if a command is killed by a broken pipe. It is up to the programmer to decide which commands should expect SIGPIPE and which shouldn't.
Tip: It could happen that the same command should expect SIGPIPE in one context but not another. You can create two hardened versions of the same command, one that tolerates SIGPIPE and one that doesn't. For example:
harden -f hardGrep -e '>1' grep # hardGrep does not tolerate being aborted
harden -f pipeGrep -e '>1' -P grep # pipeGrep for use in pipes that may break
Note: If SIGPIPE
was set to ignore by the process invoking the current
shell, the -P
option has no effect, because no process or subprocess of
the current shell can ever be killed by SIGPIPE
. However, this may cause
various other problems and you may want to refuse to let your program run
under that condition.
thisshellhas WRN_NOSIGPIPE
can help
you easily detect that condition so your program can make a decision. See
the WRN_NOSIGPIPE description for more information.
The -t
option will trace command output. Each execution of a command
hardened with -t
causes the full command line to be output to standard
error, in the following format:
[functionname]> commandline
where functionname
is the name of the shell function used to harden the
command and commandline
is the complete and actual command executed. The
commandline
is properly shell-quoted in a format suitable for re-entry
into the shell (which is an enhancement over the builtin tracing facility on
most shells). If standard error is on a terminal that supports ANSI colours,
the tracing output will be colourised.
The -t
option was added to harden
because the commands that you harden
are often the same ones you would be particularly interested in tracing. The
advantage of using harden -t
over the shell's builtin tracing facility
(set -x
or set -o xtrace
) is that the output is a lot less noisy,
especially when using a shell library such as modernish.
Note: Internally, -t
uses the shell file descriptor 9, redirecting it to
standard error (using exec 9>&2
). This allows tracing to continue to work
normally even for commands that redirect standard error to a file (which is
another enhancement over set -x
on most shells). However, this does mean
harden -t
conflicts with any other use of the file descriptor 9 in your
shell program.
If file descriptor 9 is already open before harden
is called, harden
does not attempt to override this. This means tracing may be redirected
elsewhere by doing something like exec 9>trace.out
before calling
harden
. (Note that redirecting FD 9 on the harden
command itself will
not work as it won't survive the run of the command.)
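The file descriptor technique can be sketched in plain POSIX shell. The wrapper name traced and the temporary file are hypothetical, and unlike harden -t this sketch does not shell-quote the command line or check whether FD 9 is already open.

```shell
# Hypothetical tracing wrapper: write the command line to FD 9, then run it.
traced() {
    printf '[%s]> %s\n' "$1" "$*" >&9
    "$@"
}

trace_file=${TMPDIR:-/tmp}/trace_demo.$$
exec 9>"$trace_file"          # a script would normally use: exec 9>&2

# Standard output and standard error of the command are discarded,
# but the trace line still reaches FD 9.
traced printf '%s\n' hello >/dev/null 2>&1

exec 9>&-                     # close FD 9 again
trace_output=$(cat "$trace_file")
rm -f "$trace_file"
printf '%s\n' "$trace_output"
```

Because the trace goes to its own descriptor, redirecting the traced command's standard error does not swallow the trace line.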
Sometimes you just want to trace the execution of some specific commands as
in harden -t
(see above) without actually hardening them against command
errors; you might prefer to do your own error handling. trace
makes this
easy. It is modernish's replacement or complement for set -x
a.k.a. set -o xtrace
.
trace
is actually a shortcut for harden -tPe'>125'
commandname. The
result is that the indicated command is automatically traced upon execution.
Other options, including -f
, -c
and environment variable assignments, are
as in harden
.
A bonus is that you still get minimal hardening against fatal system errors. Errors in the traced command itself are ignored, but your program is immediately halted with an informative error message if the traced command:
- cannot be found (exit status 127);
- was found but cannot be executed (exit status 126);
- was killed by a signal other than SIGPIPE (exit status > 128, except the shell-specific exit status for SIGPIPE).
Note: The caveat for command-local variable assignments for harden
also
applies to trace
. See
Important note on variable assignments
above.
extern
is like command
but always runs an external command, without
having to know or determine its location. This provides an easy way to
bypass a builtin, alias or function. It does the same $PATH
search
the shell normally does when running an external command. For instance, to
guarantee running external printf
just do: extern printf ...
Usage: extern
[ -p
] [ -v
] command [ argument ... ]
- -p: use the operating system's default PATH (as determined by getconf PATH) instead of your current $PATH for the command search. This guarantees a path that finds all the standard utilities defined by POSIX, akin to command -p but still guaranteeing an external command. (Note that extern -p is more reliable than command -p because many shell binaries don't ask the OS for the default path and have a wrong default path hard-coded in.)
- -v: don't execute command but show the full path name of the command that would have been executed. Any extra arguments are taken as more command paths to show, one per line. extern exits with status 0 if all the commands were found, 1 otherwise. This option can be combined with -p.
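The kind of $PATH search involved can be sketched in plain POSIX shell as follows. The function name find_external is hypothetical and this simplified version is not modernish's implementation: for instance, it does not handle command names containing a slash or a trailing empty $PATH component.

```shell
# Hypothetical helper: print the path of the first external command named $1.
# The function body is a subshell, so IFS and 'set -f' changes stay local.
find_external() (
    IFS=:
    set -f                       # no globbing while splitting $PATH
    for dir in $PATH; do
        [ -n "$dir" ] || dir=.   # an empty component means the current dir
        if [ -f "$dir/$1" ] && [ -x "$dir/$1" ]; then
            printf '%s\n' "$dir/$1"
            exit 0
        fi
    done
    exit 1
)

sh_path=$(find_external sh)
printf 'external sh found at: %s\n' "$sh_path"
```

Only regular executable files are considered, so builtins, aliases and functions of the same name are bypassed by construction.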
putln
: prints each argument on a separate line. There is no processing of
options or escape codes. (Modernish constants $CCn
, etc. can be used to insert
control characters in double-quoted strings. To process escape codes, use
printf
instead.)
put
: prints each argument separated by a space, without a trailing separator
or newline. Again, there is no processing of options or escape codes.
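The behaviour described for putln and put can be approximated with printf in plain POSIX shell. The names print_lines and print_words below are hypothetical stand-ins, not the modernish functions themselves; the sketch only shows that arguments resembling options or escape codes are printed literally.

```shell
# Hypothetical stand-in for putln: one argument per line, nothing interpreted.
print_lines() {
    [ "$#" -eq 0 ] || printf '%s\n' "$@"
}

# Hypothetical stand-in for put: arguments joined by spaces, no final newline.
print_words() {
    [ "$#" -eq 0 ] && return
    printf '%s' "$1"
    shift
    [ "$#" -eq 0 ] || printf ' %s' "$@"
}

words=$(print_words -n 'a\tb' c)
lines=$(print_lines -e 'x\ny')
printf '%s\n' "$words" "$lines"
```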
echo
: This command is notoriously unportable and kind of broken, so is
deprecated in favour of put
and putln
. Modernish does provide its own
version of echo
, but it is only activated if modernish is in the hashbang
path or otherwise is itself used as the shell (the "most portable" way of
running programs
explained above).
If your script runs on a specific shell and sources modernish as a dot script
(. modernish
), or if you use modernish interactively in your shell profile,
the shell-specific version of echo
is left intact. This is to make it
possible to add modernish to existing shell-specific scripts without breaking
anything, while still providing one consistent echo
for cross-shell scripts.
The modernish version of echo
, if active, does not interpret any escape codes
and supports only one option, -n
, which, like BSD echo
, suppresses the
final newline. However, unlike BSD echo
, if -n
is the only argument, it is
not interpreted as an option and the string -n
is printed instead. This makes
it safe to output arbitrary data using this version of echo
as long as it is
given as a single argument (using quoting if needed).
source
: bash/zsh-style source
command now available to all POSIX
shells, complete with optional positional parameters given as extra
arguments (which is not supported by POSIX .
).
Complete replacement for test
/[
in the form of speed-optimised shell
functions, so modernish scripts never need to use that [
botch again.
Instead of inherently ambiguous [ syntax (or the nearly-as-confusing [[ one), these use familiar shell syntax to get more functionality, including:
let
: implementation of let
as in ksh, bash and zsh, now available to all
POSIX shells. This makes C-based signed integer arithmetic evaluation
available to every supported shell, with the exception of the unary "++" and
"--" operators (which have been given the capability designation ARITHPP).
This means let
should be used for operations and tests, e.g. both
let "x=5"
and if let "x==5"; then
... are supported (note single = for
assignment, double == for comparison).
isint
: test if a given argument is a decimal, octal or hexadecimal integer
number in valid POSIX shell syntax, ignoring leading (but not trailing) spaces
and tabs.
empty: test if string is empty
identic: test if 2 strings are identical
sortsbefore: test if string 1 sorts before string 2
sortsafter: test if string 1 sorts after string 2
contains: test if string 1 contains string 2
startswith: test if string 1 starts with string 2
endswith: test if string 1 ends with string 2
match: test if string matches a glob pattern
ematch: test if string matches an extended regex
These avoid the snags with symlinks you get with [
and [[
.
By default, symlinks are not followed. Add -L
to operate on files
pointed to by symlinks instead of symlinks themselves (the -L
makes
no difference if the operands are not symlinks).
is present: test if file exists (yields true even if invalid symlink)
is -L present: test if file exists and is not an invalid symlink
is sym: test if file is symlink
is -L sym: test if file is a valid symlink
is reg: test if file is a regular file
is -L reg: test if file is regular or a symlink pointing to a regular
is dir: test if file is a directory
is -L dir: test if file is dir or symlink pointing to dir
is fifo, is -L fifo, is socket, is -L socket, is blockspecial,
is -L blockspecial, is charspecial, is -L charspecial:
same pattern, you figure it out :)
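The difference between following and not following symlinks can be seen with an invalid (dangling) symlink. The sketch below uses plain test operators in POSIX shell; the helper names and the temporary directory are hypothetical, and the helpers only approximate is present and is -L present as described above.

```shell
# Hypothetical helpers expressed with plain test operators.
exists_nofollow() { [ -e "$1" ] || [ -L "$1" ]; }   # roughly: is present
exists_follow()   { [ -e "$1" ]; }                  # roughly: is -L present

demo_dir=${TMPDIR:-/tmp}/symlink_demo.$$
mkdir "$demo_dir"
ln -s "$demo_dir/no_such_target" "$demo_dir/dangling"

if exists_nofollow "$demo_dir/dangling"; then nofollow=true; else nofollow=false; fi
if exists_follow "$demo_dir/dangling"; then follow=true; else follow=false; fi
rm -rf "$demo_dir"

printf 'without -L: %s, with -L: %s\n' "$nofollow" "$follow"
```

The dangling symlink counts as present when symlinks are not followed, and as absent when they are.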
By default, symlinks are not followed. Again, add -L
to follow them.
is newer: test if file 1 is newer than file 2 (or if file 1 exists,
but file 2 doesn't)
is older: test if file 1 is older than file 2 (or if file 1 doesn't
exist, but file 2 does)
is samefile: test if file 1 and file 2 are the same file (hardlinks)
is onsamefs: test if file 1 and file 2 are on the same file system (for
non-regular, non-directory files, test the parent directory)
is -L newer, is -L older, is -L samefile, is -L onsamefs:
same as above, but after resolving symlinks
Symlinks are followed.
is nonempty: test if file exists, is not an invalid symlink, and is
not empty (also works for dirs with read permission)
is setuid: test if file has user ID bit set
is setgid: test if file has group ID bit set
is onterminal: test if file descriptor is associated with a terminal
These use a more straightforward logic than [
and [[
.
Any symlinks given are resolved, as these tests would be meaningless
for a symlink itself.
can read: test if we have read permission for a file
can write: test if we have write permission for a file or directory
(for directories, only true if traverse permission as well)
can exec: test if we have execute permission for a regular file
can traverse: test if we can enter (traverse through) a directory
The main modernish library contains functions for a few basic string manipulation operations (because they are needed by other functions in the main library). Currently these are:
toupper: convert all letters to upper case
tolower: convert all letters to lower case
If no arguments are given, toupper
and tolower
copy standard input to
standard output, converting case.
If one or more arguments are given, they are taken as variable names (note:
they should be given without the $
) and case is converted in the contents
of the specified variables, without reading input or writing output.
toupper
and tolower
try hard to use the fastest available method on the
particular shell your program is running on. They use built-in shell
functionality where available and working correctly, otherwise they fall back
on running an external utility.
Which external utility is chosen depends on whether the current locale uses
the Unicode UTF-8 character set or not. For non-UTF-8 locales, modernish
assumes the POSIX/C locale and tr
is always used. For UTF-8 locales,
modernish tries hard to find a way to correctly convert case even for
non-Latin alphabets. A few shells have this functionality built in with
typeset
. The rest need an external utility. Even in 2017, it is a real
challenge to find an external utility on an arbitrary POSIX-compliant system
that will correctly convert case for all applicable UTF-8 characters.
Modernish initialisation tries tr
, awk
, GNU awk
and GNU sed
before
giving up and declaring BUG_CNONASCII. If thisshellhas BUG_CNONASCII
, it
means modernish is in a UTF-8 locale but has not found a way to convert
Case for NON ASCII characters, so toupper
and tolower
will convert
only ASCII characters and leave any other characters in the string alone.
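The ASCII-only fallback can be pictured as a tr call in the C locale. The sketch below is an assumption for illustration; the helper name ascii_upper is hypothetical and, unlike toupper, it takes the string itself as an argument rather than a variable name.

```shell
# Hypothetical helper: upper-case only the ASCII letters of a string.
ascii_upper() {
    printf '%s\n' "$1" | LC_ALL=C tr '[:lower:]' '[:upper:]'
}

upper=$(ascii_upper 'hello World 42')
printf '%s\n' "$upper"
```

Since tr is an external command here, the LC_ALL=C assignment applies to that one command only.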
Small utilities that should have been part of the standard shell, but aren't. Since their implementation is inexpensive, they are part of the main library instead of a module.
mkcd
: make one or more directories, then, upon success, change into the
last-mentioned one. mkcd
inherits mkdir
's usage, so options depend on
your system's mkdir
; only the
POSIX options
are guaranteed.
use
: use a modernish module. It implements a simple Perl-like module
system with names such as 'safe', 'var/setlocal' and 'loop/select'.
These correspond to files 'safe.mm', 'var/setlocal.mm', etc. which are
dot scripts defining functionality. Any extra arguments to the use
command are passed on to the dot script unmodified, so modules can
implement option parsing to influence their initialisation.
Does IFS=''; set -f -u -C, that is: field splitting and globbing are
disabled, variables must be defined before use, and
output redirection cannot overwrite existing files (noclobber).
Essentially, this is a whole new way of shell programming that eliminates most variable quoting headaches, protects against typos in variable names wreaking havoc, and protects files from being accidentally overwritten by output redirection.
Of course, you don't get field splitting and globbing. But modernish
provides various ways of enabling one or both only for the commands
that need them, setlocal
...endlocal
blocks chief among them
(see use var/setlocal
below).
On interactive shells (or if use safe -i
is given), also loads
convenience functions fsplit
and glob
to control and inspect the
state of field splitting and globbing in a more user friendly way.
It is highly recommended that new modernish scripts start out with use safe
.
But this mode is not enabled by default because it will totally break
compatibility with shell code written for default shell settings.
These shortcut functions are alternatives for using 'let'.
inc
, dec
, mult
, div
, mod
: simple integer arithmetic shortcuts. The first
argument is a variable name. The optional second argument is an
arithmetic expression, but a sane default value is assumed (1 for inc
and dec, 2 for mult and div, 256 for mod). For instance, inc X
is
equivalent to X=$((X+1))
and mult X Y-2
is equivalent to X=$((X*(Y-2)))
.
ndiv
is like div
but with correct rounding down for negative numbers.
Standard shell integer division simply chops off any digits after the
decimal point, which has the effect of rounding down for positive numbers
and rounding up for negative numbers. ndiv
consistently rounds down.
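The rounding difference can be worked through in plain POSIX shell arithmetic. The helper floor_div below is a hypothetical sketch of rounding down, not modernish's ndiv (which takes a variable name as its first argument).

```shell
# Hypothetical helper: integer division that always rounds down.
floor_div() {
    _fd_q=$(( ($1) / ($2) ))
    # Shell division rounds towards zero; step down by one if that rounded
    # up, i.e. if there is a remainder and the operands have opposite signs.
    if [ "$(( (($1) % ($2)) != 0 && (($1) < 0) != (($2) < 0) ))" -ne 0 ]; then
        _fd_q=$(( _fd_q - 1 ))
    fi
    printf '%s\n' "$_fd_q"
}

truncated=$(( -7 / 2 ))        # -3: rounded up, towards zero
floored=$(floor_div -7 2)      # -4: rounded down
printf 'truncated: %s, floored: %s\n' "$truncated" "$floored"
```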
These have the same name as their test
/[
option equivalents. Unlike
with test
, the arguments are shell integer arith expressions, which can be
anything from simple numbers to complex expressions. As with $(( ))
,
variable names are expanded to their values even without the $
.
Function: Returns successfully if:
eq <expr> <expr> the two expressions evaluate to the same number
ne <expr> <expr> the two expressions evaluate to different numbers
lt <expr> <expr> the 1st expr evaluates to a smaller number than the 2nd
le <expr> <expr> the 1st expr eval's to smaller than or equal to the 2nd
gt <expr> <expr> the 1st expr evaluates to a greater number than the 2nd
ge <expr> <expr> the 1st expr eval's to greater than or equal to the 2nd
mapr
(map records): Read delimited records from the standard input, invoking
a callback command with each input record as an argument and with up to
quantum arguments at a time. By default, an input record is one line of text.
Usage: mapr [ -d delimiter | -P ] [ -n count ] [ -s count ]
[ -c quantum ] callback [ argument ... ]
Options:
- -d delimiter: Use the single character delimiter to delimit input records, instead of the newline character.
- -P: Paragraph mode. Input records are delimited by sequences consisting of a newline plus one or more blank lines, and leading or trailing blank lines will not result in empty records at the beginning or end of the input. Cannot be used together with -d.
- -n count: Pass at most count records as arguments to callback. If count is 0, all records are passed.
- -s count: Skip and discard the first count records read.
- -c quantum: Specify the number of records read between each invocation of callback. If -c is not supplied, the default quantum is 5000.
Arguments:
- callback: Call the callback command with the collected arguments each time quantum records are read. The callback command may be a shell function or any other kind of command, and is executed from the same shell environment that invoked mapr. It is a fatal error for the callback command to exit with a status > 0.
- argument: If there are extra arguments supplied on the mapr command line, they will be added before the collected arguments on each invocation of the callback command.
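The batching behaviour can be sketched in plain POSIX shell. The function batch_lines below is a hypothetical, much simplified stand-in: it handles only newline-delimited records, takes the quantum as its first argument, supports no extra arguments, and returns the callback's failure status instead of treating it as fatal.

```shell
# Hypothetical helper: batch_lines QUANTUM CALLBACK
# Reads lines from standard input and passes them to CALLBACK as arguments,
# at most QUANTUM at a time.
batch_lines() {
    _bl_quantum=$1
    _bl_callback=$2
    set --                                  # collect records in "$@"
    while IFS= read -r _bl_line || [ -n "$_bl_line" ]; do
        set -- "$@" "$_bl_line"
        if [ "$#" -ge "$_bl_quantum" ]; then
            "$_bl_callback" "$@" || return  # stop on callback failure
            set --
        fi
    done
    [ "$#" -eq 0 ] || "$_bl_callback" "$@"  # final, possibly smaller batch
}

show_batch() { printf 'batch of %d: %s\n' "$#" "$*"; }
batches=$(printf '%s\n' a b c d e | batch_lines 2 show_batch)
printf '%s\n' "$batches"
```

With five input lines and a quantum of 2, the callback runs three times, the last time with a single record.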
mapr
was inspired by the bash 4.x builtin command mapfile
a.k.a.
readarray
, and uses similar options, but there are important differences.
- mapr does not support assigning records directly to an array (because the POSIX shell language does not have arrays). Instead, all handling is done through the callback command.
- mapr passes all the records as arguments to the callback command.
- The callback command is not evaluated from an option-argument but taken directly from the non-option argument(s) to the mapr command.
- If the callback command exits unsuccessfully (i.e. with status > 0), this is treated as a fatal error.
- The record separator itself is never included in the arguments passed to the callback command (so there is no -t option to remove it).
- mapr supports paragraph mode.
- mapr is implemented as a shell function and awk script.
Defines a new setlocal
...endlocal
shell code block construct with
arbitrary local variables and arbitrary local shell options, as well as
safe field splitting and pathname expansion operators.
Usage: setlocal
[ localitem ... ]
[ [ --split
| --split=
characters ] [ --glob
] --
[ word ... ] ]
;
do
commands ;
endlocal
The commands are executed with the specified settings applied locally to
the setlocal
...endlocal
block.
Each localitem can be:
- A variable name with or without a = immediately followed by a value. This renders that variable local to the block, initially either unsetting it or assigning the value, which may be empty.
- A shell option letter immediately preceded by a - or + sign. This locally turns that shell option on or off, respectively. This follows the counterintuitive syntax of set. Long-form shell options like -o optionname and +o optionname are also supported. It depends on the shell what options are supported. Specifying a nonexistent option is a fatal error. Use thisshellhas to check for a non-POSIX option's existence on the current shell before using it.
The return command exits the block, causing the global variables and
settings to be restored and resuming execution at the point immediately
following endlocal. This is like a shell function. In fact, internally,
setlocal blocks are one-time shell functions that use the stack
to save and restore variables and settings. Like any shell
function, a setlocal block exits with the exit status of the last command
executed within it or with the status passed on by or given as an argument to
return.
The break
and continue
commands, when not used within a loop within the
block, also exit the block, but always with exit status 0. It's preferred to
use return
instead. Note that setlocal
creates a new loop context and
you cannot use break
or continue
to resume or break from enclosing loops
outside the setlocal
block. (Shells with
QRK_BCDANGER do in fact allow this, preventing
endlocal
from restoring the global settings! Shells without this quirk
automatically protect against this.)
Within the block, the positional parameters ($@
, $1
, etc.) are always
local. By default, a copy is inherited from outside the block. Any changes to
the positional parameters made within the block will be discarded upon
exiting it.
However, if a --
is present, the set of words after --
becomes the
positional parameters instead, after being modified by the --split
or
--glob
operators if present. The --split
operator subjects the words
to default field splitting, whereas --split=
string subjects them to
field splitting based on the characters given in string. The --glob
operator subjects them to pathname expansion. These operators do not
enable field splitting or pathname expansion within the block itself, but
only subject the words after the --
to them. If field splitting and
globbing are disabled globally, this provides a
safe way to perform field splitting or globbing without actually enabling
them for any code. To illustrate this advantage, note the difference:
# Field splitting is enabled for all unquoted expansions within the
# setlocal block, which may be unsafe, so must quote "$foo" and "$bar".
setlocal dir IFS=':'; do
    for dir in $PATH; do
        somestuff "$foo" "$bar"
    done
endlocal
# The value of PATH is split at ':' and assigned to the positional
# parameters, without enabling field splitting within the setlocal block.
setlocal dir --split=':' -- $PATH; do
    for dir do
        somestuff $foo $bar
    done
endlocal
Important: The --split
and --glob
operators are designed to be
used along with safe mode. If they are used in
traditional mode, i.e. with field splitting and/or pathname expansion
globally active, you must make sure the words after the --
are
properly quoted as with any other command, otherwise you will have
unexpected duplicate splitting or pathname expansion.
Other usage notes:
- Although the setlocal declaration ends with ; do as in a while or until loop, the code block is terminated with endlocal and not with done. Terminating it with done results in a misleading shell syntax error (end of file, or missing }), a side effect of how setlocal is implemented.
- setlocal blocks do not mix well with LOCAL (shell-native functionality for local variables), especially not on shells with QRK_LOCALUNS or QRK_LOCALUNS2. Use one or the other, but not both.
- For maximum compatibility with shell bugs (particularly BUG_FNSUBSH on ksh93, and an alias parsing oddity on mksh [up to R54 2016/11/11] that triggers a spurious syntax error), setlocal blocks should not be used within subshells, including command substitution subshells. There is usually not much point to this anyway; the point of setlocal is to have certain settings local and keep the rest global, all without the performance hit of forking a subshell process. (Forking new subshells within a setlocal block is fine.)
- A note of caution concerning loop constructs: Care should be taken not to use break and continue in ways that would cause execution to continue outside the setlocal block. Some shells do not allow break and continue to break out of a shell function (including the internal one-time shell function employed by setlocal), so thankfully this fails on those shells. But on others this succeeds, so global settings are not restored, wreaking havoc on the rest of your program. One way to avoid the problem is to envelop the entire loop in a setlocal block. Another is to exit the internal shell function using return 1 and then add || break or || continue immediately after endlocal.
- zsh programmers may recognise setlocal as pretty much the equivalent of zsh's anonymous functions -- functionality that is hereby brought to all POSIX shells, albeit with a rather different syntax.
String manipulation functions.
trim
: strip whitespace (or other characters) from the beginning and end of
a variable's value.
replacein
: Replace leading, trailing (-t) or all (-a) occurrences of a string by
another string in a variable.
append
and prepend
: Append or prepend zero or more strings to a
variable, separated by a string of zero or more characters, avoiding the
hairy problem of dangling separators. Optionally shell-quote each string
before appending or prepending.
Some very common external commands ought to be standardised, but aren't. For
instance, the which
and readlink
commands have incompatible options on
various GNU and BSD variants and may be absent on other Unix-like systems.
This module provides a complete re-implementation of such basic utilities
written as modernish shell functions. Scripts that use the modernish version
of these utilities can expect to be fully cross-platform. They also have
various enhancements over the GNU and BSD originals.
readlink
: Read the target of a symbolic link. Robustly handles weird
filenames such as those containing newline characters. Stores result in the
$REPLY variable and optionally writes it on standard output. Optionally
canonicalises each path, following all symlinks encountered (for this mode,
all but the last component must exist). Optionally shell-quote each item of
output for later parsing by the shell, separating multiple items with spaces
instead of newlines.
which
: Outputs, and/or stores in the REPLY
variable, either the first
available directory path to each given command, or all available paths,
according to the current $PATH
or the system default path. Exits
successfully if at least one path was found for each command, or
unsuccessfully if none were found for any given command.
Usage: which
[ -[apqsnQ1]
] [ -P
number ] program [ program ... ]
- -a: List all executables found, not just the first one for each argument.
- -p: Search the system default path, not the current $PATH. This is the minimal path, specified by POSIX, that is guaranteed to find all the standard utilities.
- -q: Be quiet: suppress all warnings.
- -s: Silent operation: don't write output, only store it in the REPLY variable. Suppress warnings except, if you run which -s in a subshell, the warning that the REPLY variable will not survive the subshell.
- -n: When writing to standard output, do not write a final newline.
- -Q: Shell-quote each unit of output. Separate by spaces instead of newlines. This generates a list of arguments in shell syntax, guaranteed to be suitable for safe parsing by the shell, even if the resulting pathnames should contain strange characters such as spaces or newlines and other control characters.
- -1 (one): Output the results for at most one of the arguments in descending order of preference: once a search succeeds, ignore the rest. Suppress warnings except a subshell warning for -s. This is useful for finding a command that can exist under several names, for example, in combination with harden:
  harden -P -f tar $(which -1 gnutar gtar tar)
  This option modifies which's exit status behaviour: which -1 returns successfully if any match was found.
- -P: Strip the indicated number of pathname elements from the output, starting from the right. -P1: strip /program; -P2: strip /*/program, etc. This is useful for determining the installation root directory for an installed package.
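The effect described for -P can be pictured with plain parameter expansion. The following is only a sketch in plain POSIX sh: strip_path_elements is a hypothetical helper, not the modernish implementation, and it ignores edge cases such as trailing slashes.

```sh
# Hypothetical helper: remove the last $2 pathname elements from path $1,
# storing the result in REPLY (mirrors what -P1, -P2, ... are described to do).
strip_path_elements() {
    REPLY=$1
    _spe_n=$2
    while [ "$_spe_n" -gt 0 ]; do
        REPLY=${REPLY%/*}       # drop the rightmost "/element"
        _spe_n=$((_spe_n - 1))
    done
}

strip_path_elements /opt/pkg/bin/tar 1   # REPLY is now /opt/pkg/bin
strip_path_elements /opt/pkg/bin/tar 2   # REPLY is now /opt/pkg
echo "$REPLY"
```

With a program installed as /opt/pkg/bin/tar, stripping two elements leaves /opt/pkg, which is the installation root in that layout.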
A cross-platform shell implementation of 'mktemp' that aims to be just as
safe as native mktemp
(1) implementations, while avoiding the problem of
having various mutually incompatible versions and adding several unique
features of its own.
Creates one or more unique temporary files, directories or named pipes, atomically (i.e. avoiding race conditions) and with safe permissions. The path name(s) are stored in $REPLY and optionally written to stdout.
Usage: mktemp
[ -dFsQCt
] [ template ... ]
- -d: Create a directory instead of a regular file.
- -F: Create a FIFO (named pipe) instead of a regular file.
- -s: Silent. Store output in $REPLY, don't write any output or message.
- -Q: Shell-quote each unit of output. Separate by spaces, not newlines.
- -C: Automated cleanup. Pushes a trap to remove the files on exit. On an interactive shell, that's all this option does. On a non-interactive shell, the following applies: Clean up on receiving SIGPIPE and SIGTERM as well. On receiving SIGINT, clean up if the option was given at least twice, otherwise notify the user of files left. On the invocation of die, clean up if the option was given at least three times, otherwise notify the user of files left.
- -t: Prefix one temporary files directory to all the templates: $TMPDIR/ if TMPDIR is set, /tmp/ otherwise. The templates may not contain any slashes. If the template has neither any trailing Xes nor a trailing dot, a dot is added before the random suffix.
The template defaults to /tmp/temp. (including the trailing dot). A suffix of
random shell-safe ASCII characters is added to the template to create the
file. For compatibility with other mktemp implementations, any optional
trailing X characters in the template are removed. The length of the suffix
will be equal to the number of Xes removed, or 10, whichever is more. The
longer the random suffix, the higher the security of using mktemp in a
shared directory such as /tmp.
Since /tmp
is a world-writable directory shared by other users, for best
security it is recommended to create a private subdirectory using mktemp -d
and work within that.
Option -C
cannot be used while invoking mktemp
in a subshell, such as
in a command substitution. Modernish will detect this and treat it as a
fatal error. The reason is that a typical command substitution like
tmpfile=$(mktemp -C)
is incompatible with auto-cleanup, as the cleanup EXIT trap would be
triggered not upon exiting the program but upon exiting the command
substitution subshell that just ran mktemp
, thereby immediately undoing
the creation of the file. Instead, do something like:
mktemp -sC; tmpfile=$REPLY
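The subshell effect described above can be reproduced without modernish. In this plain POSIX sh sketch, make_tmp_with_cleanup is a hypothetical stand-in that creates a file and sets an EXIT trap to remove it. The file naming scheme is for illustration only and is not safe for real use. Because the function is called in a command substitution, the trap belongs to the subshell, so the file is expected to be gone by the time the caller looks at it.

```sh
# Hypothetical stand-in for "create a temp file with auto-cleanup".
# NOT safe for real use: the name is predictable and creation is not atomic.
make_tmp_with_cleanup() {
    _mt_file=${TMPDIR:-/tmp}/cleanup-demo.$$
    : > "$_mt_file" || return
    trap 'rm -f "$_mt_file"' EXIT     # fires when the *current* shell exits
    printf '%s\n' "$_mt_file"
}

# The command substitution runs the function in a subshell;
# the EXIT trap is set there and runs as soon as that subshell ends.
tmpfile=$(make_tmp_with_cleanup)

if [ -e "$tmpfile" ]; then
    echo "file survived"
else
    echo "file was removed when the command substitution exited"
fi
```

Storing the result in a variable of the main shell, as the REPLY-based form does, keeps both the file name and the trap in the shell that will actually use the file.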
A cross-platform implementation of seq that is more powerful and versatile
than native GNU and BSD seq(1) implementations. The core is written in bc,
the POSIX arbitrary-precision calculator language. That means this seq
inherits the capacity to handle numbers with a precision and size only
limited by computer memory, as well as the ability to handle input numbers
in any base from 1 to 16 and produce output in any base 1 and up.
Usage: seq
[ -w
] [ -f
format ] [ -s
string ] [ -S
scale ]
[ -B
base ] [ -b
base ] [ first [ incr ] ] last
seq
prints a sequence of arbitrary-precision floating point numbers, one
per line, from first (default 1), to as near last as possible, in increments of
incr (default 1). If first is larger than last, the default incr is -1.
- -w: Equalise width by padding with leading zeros. The longest of the first, incr or last arguments is taken as the length that each output number should be padded to.
- -f: printf-style floating-point format. The format string is passed on (with an added \n) to awk's builtin printf function. Because of that, the -f option can only be used if the output base is 10. Note that awk's floating point precision is limited, so very large or long numbers will be rounded.
- -s: Use string to separate numbers. Default: newline. The terminator character remains a newline in any case (which is like GNU seq and differs from BSD seq).
- -S: Explicitly set the scale (number of digits after decimal point). Defaults to the largest number of digits after the decimal point among the first, incr or last arguments.
- -B: Set input and output base from 1 to 16. Defaults to 10.
- -b: Set arbitrary output base from 1. Defaults to input base. See the bc(1) manual for more information on the output format for bases greater than 16.
The -S, -B and -b options take shell integer numbers as operands. This
means a leading 0X or 0x denotes a hexadecimal number and (except on
shells with BUG_NOOCTAL) a leading 0 denotes an octal number.
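The same notation is what plain POSIX shell arithmetic accepts, so it can be tried without modernish. The snippet below uses only standard arithmetic expansion; the variable names are arbitrary.

```sh
# Plain POSIX shell arithmetic, shown here only to illustrate the notation.
hex=$((0x1F))   # leading 0x: hexadecimal, i.e. decimal 31
oct=$((010))    # leading 0: octal, i.e. decimal 8 (10 on shells with BUG_NOOCTAL)
dec=$((10))     # no prefix: decimal
echo "$hex $oct $dec"
```

A practical consequence is that an operand written with a leading zero for padding, such as 010, is not read as ten on compliant shells.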
For portability reasons, modernish seq
always uses a dot (.) for the
floating point, never a comma, regardless of the system locale. This applies
both to command arguments and to output.
The -w
, -f
and -s
options are inspired by GNU and BSD seq
, mostly
emulating GNU where they differ. The -S
, -B
and -b
options are
modernish enhancements based on bc
(1) functionality.
rev
copies the specified files to the standard output, reversing the order
of characters in every line. If no files are specified, the standard input
is read.
Usage: like rev
on Linux and BSD, which is like cat
except that -
is
a filename and does not denote standard input. No options are supported.
Functions for working with directories. So far I have:
traverse
is a fully cross-platform, robust replacement for find
without
the snags of the latter. It is not line oriented but handles all data
internally in the shell. Any weird characters in file names (including
whitespace and even newlines) "just work", provided either
use safe
is active or shell expansions are
properly quoted. This avoids many hairy
common pitfalls with find
while remaining compatible with all POSIX systems.
traverse
recursively walks through a directory, executing a command for
each file and subdirectory found. That command is usually a handler shell
function in your program.
Unlike find
, which is so smart its command line options are practically
their own programming language, traverse
is dumb: it has minimal
functionality of its own. However, with a shell function as the command,
any functionality of 'find' and anything else can be programmed in the
shell language. Flexibility is unlimited. The install.sh
script that comes
with modernish provides a good example of its practical use. See also the
traverse-test
example program.
Usage: traverse
[ -d
] [ -F
] [ -X
] directory command
traverse
calls command, once for each file found within the directory,
with one parameter containing the full pathname relative to the directory.
Any directories found within are automatically entered and traversed
recursively unless the command exits with status 1. Symlinks to
directories are not followed.
find
's -prune
functionality is implemented by testing the command's exit
status. If the command indicated exits with status 1 for a directory, this
means: do not traverse the directory in question. For other types of files,
exit status 1 is the same as 0 (success). Exit status 2 means: stop the
execution of traverse
and resume program execution. An exit status greater
than 2 indicates system failure and causes the program to abort.
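The exit status protocol above can be sketched as a handler function. The names handle_entry and file_count are hypothetical, and the traverse call itself is only shown in a comment because it needs modernish; the handler is plain POSIX sh and can be called directly to see the statuses it would report.

```sh
# Hypothetical handler, intended to be used as:  traverse "$dir" handle_entry
# (the traverse call itself needs modernish and is not run here).
file_count=0
handle_entry() {
    case ${1##*/} in
    .git)
        # Status 1: if this is a directory, do not descend into it.
        # For other file types, 1 is treated the same as 0.
        return 1 ;;
    esac
    file_count=$((file_count + 1))
    return 0    # 0: carry on; 2 would stop traverse; >2 aborts the program
}

# Calling the handler directly shows the statuses traverse would see.
handle_entry project/src/main.c && echo "continue"
handle_entry project/.git || echo "pruned (status $?)"
echo "counted: $file_count"
```

Since status 1 is harmless for non-directories, the handler does not need to test the file type before asking for a prune.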
find
's -depth
functionality is implemented using the -d
option. By
default, traverse
handles directories first, before their contents. The
-d
option causes depth-first traversal, so all entries in a directory will
be acted on before the directory itself. This applies recursively to
subdirectories. That means depth-first traversal is incompatible with
pruning, so returning status 1 for directories will have no effect.
find's -xdev
functionality is implemented using the -F
option. If this
is given, traverse
will not descend into directories that are on another
file system than that of the directory given in the argument.
xargs
-like functionality is implemented using the -X
option. As many
items as possible are saved up before being passed to the command all at
once. This is also incompatible with pruning. Unlike xargs
, the command is
only executed if at least one item was found for it to handle.
countfiles
: Count the files in a directory using nothing but shell
functionality, so without external commands. (It's amazing how many pitfalls
this has, so a library function is needed to do it robustly.)
Usage: countfiles
[ -s
] directory [ globpattern ... ]
Count the number of files in a directory, storing the number in REPLY
and (unless -s
is given) printing it to standard output.
If any globpatterns are given, only count the files matching them.
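To give an idea of the pitfalls mentioned, here is a deliberately simplified plain POSIX sh sketch. count_visible is a hypothetical helper, not the modernish countfiles function. It counts only non-hidden entries and assumes pathname expansion is enabled; whether and how countfiles handles hidden files is not covered here.

```sh
# Hypothetical helper, plain POSIX sh: count non-hidden entries in directory $1,
# storing the number in REPLY. Assumes pathname expansion is enabled (set +f).
count_visible() {
    set -- "$1"/*
    # Pitfall: a pattern that matches nothing is left as a literal word,
    # so a single "result" that does not exist means the directory is empty.
    if [ "$#" -eq 1 ] && [ ! -e "$1" ] && [ ! -L "$1" ]; then
        REPLY=0
    else
        REPLY=$#
    fi
}

demo_dir=${TMPDIR:-/tmp}/countfiles-demo.$$
mkdir "$demo_dir" && : > "$demo_dir/a" && : > "$demo_dir/b c"
count_visible "$demo_dir"
echo "$REPLY"
rm -r "$demo_dir"
```

The extra -L test keeps a dangling symlink from being mistaken for an unmatched pattern, which is one more case a naive count gets wrong.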
Utilities for working with the terminal.
readkey
: read a single character from the keyboard without echoing back to
the terminal. Buffering is done so that multiple waiting characters are read
one at a time.
Usage: readkey
[ -E
ERE ] [ -t
timeout ] [ -r
] [ varname ]
-E
: Only accept characters that match the extended regular expression
ERE (the type of RE used by grep -E
/egrep
). readkey
will silently
ignore input not matching the ERE and wait for input matching it.
-t
: Specify a timeout in seconds (one significant digit after the
decimal point). After the timeout expires, no character is read and
readkey
returns status 1.
-r
: Raw mode. Disables INTR (Ctrl+C), QUIT, and SUSP (Ctrl+Z) processing
as well as translation of carriage return (13) to linefeed (10).
The character read is stored into the variable referenced by varname,
which defaults to REPLY
if not specified.
Adds a --long
option to the getopts built-in for parsing GNU-style long
options. (Does not currently work in ash derivatives because getopts
has a function-local state in those shells. The only way out is to
re-implement getopts
completely in shell code instead of building on
the built-in. This is on the TODO list.)
Parsing of command line options for shell functions is a hairy problem.
Using getopts
in shell functions is problematic at best, and manually
written parsers are very hard to do right. That's why this module provides
generateoptionparser
, a command to generate an option parser: it takes
options specifying what variable names to use and what your function should
support, and outputs code to parse options for your shell function. Options
can be specified to require or not take arguments. Combining/stacking
options and arguments in the traditional UNIX manner is supported.
Only short (one-character) options are supported. Each option gets a corresponding variable whose name consists of a specified prefix followed by the option character (hence, only option characters that are valid in variable names are supported, namely, the ASCII characters A-Z, a-z, 0-9 and the underscore). If the option was not specified on the command line, the variable is unset; otherwise it is set to the empty value or, if the option requires an argument, the variable will contain that argument.
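For illustration, this is roughly the kind of parser one would otherwise write by hand, here in plain POSIX sh using the getopts builtin. The function name myfunc, the opt_ prefix and the options -a and -f are hypothetical. This is not the code that generateoptionparser outputs, only a sketch of the variable convention described above: unset if the option is absent, empty if given without an argument, and the argument otherwise.

```sh
# Hypothetical hand-written option parser for a shell function.
myfunc() {
    unset -v opt_a opt_f
    OPTIND=1                       # restart option parsing for this call
    while getopts 'af:' _opt; do
        case $_opt in
        a)  opt_a='' ;;            # option without argument: set, but empty
        f)  opt_f=$OPTARG ;;       # option with argument: holds the argument
        *)  return 2 ;;            # unknown option or missing argument
        esac
    done
    shift $((OPTIND - 1))
    REPLY=$#                       # number of remaining non-option arguments
}

myfunc -a -f out.txt one two
echo "a:${opt_a+given} f:${opt_f-unset} rest:$REPLY"
```

Testing with ${opt_a+given} rather than [ -n "$opt_a" ] matters here, because an option that was given without an argument is set but empty.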
A C-style for loop akin to for (( ))
in bash/ksh/zsh, but unfortunately
not with the same syntax. For example, to count from 1 to 10:
cfor 'i=1' 'i<=10' 'i+=1'; do
echo "$i"
done
(Note that ++i
and i++
can only be used on shells with ARITHPP,
but i+=1
or i=i+1
can be used on all POSIX-compliant shells.)
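For reference, the closest plain POSIX sh counterpart of that example is a while loop with the three expressions spread around it. This sketch does not use modernish; it sums the numbers instead of printing them so the result is easy to check.

```sh
# Plain POSIX sh counterpart of: cfor 'i=1' 'i<=10' 'i+=1'
sum=0
i=1                                    # initialisation
while [ "$((i <= 10))" -ne 0 ]; do     # condition
    sum=$((sum + i))
    i=$((i + 1))                       # step; would be skipped by 'continue'
done
echo "$sum"
```

In this hand-written form the step is just another command in the body, so any continue placed before it has to repeat it.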
A C-style for loop with arbitrary shell commands instead of arithmetic expressions. For example, to count from 1 to 10 with traditional shell commands:
sfor 'i=1' '[ "$i" -le 10 ]' 'i=$((i+1))'; do
print "$i"
done
or, with modernish commands:
sfor 'i=1' 'le i 10' 'inc i'; do
print "$i"
done
The shell lacks a very simple and basic loop construct, so this module
provides for an old-fashioned MS BASIC-style for
loop, renamed a with
loop because we can't overload the reserved shell keyword for
. Integer
arithmetic only. Usage:
with <varname>=<value> to <limit> [ step <increment> ]; do
# some commands
done
To count from 1 to 10:
with i=1 to 10; do
print "$i"
done
The value for step
defaults to 1 if limit is equal to or greater
than value, and to -1 if limit is less than value. The latter is
a slight enhancement over the original BASIC for
construct. So
counting backwards is as simple as with i=10 to 1; do
(etc).
A complete and nearly accurate reimplementation of the select
loop from
ksh, zsh and bash for POSIX shells lacking it. Modernish scripts running
on any POSIX shell can now easily use interactive menus.
(All the new loop constructs have one bug in common: as they start with
an alias that expands to two commands, you can't pipe a command's output
directly into such a loop. You have to enclose it in { ... } as a
workaround. I have not found a way around this limitation that doesn't
involve giving up the familiar do ... done syntax.)
This is a list of shell capabilities and bugs that modernish tests for, so
that both modernish itself and scripts can easily query the results of these
tests. The all-caps IDs below are all usable with the thisshellhas
function. This makes it easy for a cross-platform modernish script to write
optimisations taking advantage of certain non-standard shell features,
falling back to a standard method on shells without these features. On the
other hand, if universal compatibility is not a concern for your script, it
is just as easy to require certain features and exit with an error message
if they are not present, or to refuse shells with certain known bugs.
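The optimise-with-fallback pattern looks roughly like this when written by hand in plain POSIX sh, without modernish. The probe below is a hypothetical stand-in for a capability test such as PRINTFV; it is not the test modernish runs. The names have_printf_v and pad3 are likewise made up for this sketch.

```sh
# Hypothetical probe, plain POSIX sh: does printf accept -v (cf. PRINTFV)?
# Run in a subshell so a failure cannot affect the main shell.
if (printf -v _probe '%s' ok && [ "$_probe" = ok ]) >/dev/null 2>&1; then
    have_printf_v=y
else
    have_printf_v=n
fi

# Store a zero-padded number in the variable named by $1.
# The caller must pass a valid variable name.
pad3() {
    if [ "$have_printf_v" = y ]; then
        printf -v "$1" '%03d' "$2"           # fast path: no subshell
    else
        eval "$1=\$(printf '%03d' \"\$2\")"  # standard fallback
    fi
}

pad3 result 7
echo "$result"
```

Both branches produce the same value; only the cost differs, which is why caching the probe result once at startup is worthwhile.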
Most feature/quirk/bug tests have their own little test script in the
libexec/modernish/cap
directory. These tests are executed on demand, the
first time the capability or bug in question is queried using
thisshellhas
. An ID in ITALICS
denotes an ID for a "builtin" test,
which is always tested for at startup and doesn't have its own test script
file.
Non-standard shell capabilities currently tested for are:
- LEPIPEMAIN: execute last element of a pipe in the main shell, so that things like somecommand | read somevariable work. (zsh, AT&T ksh, bash 4.2+)
- RANDOM: the $RANDOM pseudorandom generator.
- LINENO: the $LINENO variable contains the current shell script line number.
- LOCAL: function-local variables, either using the local keyword, or by aliasing local to typeset (mksh, yash).
- KSH88FUNC: define ksh88-style shell functions with the 'function' keyword, supporting dynamically scoped local variables with the 'typeset' builtin. (mksh, bash, zsh, yash, et al)
- KSH93FUNC: the same, but with static scoping for local variables. (ksh93 only) See Q28 at the ksh93 FAQ for an explanation of the difference.
- ARITHPP: support for the ++ and -- unary operators in shell arithmetic.
- ARITHCMD: standalone arithmetic evaluation using a command like ((expression)).
- ARITHFOR: ksh93/C-style arithmetic 'for' loops of the form for ((exp1; exp2; exp3)) do commands; done.
- CESCQUOT: Quoting with C-style escapes, like $'\n' for newline.
- ADDASSIGN: Add a string to a variable using additive assignment, e.g. VAR+=string
- PSREPLACE: Search and replace strings in variables using special parameter substitutions with a syntax vaguely resembling sed.
- ROFUNC: Set functions to read-only with readonly -f. (bash, yash)
- DOTARG: Dot scripts support arguments.
- HERESTR: Here-strings, an abbreviated kind of here-document.
- TESTO: The test/[ builtin supports the -o unary operator to check if a shell option is set.
- PRINTFV: The shell's printf builtin has the -v option to print to a variable, which avoids forking a command substitution subshell.
- ANONFUNC: zsh anonymous functions (basically the native zsh equivalent of modernish's var/setlocal module)
- KSHARRAY: ksh88-style arrays. Supported on bash, zsh (under emulate sh), mksh, pdksh and ksh93.
- KSHARASGN: ksh93-style mass array assignment in the style of array=(one two three). Supported on the same shells as KSHARRAY except pdksh.
Shell quirks currently tested for are:
- QRK_IFSFINAL: in field splitting, a final non-whitespace IFS delimiter character is counted as an empty field (yash < 2.42, zsh, pdksh). This is a QRK (quirk), not a BUG, because POSIX is ambiguous on this.
- QRK_32BIT: mksh: the shell only has 32-bit arithmetics. Since every modern system these days supports 64-bit long integers even on 32-bit kernels, we can now count this as a quirk.
- QRK_ARITHWHSP: In yash and FreeBSD /bin/sh, trailing whitespace from variables is not trimmed in arithmetic expansion, causing the shell to exit with an 'invalid number' error. POSIX is silent on the issue. The modernish isint function (to determine if a string is a valid integer number in shell syntax) is QRK_ARITHWHSP compatible, tolerating only leading whitespace.
- QRK_BCDANGER: break and continue can affect non-enclosing loops, even across shell function barriers (zsh, Busybox ash; older versions of bash, dash and yash). (This is especially dangerous when using var/setlocal which internally uses a temporary shell function to try to protect against breaking out of the block without restoring global parameters and settings.)
- QRK_EMPTPPFLD: Unquoted $@ and $* do not discard empty fields. POSIX says for both unquoted $@ and unquoted $* that empty positional parameters may be discarded from the expansion. AFAIK, just one shell (yash) doesn't.
- QRK_EMPTPPWRD: POSIX says that empty "$@" generates zero fields but empty '' or "" or "$emptyvariable" generates one empty field. But it leaves unspecified whether something like "$@$emptyvariable" generates zero fields or one field. Zsh, pdksh/mksh and (d)ash generate one field, as seems logical. But bash, AT&T ksh and yash generate zero fields, which we consider a quirk. (See also BUG_PP_01)
- QRK_EVALNOOPT: eval does not parse options, not even --, which makes it incompatible with other shells: on the one hand, (d)ash does not accept
  eval -- "$command"
  whereas on other shells this is necessary if the command starts with a -, or the command would be interpreted as an option to eval. A simple workaround is to prefix arbitrary commands with a space. Both situations are POSIX compliant, but since they are incompatible without a workaround, the minority situation is labeled here as a QuiRK.
- QRK_EXECFNBI: In pdksh and zsh, exec looks up shell functions and builtins before external commands, and if it finds one it does the equivalent of running the function or builtin followed by exit. This is probably a bug in POSIX terms; exec is supposed to launch a program that overlays the current shell, implying the program launched by exec is always external to the shell. However, since the POSIX language is rather vague and possibly incorrect, this is labeled as a shell quirk instead of a shell bug.
- BUG_HDPARQUOT: Double quotes within certain parameter substitutions in here-documents aren't removed (FreeBSD sh; bosh). For instance, if var is set, ${var+"x"} in a here-document yields "x", not x. POSIX considers it undefined to use double quotes there, so they should be avoided for a script to be fully POSIX compatible. (Note this quirk does not apply for substitutions that remove patterns, such as ${var#"$x"} and ${var%"$x"}; those are defined by POSIX and double quotes are fine to use.) (Note 2: single quotes produce widely varying behaviour and should never be used within any form of parameter substitution in a here-document.)
- QRK_LOCALINH: On a shell with LOCAL, local variables, when declared without assigning a value, inherit the state of their global namesake, if any. (dash, FreeBSD sh)
- QRK_LOCALSET: On a shell with LOCAL, local variables are immediately set to the empty value upon being declared, instead of being initially without a value. (zsh)
- QRK_LOCALSET2: Like QRK_LOCALSET, but only if the variable by the same name in the global/parent scope is unset. If the global variable is set, then the local variable starts out unset. (bash 2 and 3)
- QRK_LOCALUNS: On a shell with LOCAL, local variables lose their local status when unset. Since the variable name reverts to global, this means that unset will not necessarily unset the variable! (yash, pdksh/mksh. Note: this is actually a behaviour of typeset, to which modernish aliases local on these shells.)
- QRK_LOCALUNS2: This is a more treacherous version of QRK_LOCALUNS that is unique to bash. The unset command works as expected when used on a local variable in the same scope that variable was declared in, however, it makes local variables global again if they are unset in a subscope of that local scope, such as a function called by the function where it is local. (Note: since QRK_LOCALUNS2 is a special case of QRK_LOCALUNS, modernish will not detect both.)
- QRK_UNSETF: If 'unset' is invoked without any option flag (-v or -f), and no variable by the given name exists but a function does, the shell unsets the function. (bash)
Non-fatal shell bugs currently tested for are:
- BUG_ALSUBSH: Aliases defined within subshells leak upwards to the main shell. (Bug found in older versions of ksh93.)
- BUG_APPENDC: When set -C (noclobber) is active, "appending" to a nonexistent file with >> throws an error rather than creating the file. (zsh < 5.1) This is a bug making use safe less convenient to work with, as this sets the -C (-o noclobber) option to reduce accidental overwriting of files. The safe module requires an explicit override to tolerate this bug.
- BUG_ARITHINIT: In dash 0.5.9.1, using unset or empty variables in arithmetic expressions causes the shell to error out with an "Illegal number" error. Instead, according to POSIX, it should take them as a value of zero. Yash (at least up to 2.44) also has a variant of this bug: it is only triggered in a simple arithmetic expression containing a single variable name without operators. The bug causes yash to exit silently with status 2.
- BUG_ARITHTYPE: In zsh, arithmetic assignments (using let, $(( )), etc.) on unset variables assign a numerical/arithmetic type to a variable, causing subsequent normal variable assignments to be interpreted as arithmetic expressions and fail if they are not valid as such.
- BUG_BRACQUOT: shell quoting within bracket patterns has no effect (zsh < 5.3; ksh93). This bug means the - retains its special meaning of 'character range', and an initial ! (and, on some shells, ^) retains the meaning of negation, even in quoted strings within bracket patterns, including quoted variables.
- BUG_CASECC01: glob patterns as in 'case' cannot match an escaped ^A ($CC01) control character. Found on: bash 2.05b
- BUG_CASESTAT: The 'case' conditional construct prematurely clobbers the exit status $?. (found in zsh < 5.3, Busybox ash <= 1.25.0, dash < 0.5.9.1)
- BUG_CMDOPTEXP: the command builtin does not recognise options if they result from expansions. For instance, you cannot conditionally store -p in a variable like defaultpath and then do command $defaultpath someCommand. (found in zsh < 5.3)
- BUG_CMDPV: command -pv does not find builtins ({pd,m}ksh), does not accept the -p and -v options together (zsh < 5.3) or ignores the '-p' option altogether (bash 3.2); in any case, it's not usable to find commands in the default system PATH.
- BUG_CMDSPASGN: preceding a special builtin with 'command' does not stop preceding invocation-local variable assignments from becoming global. (AT&T ksh, 2010-ish versions)
- BUG_CMDSPEXIT: preceding a special builtin with 'command' does not stop it from exiting the shell if the builtin encounters an error. (zsh < 5.2; mksh < R50e)
- BUG_CMDVRESV: 'command -v' does not find reserved words such as "if". (pdksh, mksh). This necessitates a workaround version of thisshellhas().
- BUG_CNONASCII: the modernish functions toupper and tolower cannot convert non-ASCII letters to upper or lower case -- e.g. accented Latin letters, Greek, Cyrillic. (Note: modernish falls back to the external tr, awk, gawk or GNU sed command if the shell can't convert non-ASCII (or any) characters, so this bug is only detected if none of these external commands can convert them. But if the shell can, then this bug is not detected even if the external commands cannot. The thing to take away from all this is that the result of thisshellhas BUG_CNONASCII only applies to the modernish toupper and tolower functions and not to your shell or any external command in particular.)
- BUG_CSCMTQUOT: unbalanced single and double quotes and backticks in comments within command substitutions cause obscure and hard-to-trace syntax errors later on in the script. (ksh88; pdksh, incl. {Open,Net}BSD ksh; bash 2.05b)
- BUG_CSNHDBKSL: Backslashes within non-expanding here-documents within command substitutions are incorrectly expanded to perform newline joining, as opposed to left intact. (bash <= 4.4, and pdksh)
- BUG_DOLRCSUB: parsing problem where, inside a command substitution of the form $(...), the sequence $$'...' is treated as $'...' (i.e. as a use of CESCQUOT), and $$"..." as $"..." (bash-specific translatable string). (Found in bash up to 4.4)
- BUG_EMPTYBRE is a case pattern matching bug in zsh < 5.0.8: empty bracket expressions eat subsequent shell grammar, producing unexpected results. This is particularly bad if you want to pass a bracket expression using a variable or parameter, and that variable or parameter could be empty. This means the grammar parsing depends on the contents of the variable!
- BUG_EVALCOBR: break and continue do not work if they are within eval, wrongly causing loop execution to continue. (pdksh; mksh < R55 2017/04/12)
- BUG_FNREDIRP: I/O redirections on function definitions are forgotten if the function is called as part of a pipeline with at least one |. (bash 2.05b)
- BUG_FNSUBSH: Function definitions within subshells (including command substitutions) are ignored if a function by the same name exists in the main shell, so the wrong function is executed. unset -f is also silently ignored. ksh93 (all current versions as of June 2015) has this bug.
- BUG_HASHVAR: On zsh, $#var means the length of $var - other shells and POSIX require braces, as in ${#var}. This causes interesting bugs when combining $#, being the number of positional parameters, with other strings. For example, in arithmetics: $(($#-1)), instead of the number of positional parameters minus one, is interpreted as ${#-} concatenated with 1. So, for zsh compatibility, always use ${#} instead of $# unless it's stand-alone or followed by a space.
- BUG_IFSGLOBC: In glob pattern matching (such as in case and [[), if a wildcard character is part of IFS, it is matched literally instead of as a matching character. This applies to glob characters *, ?, [ and ]. Since nearly all modernish functions use case for argument validation and other purposes, nearly every modernish function breaks on shells with this bug if IFS contains any of these characters! (Found in bash < 4.4)
- BUG_IFSGLOBP: In pathname expansion (filename globbing), if a wildcard character is part of IFS, it is matched literally instead of as a matching character. This applies to glob characters *, ?, [ and ]. (Bug found in bash, all versions up to at least 4.4)
- BUG_IFSGLOBS: in glob pattern matching (as in case or parameter substitution with # and %), if IFS starts with ? or * and the "$*" parameter expansion inserts any IFS separator characters, those characters are erroneously interpreted as wildcards when quoted "$*" is used as the glob pattern. (AT&T ksh93)
- BUG_IFSISSET: AT&T ksh93 (recent versions): ${IFS+s} always yields 's' even if IFS is unset. This applies to IFS only.
- BUG_ISSETLOOP: AT&T ksh93: Expansions like ${var+set} and ${var:+nonempty} remain static when used within a for, while or until loop; the expansions don't change along with the state of the variable, so they cannot be used to check whether a variable is set and/or empty within a loop if the state of that variable may change in the course of the loop.
- BUG_KUNSETIFS: ksh93: Can't unset IFS under very specific circumstances. unset -v IFS is a known POSIX shell idiom to activate default field splitting. With this bug, the unset builtin silently fails to unset IFS (i.e. fails to activate field splitting) if we're executing an eval or a trap and a number of specific conditions are met. See BUG_KUNSETIFS.t for more information.
BUG_LNNOALIAS
: The shell has LINENO, but $LINENO is always expanded to 0 when used within an alias. (pdksh variants, including mksh and oksh) -
BUG_LNNOEVAL
: The shell has LINENO, but $LINENO is always expanded to 0 when used in 'eval'. (pdksh variants, including mksh and oksh) -
BUG_MULTIBIFS
: We're on a UTF-8 locale and the shell supports UTF-8 characters in general (i.e. we don't have BUG_MULTIBYTE) -- however, using multibyte characters as IFS field delimiters still doesn't work. For example, "$*" joins positional parameters on the first byte of $IFS instead of the first character. (ksh93, mksh, FreeBSD sh, Busybox ash) -
BUG_MULTIBYTE
: We're in a UTF-8 locale but the shell does not have multi-byte/variable-length character support. (Non-UTF-8 variable-length locales are not yet supported.) Dash is a recent shell with this bug. -
BUG_NOCHCLASS
: POSIX-mandated character [:classes:] within bracket [expressions] are not supported in glob patterns. (pdksh, mksh, and family) -
BUG_NOOCTAL
: Shell arithmetic does not interpret numbers with leading zeroes as octal numbers; these are interpreted as decimal instead, though POSIX specifies octal. (older mksh, 2013-ish versions) -
BUG_NOUNSETEX
: Cannot assign export attribute to variables in an unset state; exporting a variable immediately sets it to the empty value. (zsh < 5.3) -
BUG_NOUNSETRO
: Cannot freeze variables as readonly in an unset state. This bug in zsh < 5.0.8 makes the readonly command set them to the empty string instead. -
BUG_OPTNOLOG
: on dash, setting -o nolog causes $- to wreak havoc: trying to expand $- silently aborts parsing of an entire argument, so e.g. "one,$-,two" yields "one,". (Same applies to -o debug.) -
BUG_PARONEARG
: When IFS is empty on bash 3.x and 4.x (i.e. field splitting is off), ${1+"$@"} is counted as a single argument instead of each positional parameter as separate arguments. To avoid this bug, simply use "$@" instead. (${1+"$@"} is an obsolete workaround for a fatal shell bug, FTL_UPP.) -
BUG_PFRPAD
: Negative padding value for strings in the printf builtin does not cause blank padding on the right-hand side, but inserts blank padding on the left-hand side as if the value were positive, e.g. printf '[%-4s]' hi outputs [  hi], not [hi  ]. (zsh 5.0.8) -
BUG_PP_01
: POSIX says that empty "$@" generates zero fields but empty '' or "" or "$emptyvariable" generates one empty field. This means concatenating "$@" with one or more other, separately quoted, empty strings (like "$@""$emptyvariable") should still produce one empty field. But on bash 3.x, this erroneously produces zero fields. (See also QRK_EMPTPPWRD) -
BUG_PP_02
: Like BUG_PP_01, but with unquoted $@ and only with "$emptyvariable"$@, not $@"$emptyvariable". (pdksh) -
BUG_PP_03
: When IFS is unset or empty (zsh 5.3.1) or empty (pdksh), assigning var=$* only assigns the first field, failing to join and discarding the rest of the fields. Workaround: var="$*" (POSIX leaves var=$@, etc. undefined, so we don't test for those.) -
BUG_PP_03A
: When IFS is unset, assignments like var=$* incorrectly remove leading and trailing spaces (but not tabs or newlines) from the result. Workaround: quote the expansion. Found on: bash 4.3 and 4.4. -
BUG_PP_03B
: When IFS is unset, assignments like var=${var+$*}, etc. incorrectly remove leading and trailing spaces (but not tabs or newlines) from the result. Workaround: quote the expansion. Found on: bash 4.3 and 4.4. -
BUG_PP_03C
: When IFS is unset, assigning var=${var-$*} only assigns the first field, failing to join and discarding the rest of the fields. (zsh 5.3, 5.3.1) Workaround: var=${var-"$*"} -
BUG_PP_04
: Assigning the positional parameters to a variable using a conditional assignment within a parameter substitution, such as : ${var=$*}, discards everything but the last field if IFS is empty. (pdksh, mksh) -
BUG_PP_04_S
: When IFS is null (empty), the result of a substitution like ${var=$*} is incorrectly field-split on spaces. The difference with BUG_PP_04 is that the assignment itself succeeds normally. Found on: bash 4.2, 4.3 -
BUG_PP_04A
: Like BUG_PP_03A, but for conditional assignments within parameter substitutions, as in : ${var=$*} or : ${var:=$*}. Workaround: quote either $* within the expansion or the expansion itself. Found on: bash 2.05b through 4.4. -
BUG_PP_04B
: When assigning the positional parameters ($*) to a variable using a conditional assignment within a parameter substitution, e.g. : ${var:=$*}, the fields are always joined and separated by spaces, regardless of the content or state of IFS. Workaround as in BUG_PP_04A. (bash 2.05b) -
BUG_PP_04C
: In e.g. : ${var:=$*}, the expansion incorrectly generates multiple fields. POSIX says the expansion (before field splitting) shall generate the result of the assignment, i.e. 1 field. Workaround: same as for BUG_PP_04A. (mksh R50) -
BUG_PP_05
: POSIX says that empty $@ generates zero fields, but with null IFS, empty unquoted $@ yields one empty field. Found on: dash 0.5.9.1 -
BUG_PP_06
: POSIX says that unquoted $@ initially generates as many fields as there are positional parameters, and then (because $@ is unquoted) each field is split further according to IFS. With this bug, the latter step is not done. Found on: zsh < 5.3 -
BUG_PP_06A
: POSIX says that unquoted $@ and $* initially generate as many fields as there are positional parameters, and then (because $@ or $* is unquoted) each field is split further according to IFS. With this bug, the latter step is not done if IFS is unset (i.e. default split). Found on: zsh < 5.4 -
BUG_PP_07
: unquoted $* and $@ (including in substitutions like ${1+$@} or ${var-$*}) do not perform default field splitting if IFS is unset. Found on: zsh (up to 5.3.1) in sh mode -
BUG_PP_07A
: When IFS is unset, unquoted $* undergoes word splitting as if IFS=' ', and not the expected IFS=" ${CCt}${CCn}". Found on: bash 4.4 -
BUG_PP_08
: When IFS is empty, unquoted $@ and $* do not generate one field for each positional parameter as expected, but instead join them into a single field without a separator. Found on: yash < 2.44 -
BUG_PP_08B
: When IFS is empty, unquoted $* within a substitution (e.g. ${1+$*} or ${var-$*}) does not generate one field for each positional parameter as expected, but instead joins them into a single field without a separator. Found on: bash 3 and 4 -
BUG_PP_09
: When IFS is non-empty but does not contain a space, unquoted $* within a substitution (e.g. ${1+$*} or ${var-$*}) does not generate one field for each positional parameter as expected, but instead joins them into a single field separated by spaces (even though, as said, IFS does not contain a space). Found on: bash 2 -
BUG_PP_10
: When IFS is null (empty), assigning var=$* removes any $CC01 (^A) and $CC7F (DEL) characters. (bash 3, 4) -
BUG_PP_10A
: When IFS is non-empty, assigning var=$* prefixes each $CC01 (^A) and $CC7F (DEL) character with a $CC01 character. (bash 4.4) -
BUG_PSUBBKSL1
: A backslash-escaped } character within a quoted parameter substitution is not unescaped. (bash 2 & 3, standard dash, Busybox ash) -
BUG_PSUBPAREN
: Parameter substitutions where the word to substitute contains parentheses wrongly cause a "bad substitution" error. (pdksh) -
BUG_PSUBSQUOT
: in pattern matching parameter substitutions (${param#pattern}, ${param%pattern}, ${param##pattern} and ${param%%pattern}), if the whole parameter substitution is quoted with double quotes, then single quotes in the pattern are not parsed. POSIX says they are to keep their special meaning, so that glob characters may be quoted. For example: x=foobar; echo "${x#'foo'}" should yield bar but with this bug yields foobar. (dash; Busybox ash) -
BUG_READTWHSP
: read does not trim trailing IFS whitespace if there is more than one field. (dash 0.5.8) -
BUG_REDIRIO
: the I/O redirection operator <> (open a file descriptor for both read and write) defaults to opening standard output (i.e. is short for 1<>) instead of defaulting to opening standard input (0<>) as POSIX specifies. (AT&T ksh93) -
BUG_SELECTEOF
: in a shell-native 'select' loop, the REPLY variable is not cleared if the user presses Ctrl-D to exit the loop. (zsh) -
BUG_SELECTRPL
: in a shell-native 'select' loop, input that is not a menu item is not stored in the REPLY variable as it should be. (mksh R50 2014) -
BUG_TESTERR0
: mksh: test/[ exits successfully (exit status 0) if an invalid argument is given to an operator. (mksh R52 fixes this) -
BUG_TESTERR1A
: AT&T ksh: test/[ exits with a non-error 'false' status (1) if an invalid argument is given to an operator. -
BUG_TESTERR1B
: zsh: test/[ exits with status 1 (false) if there are too few or too many arguments, instead of a status > 1 as it should do. -
BUG_TESTILNUM
: On dash (up to 0.5.8), giving an illegal number to test -t or [ -t causes some kind of corruption so the next test/[ invocation fails with an "unexpected operator" error even if it's legit. -
BUG_TESTONEG
: The test/[ builtin supports a -o unary operator to check if a shell option is set, but it ignores the no prefix on shell option names, so something like [ -o noclobber ] gives a false positive. Bug found on yash up to 2.43. (The TESTO feature test implicitly checks against this bug and won't detect the feature if the bug is found.) -
BUG_TESTPAREN
: Incorrect exit status of test -n/-z with values (, ) or ! in zsh 5.0.6 and 5.0.7. This can make scripts that process arbitrary data (e.g. the shellquote function) take the wrong action unless workarounds are implemented or modernish equivalents are used instead. Also, spurious error message with both test -n and test -z. -
BUG_TESTRMPAR
: zsh: in binary operators with test/[, if the first argument starts with ( and the last with ), both the first and the last argument are completely removed, leaving only the operator, and the result of the operation is incorrectly true because the operator is incorrectly parsed as a non-empty string. This applies to any operator.
Warning IDs do not identify any characteristic of the shell, but instead warn about a potentially problematic system condition that was detected at initialisation time.
WRN_NOSIGPIPE
: Modernish has detected that the process that launched the current program has set SIGPIPE to ignore, an irreversible condition that is in turn inherited by any process started by the current shell, and their subprocesses, and so on. This makes it impossible to detect $SIGPIPESTATUS; it is set to the special value 99999 which is impossible as an exit status. But it also makes it irrelevant what that status is, because neither the current shell nor any process it spawns is now capable of receiving SIGPIPE. The -P option to harden is also rendered irrelevant. Note that a command such as yes | head -n 10 now never ends; the only way yes would ever stop trying to write lines is by receiving SIGPIPE from head, which is being ignored. Programs that use commands in this fashion should check if thisshellhas WRN_NOSIGPIPE and either employ workarounds or refuse to run if so.
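As a rough sketch of the "refuse to run" option, a program might wrap the check in a small helper. The helper name sigpipe_usable is hypothetical. The command -v guard is an assumption added so that the snippet also runs where modernish has not been loaded; in that case it simply assumes, without proof, that SIGPIPE is not ignored.

```sh
#!/bin/sh
# Hypothetical helper: succeeds if the program may rely on SIGPIPE.
sigpipe_usable() {
	# thisshellhas is only defined once modernish has been loaded.
	# Without it, this sketch assumes SIGPIPE is not being ignored.
	if command -v thisshellhas >/dev/null 2>&1 &&
		thisshellhas WRN_NOSIGPIPE
	then
		return 1
	fi
	return 0
}

if sigpipe_usable; then
	# yes stops once head has exited and the pipe is closed.
	yes | head -n 3
else
	echo "SIGPIPE is being ignored; refusing to run." >&2
	exit 1
fi
```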
Modernish comes with a suite of regression tests to detect bugs in modernish
itself, which can be run using modernish --test
after installation. By
default, it will run all the tests verbosely but without tracing the command
execution. The install.sh
installer will run the suite quietly on the
selected shell before installation.
A few options are available to specify after --test:
- -q: quieter operation; report expected fails [known shell bugs] and unexpected fails [bugs in modernish]. Add -q again for quietest operation (report unexpected fails only).
- -s: entirely silent operation.
- -x: trace each test using the shell's xtrace facility. Each trace is stored in a separate file in a specially created temporary directory. By default, the trace is deleted if a test does not produce an unexpected fail. Add -x again to keep all traces. If any traces were saved, modernish will tell you the location of the temporary directory at the end, otherwise it will silently remove the directory again.
These short options can be combined so, for example, --test -qxx is the same as --test -q -x -x.
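To illustrate why -qxx and -q -x -x are equivalent, here is a minimal sketch using the standard POSIX getopts builtin, which treats clustered and separate short options the same way. This is not modernish's own option parser; the function parse_test_opts and the variables quiet, silent and trace are hypothetical names chosen for the example.

```sh
#!/bin/sh
# Hypothetical parser for the three short options described above.
# Repeated -q or -x options are counted rather than overwritten.
parse_test_opts() {
	quiet=0
	silent=0
	trace=0
	OPTIND=1	# restart option parsing on every call
	while getopts 'qsx' opt; do
		case $opt in
		q) quiet=$((quiet + 1)) ;;
		s) silent=1 ;;
		x) trace=$((trace + 1)) ;;
		*) return 2 ;;
		esac
	done
}

parse_test_opts -qxx
printf 'quiet=%s trace=%s\n' "$quiet" "$trace"	# quiet=1 trace=2

parse_test_opts -q -x -x
printf 'quiet=%s trace=%s\n' "$quiet" "$trace"	# quiet=1 trace=2
```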
Note the difference between these regression tests and the tests listed
above in Appendix A. The latter are tests for
whatever shell is executing modernish: they test for capabilities (features,
quirks, bugs) of the current shell. They are meant to be run via
thisshellhas
and are designed to be
taken advantage of in scripts. On the other hand, these tests run by
modernish --test
are regression tests for modernish itself. It does not
make sense to use these in a script.
New/unknown shell bugs can still cause modernish regression tests to fail, of course. That's why some of the regression tests also check for consistency with the results of the feature/quirk/bug tests: if there is a shell bug in a widespread release version that modernish doesn't know about yet, this in turn is considered to be a bug in modernish, because one of its goals is to know about all the shell bugs in all released shell versions currently seeing significant use.
The testshells.sh
program in share/doc/modernish/examples
can be used to
run the regression test suite on all the shells installed on your system.
You could put it as testshells in some convenient location in your $PATH, and then simply run:
testshells modernish --test
(adding any further options you like -- for instance, you might like to add
-q
to avoid very long terminal output). On first run, testshells
will
generate a list of shells it can find on your system and it will give you a
chance to edit it before proceeding.
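The general idea of running one command on every installed shell can be sketched in a few lines. This is a simplified, hypothetical stand-in and not how testshells.sh itself works: the candidate list is hard-coded here, whereas testshells generates and lets you edit its own list. The probe command is an arbitrary arithmetic expression that yields 8 where a leading zero is read as octal (compare BUG_NOOCTAL).

```sh
#!/bin/sh
# Hypothetical, hard-coded candidate list for this sketch only.
candidates='sh dash bash ksh mksh yash zsh'

for shell in $candidates; do
	# Skip shells that are not installed on this system.
	command -v "$shell" >/dev/null 2>&1 || continue
	# Run the same probe on each shell and collect its output.
	result=$("$shell" -c 'echo $((010))' 2>/dev/null) || result='(failed)'
	printf '%-6s %s\n' "$shell" "$result"
done
```

The output depends on which shells are installed and how each one behaves, so none is shown here.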
EOF