MUMPy is an ANSI M interpreter written in pure Python. It provides both a functional Read-Eval-Print Loop (REPL) for the M language and a routine interpreter, allowing routines to be executed directly from the command shell.
For now, MUMPy is very much an pre-alpha quality product. Many core features
of the M language are not yet implemented. Things such as argumentless DO
commands to extend conditional line scope are not yet functional. Users
interested in an actual functional M interpreter should investigate
FIS GT.M, which is an open source M interpreter that fully conforms to
the ANSI M standard and will probably be a lot faster to boot (after all,
it's written in C).
MUMPy is mostly just a fun learning project for me, though I would eventually like for it to be fully featured.
MUMPy can be installed using pip
:
pip install git+git://github.com/chrisrink10/mumpy@master
To use MUMPy interactively, simply fire it up from the command line: mumpy
.
MUMPy can interpret M source code files (files ending in a *.m
extension) by
typing mumpy -f <NAME>
where <NAME>
is the name of the routine,
excluding the extension. MUMPy will compile a Python module with the same
base name. Users should note that routine base names should match the first
tag (line label, explained below) in the routine file. This means that
M routine names are limited to ASCII characters %a-zA-Z0-9
, where the
first character cannot be numeric 0-9
. Users can read more about
routines in the Routines section below.
Other command line options are available by invoking the --help
parameter.
Users can enter a routine at a certain tag and specify input parameters
now. Use the -f
parameter to specify a routine. Optionally, users can
specify a tag to use with -t
and a list of space-delimited arguments with
the -a
parameter. Summoning the AskQuestion^LEARNM
function given at
the end of this document can be done with the following command:
mumpy -f LEARNM -r -t AskQuestion -a "What time is it?" 10
I will provide a basic MUMPS primer for users who are unfamiliar with its peculiar syntax. It is important to note that in MUMPS, white space outside of string literals is very important. Due to the limitations of the REPL format, the syntactic requirements enforced on routines are relaxed in REPL mode. White space remains important, but there are slightly different syntactic requirements.
MUMPS commands indicate an action for the interpreter to take. These
commands have two forms, the full word and an abbreviated format
(write
and w
, for example). A command is evaluated case insensitive,
so WRITE
is equivalent to w
. MUMPS permits multiple commands on one
line, each separated by a space (well, rather a space should separate the
argument list of the previous command and the new command).
In REPL mode, MUMPy always expects either a command or a comment as the leading input token. In interpreter mode, MUMPy expects a tag (a line label) and a space or a just space. Commands or comments may follow the space in either case. For those who like to read ahead, there is a full routine example given below.
MUMPS commands accept zero or more arguments in a comma delimited list. The argument list should be separated from the command by a single space. The value of the arguments and the number that each command accepts vary by the command and function. The list elements in the argument list should not be separated by any space characters - spaces are, of course, allowed in any string literals in the argument list.
Commands which accept zero arguments should still be followed by one space and, if followed by another command on the same line, would be followed by the space that would normally separate the next command and the previous command's argument list. That is, there should always be two spaces between any two commands (excluding any spaces contained within string literals in an argument list).
Since MUMPS is a programming language, it would be only appropriate to have a "Hello, world!" example:
mumpy > write "Hello, world!"
Hello, world!
Here is the same example but using an argument list instead of a single string literal argument. Using an argument list for a command is merely syntactic sugar for performing the command twice in a row:
mumpy > write "Hello, ","world!"
Hello, world!
mumpy > write "Hello, " write "world!"
Hello, world!
All non-conditional commands (conditional commands are IF
, ELSE
, and
FOR
) permit the caller to affix a post-conditional. This is an expression
which evaluates to a truth value (see Expressions below). If the expression
evaluates True, then the command proceeds with any arguments. If the
expression evaluates as False, then the command will not be executed by
the interpreter. While the post-conditional is a very powerful tool,
callers should be careful to recognize that the scope of the conditional
for the command is just the command. Line-scoped conditionals are
performed with the IF
or ELSE
commands.
mumpy > write:(0) "This will not output."
mumpy > write:(0) "Nor will this." write:(1) "But this will!"
But this will!
mumpy > write:(0) "All arguments ","are affected!"
mumpy > write:(10*4) "This expression evaluates to true!"
This expression evaluates to true!
Strictly speaking, the only data type in MUMPS is the string. MUMPS does handle numeric values as well, though these are really just a specialized case of strings.
MUMPS strings can be evaluated as numbers readily with a very well defined
conversion. The unary plus operator +
can be affixed to any string value
to produce a number from a string. The operation converts any numeric
characters starting from the left (including +
, -
, and .
) into
a number, quitting at the first non-numeric character it finds. Likewise,
certain operations are considered strictly numeric and these operations
will also coerce the value using the same rules.
The conversions can be seen as below:
mumpy > write +"27 dollars"
27
mumpy > write +"I need 27 dollars"
0
mumpy > write "27 dollars"+"12 dollars"
39
mumpy > write +"+---3.5.5"
-3.5
There is no boolean data type in MUMPS, but certain operations evaluate
to so called 'truth-valued' expressions. In reality, these expressions
evaluate to either 0
(False) or 1
(True). Note that this does still
cause the M interpreter to perform the numeric cast described above on
string values. Any numeric value which is not 0 (including negative
values) is evaluated as true.
The M language implements many of the same operators that you are familiar
with in other languages. Unlike other languages, however, you are not
permitted to use any language elements outside of the context of a command.
Thus it would not be legal for the (otherwise valid) expression 1+1
outside
of some command. You could make that expression legal if you were to write
it like write 1+1
, which would produce the value 2
. Likewise, you could
assign the result of that expression to a variable with set x=1+1
. There
are many other legal places where programmers can include expressions.
Expressions can be arbitrarily complex, but programmers should note that all binary operators operate at the same level of precedence. In practice, this means that all binary operations evaluate in strict left-to-right order. Since this differs from most common programming languages and typical arithmetic computations, this can be quite inconvenient. However, programmers can modify the order of precedence by surrounding expressions with parentheses. In the example below, we demonstrate the unexpected default output and the easy modification to force standard order of operations.
mumpy > write 1+2*4
12
mumpy > write 1+(2*4)
9
Unary operators (such as the unary plus shown earlier) operate at a higher level of precedence than the binary operators. These operators always associate right. Given their higher precedence, programmers should not need to make any special provision to force these operators to act as they would normally expect.
MUMPS provides the following binary operators standard. Note that if the operand is noted as strictly numeric, this means that both operands will be casted to numbers as described in the previous section. Likewise, operations which are strictly string will not perform any numeric evaluation of the operands, even if both are numeric. Truth-valued operations are the operations which produce truth-valued results, as described above.
- Strictly numeric operations (return result of operation)
- Addition:
+
- Subtraction:
-
- Multiplication:
*
- Division:
/
- Integer division:
\
- Modulus:
#
- Exponentiation:
**
- Addition:
- Strictly numeric comparisons (return truth-valued result)
- Greater than:
>
- Not greater than:
'>
- Less than:
<
- Not less than:
'<
- Greater than:
- Strictly string operations (return result of operation)
- Concatenation:
_
- Concatenation:
- String comparisons (return truth-valued result)
- Follows (left operand follows right in binary byte order):
]
- Sorts after (left operand sorts after right in collation order):
]]
- Contains (left operand contains right operand):
[
- Pattern match (left operand matches pattern in right operand):
?
- Follows (left operand follows right in binary byte order):
- Truth-valued operations (return truth-valued result)
- And:
&
- Not and:
'&
- Or:
!
- Not or:
'!
- And:
The MUMPS unary operators are +
and -
, producing numeric values of either
positive or negative (or zero) value from any value (casting strings as
seen above). There is also a truth-valued '
(Not) operator which will
negate the numeric value of an expression.
The only operator left out of the above list is the equals operator =
and
it's negation '=
. Equality comparison does not perform any strict casting
as many of the other operators do. A comparison between two strings will
test for string equality. A comparison between two numbers will test for
numeric equality. A comparison between a string and number will test for
string equality. This can lead to some perhaps unintuitive results:
mumpy > write "0.1"=.100
0
mumpy > write 1="01"
0
mumpy > write 1=+"01"
1
MUMPS variables come in one of two flavors, local and global. Local variables will be familiar to users of nearly every other programming language. Global variables may sound familiar, but they have a somewhat different implementation and meaning in MUMPS than in other languages.
In other languages, a global variable is one which has global scope in the current process - meaning that every execution unit of the program can access and modify that value. In MUMPS, it is true that every routine in a process (and indeed on the entire system) can share access to these globals. This is because MUMPS global variables are actually persistent. MUMPS stores globals on the hard-drive of the current operating environment, meaning that these values survive the lifetime of the current process. MUMPS provides the facilities to lock and unlock global variable nodes to permit safe concurrent usage.
In their most simple case, these variables act as scalar values. However, both local and global variables act as multi-dimensional sparse arrays without' any special handling by the programmer. Indeed this is one of the defining features of MUMPS. The array nodes may be strings or numbers:
mumpy > set person=45
mumpy > set person("name")="Chris Smith"
mumpy > set person("name","first")="Chris"
mumpy > set person("name","last")="Smith"
mumpy > set person("child",1)="Celia Smith"
mumpy > set person("child",2)="Cameron Smith"
MUMPS stores the given array nodes in sorted order and provides the $ORDER
intrinsic function to allow programmers to step through array nodes:
mumpy > set next=$ORDER(person("child",""))
mumpy > write person("child",next)
"Celia Smith"
mumpy > set next=$ORDER(person("child",next))
mumpy > write person("child",next)
"Cameron Smith"
The examples above show operations on local variables. The same operations
can easily be performed on global variables merely by prefixing the name of
the variable with a ^
caret character; ^person
is a global variable,
whereas person
is a local variable.
Programmers in MUMPS should also be mindful of the rather simplistic and
loose scoping rules that exist in MUMPS. MUMPS does not enforce strict
scoping rules. If a function or subroutine references a variable name not
explicitly defined on the current stack frame, MUMPS will search back through
the stack in reverse order and provide the caller with the first instance
of a variable with the given name. Programmers may use the NEW
command
in a stack frame to explicitly declare a variable with the given name on
the current stack frame. This variable will be deleted from the stack once
the function or subroutine completes and MUMPS unwinds its stack frame.
By default, the REPL and the routine interpreter set the Standard Input and
Standard Output as the default input and output devices, respectively.
Programmers can control the current device (unfortunately only as a unified
device, per the standard) by issuing a open
, use
, or close
command.
For the explanation below, we will assume the user is just using the
default IO device, referred to in MUMPy as STANDARD
(which can always be
accessed by the system variable $PRINCIPAL
).
MUMPS Input is done largely through the read
command, which accepts a
list of MUMPS string literals or variable names. For each variable name in its
argument list, the read
command will read in from the current device
until the user terminates input using the Return key. The value of the
user's input will be stored in the variable given.
MUMPS Output is performed using the write
command. The write
command
accepts one or more arguments, all of which are valid MUMPS expressions
or local or global variable names. The write
command will output the
evaluated expressions or stored values to the current output device in
strict left-to-right order.
The read
and write
commands offer a few convenience symbols for certain
outputs. These symbols can be combined and interspersed between other
arguments. The symbols are:
- Newline,
!
- Page clear,
#
(only for ANSI compliant terminals) - Column offset,
?N
whereN
is an integral value
There are also a few other control sequences that programmers can use to
control the input and output from their devices. The write
command permits
integral values (and expressions which are evaluated as numeric) to be
prefixed with a *
to output that character. Thus, write *45
is functionally
equivalent to write $C(45)
.
mumpy > write !!
mumpy > write ?10,"Christopher",?15,"Rink"
ChristopherRink
mumpy > write *45
-
mumpy > write $CHAR(45)
-
Reading permits input size restrictions (which might help preventing a
sort of DoS type attack) and timeout values for read operations. To indicate
a maximum read length, programmers can indicate the number of bytes after
their input variable with a #
character. Timeouts are specified after
a :
at the end of the read argument. If no data is read by the timeout
period, the value of $T
in the current stack frame is set to 0
. Another
convenience method for programmers is that they can specify reading just
one character of input by prefixing their read argument with a *
.
Here are some examples:
mumpy > read input ; Will read until newline
mumpy > read input#10 ; Will read exactly 10 bytes
mumpy > read input#10:10 ; Will read 10 bytes or fail after 10 seconds
mumpy > read *input ; Will read exactly 1 byte
M programmers can interact with devices in their environment using just
a few of the basic built-in commands. Devices may be file-like objects or
network sockets. To open a new device, you can issue an open
command
for the named device. Once a device is opened, you can begin using it
with the use
command. After you have issued a use
, subsequent read
and write
commands will use the newly selected device for input
and output.
Programmers can return to the standard input/output stream
by issuing a use $PRINCIPAL
or use $P
command. The $PRINCIPAL
intrinsic always stores the name of the initial device for the current
process. Likewise, the intrinsic $IO
stores the name of the current
device. Thus, it should always be the case that $IO
equals $P
when
the process starts. Note that if you close
the current device before
switching to another device, MUMPy will automatically switch your
current device back to the $PRINCIPAL
device.
An example of using a file device is given below:
mumpy > set file="names.txt"
mumpy > open file
mumpy > use file
mumpy > write "Amy"_$C(10)
mumpy > write "Chris"_$C(10)
mumpy > write "Jules"_$C(10)
mumpy > close file
This should produce a file looking like this:
Amy
Chris
Jules
For network socket devices, programmers need to specify some parameters
for their device. To define a server socket, the listen
device parameter
should tell MUMPy which hostname/port to respond to connections on. Likewise,
client sockets should use connect
to indicate which hostname/port connect
to. Note that for network sockets, the device name is largely symbolic.
The device parameters define the actual interface for these devices. File
devices actually use the name to find the actual file object.
MUMPy allows programmers to indicate their preferred encoding for the
current device (any device, not just sockets). To specify an encoding,
simply use the encoding
device parameter. Note that the encoding you
specify must be a valid encoding name as recognized by Python's codecs
library.
Below is an example of opening a server socket and waiting for inputs. Note that we specify a generic port to listen on as well as a max input size and timeout value.
mumpy > set size=10,timeout=10
mumpy > set dev="HTTP-Listen"
mumpy > open dev:("listen"=":60002")
mumpy > use dev
mumpy > read input#size:timeout
Routines are briefly introduced in the Use section of this document. In M, routines are the modular code-units by which programmers organize their code. Inside of routines, programmers can include 1 or more lines of M commands which perform some action or computation. Code can be further organized in these routines by tags, which are simply line labels. Tag names start in the first character column of any given line (unlike commands which must start in the second character column of a line).
The routine filename (excluding extension) should always be the first tag in the routine. Subsequent tags may be in the same format as the routine tag defined above. Body tags (i.e. those tags which are not the routine tag) may also be integers. Tags starting with a numeric character must be entirely numeric, however.
Any tag in the routine may also have a list of argument names immediately
following which are enclosed in parentheses and separated by commas
(without any spaces). Tags are also permitted to have no arguments; in
this format, they may either choose to have parentheses or not. Callers
must match their call format to the format of the tag in the routine. Thus,
a tag without parentheses may not be called with parentheses and a tag
with parentheses must be called with parentheses. Note that in M, all
arguments are technically optional. The interpreter performs an implicit
NEW
on any arguments which are not explicitly passed in by the caller.
Thus, it is incumbent on the code within tags to accommodate null inputs
if they are expecting non-null inputs.
Programmers are not required to follow any strict organizational requirements
with their tags. One tag may freely flow into another or execution may
halt (using a HALT
command), return to a caller (using a QUIT
command), or
simply be redirected to another tag, line or routine (using a GOTO
command).
In practice, programmers typically format their routines into subroutines
(tags which do not return a value via a QUIT
) and extrinsic functions
(tags which do return a value). This allows M programmers to safely emulate
other programming languages with more rigid code structure.
;************************
;* Learn M example
;*
;* Users could copy this example into a file named
;* LEARNM.m and then invoke `mumpy.py -f LEARNM` to
;* see this routine in action.
;************************
LEARNM ;
new var,name,resp
;
; Ask the user their name
read "What is your name? ",name
;
; Welcome them to MUMPy
set var="Hello and welcome to MUMPy, "_name_"!"
write var
;
; Ask them a question
set resp=$$AskQuestion("How are you today?",10)
;
; Quit this subroutine
q
;
; Ask the user a question and return their response.
; Allow the caller to indicate the maximum number of characters.
AskQuestion(question,max) ;
new resp
;
; Set a default maximum number of characters if none was given
set:(+max<1) max=40
;
; Write the question first (read cannot write non-string literals)
write !,question," "
;
; Read their response (maximum of 'max' chars)
read resp#max
;
; Return that value to the user
quit resp
The following resources have been invaluable to me as I have been writing MUMPy:
- Annotated MUMPS Standards - the closest thing to the ANSI/ISO M standards that is available online.
- [GT.M Programmers Guide] (http://tinco.pair.com/bhaskar/gtm/doc/books/pg/UNIX_manual/index.html) - GT.M is a (mostly) ANSI M compliant, open source MUMPS distribution. Its documentation is superior to the Annotated Standard, but it does differ from the Standard in a few ways.
- PLY - Python Lex-Yacc is the Python library which provides Lexing and Parsing capabilities to MUMPy.
- blist Documentation - blist
is the Python library MUMPy use for local variables (specifically nested
sorteddict
instances). - [Thunks, Trampolines, and Continuation Passing] (http://jtauber.com/blog/2008/03/30/thunks,_trampolines_and_continuation_passing/) - is a great resource on usage of all three title elements in Python.
- Python Socket HOWTO - a great resource for understanding the basics of Python socket programming.
MUMPy is licensed under the 3-clause BSD license. See the LICENSE file included with the source code for more details.