format

Copyright(c) 2010-2012 Neil Johnson


SUMMARY

A stripped-down, integer-only implementation of the core of the printf family of
functions.


INTRODUCTION

In most standard C libraries the printf family of functions can be rather heavy
to use - they pull in quite a lot of additional library code, and they often 
require considerable additional effort to support on small or even medium-sized 
embedded projects.

The "format" library addresses this problem.  It provides a small, efficient core 
function which implements the majority of the printf conversions, requires 
little in the way of system support, and can be easily ported to work with a 
wide range of output devices.


SYNOPSIS

#include "format.h"
int format( void * (*cons) (void *a, const char *s , size_t n),
             void * arg, const char *fmt, va_list ap );


DESCRIPTION

The "format" function sends strings of one or more characters to the consumer
function "cons" under the control of the string pointed to by "fmt" that
specifies how the subsequent arguments in "ap" are converted for output.  If
there are insufficient arguments for the format, the behaviour is undefined. 
The "format" function returns when the end of the format string is encountered.

The format string "fmt" is composed of zero or more directives: ordinary
characters (not "%"), which are sent unchanged to the consumer function; and
conversion specifications, each of which results in fetching zero or more
subsequent arguments, converting them, if applicable, according to the
corresponding conversion specifier, and then sending the result to the consumer
function.


THE CONSUMER FUNCTION

void * cons( void *a, const char *s, size_t n )

The consumer function "cons" takes an opaque pointer "a", a pointer to an array
of characters "s" and the number "n" of characters to consume from "s".  It
returns an opaque pointer, which may be the same as or different from "a", and
which is passed to the next call to "cons".  The consumer function returns NULL
to indicate an error condition, which stops any further format processing and
causes the format function to return the EXBADFORMAT error code.

The first opaque pointer passed to the first call to "cons" is supplied as the
argument "arg" to the call to "format" (see above).
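
As an illustration of how the opaque pointer carries state from one call to the
next, here is a minimal sketch of a consumer that writes into a fixed-size
buffer and drops anything that does not fit.  The names "struct outbuf" and
"bounded_cons" are hypothetical and are not part of the library.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical state for a bounded output buffer (not part of the library). */
struct outbuf {
    char   *next;    /* next free position in the buffer */
    size_t  space;   /* bytes still available            */
};

/* Consumer: copies as much of s as fits and silently drops the rest.
 * Returns the same opaque pointer so the state reaches the next call;
 * returns NULL only when no state was supplied. */
void * bounded_cons( void *a, const char *s, size_t n )
{
    struct outbuf *ob = a;

    if ( ob == NULL )
        return NULL;             /* reported to the caller as an error */

    if ( n > ob->space )
        n = ob->space;           /* truncate to the remaining space */

    memcpy( ob->next, s, n );
    ob->next  += n;
    ob->space -= n;

    return ob;
}
```

A caller would pass the address of a "struct outbuf" as "arg".  Because "format"
reports the number of characters sent to the consumer, not the number stored, a
caller using this sketch should place the terminating null character at
"ob.next" and not at an offset taken from the return value.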


CONVERSION SPECIFIERS
 
Each conversion specification is introduced by the character %.  After the %, 
the following appear in sequence: 

= Zero or more flags (in any order) that modify the meaning of the conversion
  specification. 

= An optional minimum field width. If the converted value has fewer characters
  than the field width, it is padded with spaces (by default) on the left (or 
  right, if the left adjustment flag, described later, has been given) to the 
  field width.  The field width takes the form of an asterisk * (described 
  later) or a decimal integer. 

= An optional precision that gives the minimum number of digits to appear for 
  the b, d, i, I, o, u, U, x, and X conversions, or the maximum number of bytes 
  to be written for s conversions. The precision takes the form of a period (.) 
  followed either by an asterisk * (described later) or by an optional decimal 
  integer; if only the period is specified, the precision is taken as zero.
  
= An optional number base modifier that specifies the numeric base to be used
  by the i, I, u and U conversions. The base takes the form of a colon (:)
  followed by either an asterisk * (described later) or by an optional decimal 
  integer; if only the colon is specified the base is taken as decimal.
  
= An optional grouping modifier that specifies how digits are to be grouped.  

= An optional length modifier that specifies the size of the argument.

= A conversion specifier character that specifies the type of conversion to be 
  applied. 

As noted above, a field width, precision, base, or any combination, may be
indicated by an asterisk. In this case, an int argument supplies the
corresponding value.  The arguments specifying field width, or precision, or
both, shall appear (in that order) before the argument (if any) to be
converted. A negative field width argument is taken as a "-" flag followed by a
positive field width. A negative precision argument is taken as if the
precision were omitted.
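
The standard C library states the same rules for an asterisk field width and
precision, so they can be illustrated with snprintf.  Note that the sketch
below calls the standard function, not "format", and that the base modifier has
no standard equivalent; "show_int" is a hypothetical helper name.

```c
#include <stdio.h>

/* Formats value between brackets using a run-time width and precision.
 * Uses the standard snprintf, which documents the same asterisk rules. */
int show_int( char *out, size_t size, int width, int prec, int value )
{
    return snprintf( out, size, "[%*.*d]", width, prec, value );
}
```

With a width of 6 and a precision of 3 the value 42 is written as "[   042]".
A width of -6 gives "[042   ]", and a precision of -1 is treated as omitted, so
the value 0 with a width of 6 is written as "[     0]".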

The flag characters and their meanings are: 

-          The result of the conversion is left-justified within the field.  It
           is right-justified if this flag is not specified.

^          The result of the conversion is centre-justified within the field.
           It is right-justified if this flag is not specified. When there is
           an odd number of padding spaces the result of the conversion is
           biased to the right. It is biased to the left if the - flag is also
           specified.

+          The result of a signed conversion always begins with a plus or minus 
           sign. It begins with a sign only when a negative value is converted
           if this flag is not specified.

space      If the first character of a signed conversion is not a sign, or if a 
           signed conversion results in no characters, a space is prefixed to
           the result. If the space and + flags both appear, the space flag is 
           ignored.

#          The result is converted to an "alternative form". For o conversion, 
           it increases the precision, if and only if necessary, to force the 
           first digit of the result to be a zero (if the value and precision
           are both 0, a single 0 is printed). For x (or X or b) conversion, a  
           nonzero result has "0x" (or "0X" or "0b") prefixed to it. For other 
           conversions, the flag is ignored.
           
!          Modifies the behaviour of the # flag. For b, x and X conversions the
           result is always prefixed, even when zero.  For x and X conversions 
           the prefix is always "0x". If the # flag does not appear, or the 
           conversion is not b, x or X, the flag is ignored.

0          For b, d, i, I, o, u, U, x, and X conversions, leading zeros 
           (following any indication of sign or base) are used to pad to the 
           field width rather than performing space padding. If the 0 and - 
           flags both appear, the 0 flag is ignored.  For b, d, i, I, o, u, U, 
           x, and X conversions, if a precision is specified, the 0 flag is 
           ignored. For other conversions, the flag is ignored. 
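
The padding rule for the ^ flag can be sketched as follows.  This is not the
library's own code: "centre_padding" is a hypothetical helper, and it reads
"biased to the right" as meaning that the extra space of an odd-sized pad goes
before the result.

```c
#include <stddef.h>

/* Hypothetical helper: splits the padding of a centred field into the
 * number of spaces before and after the converted result.
 * left_bias is non-zero when the - flag is also given. */
void centre_padding( size_t width, size_t len, int left_bias,
                     size_t *before, size_t *after )
{
    /* A field narrower than the result is expanded, never truncated. */
    size_t pad = ( width > len ) ? width - len : 0;

    *before = pad / 2;
    *after  = pad / 2;

    if ( pad % 2 )
    {
        if ( left_bias )
            ++*after;    /* result sits nearer the left edge  */
        else
            ++*before;   /* result sits nearer the right edge */
    }
}
```

Under this reading, a three-character result in a field of width 6 gets two
spaces before and one after, or one before and two after with the - flag.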

The length modifiers and their meanings are: 

h          Specifies that a following b, d, i, I, o, u, U, x, or X conversion 
           specifier applies to a short int or unsigned short int argument (the 
           argument will have been promoted according to the integer promotions, 
           but its value shall be converted to short int or unsigned short int 
           before consuming); or that a following n conversion specifier applies
           to a pointer to a short int argument. 

l(ell)     Specifies that a following b, d, i, I, o, u, U, x, or X conversion 
           specifier applies to a long int or unsigned long int argument; or 
           that a following n conversion specifier applies to a pointer to a 
           long int argument.

If a length modifier appears with any conversion specifier other than as 
specified above, the length modifier is ignored. 

The conversion specifiers and their meanings are: 

d,i,I        The int argument is converted to signed decimal (d) or signed 
             number of the specified base (i or I) in the style [-]dddd. The 
             base specifies the number base. For bases greater than decimal the
             letters 'A' to 'Z' are used for I conversions, and 'a' to 'z' for 
             i conversions. The precision specifies the minimum number of digits
             to appear; if the value being converted can be represented in fewer
             digits, it is expanded with leading zeros. The default precision is
             1. The result of converting a zero value with a precision of zero
             is no characters. 
             
b,o,u,U,x,X  The unsigned int argument is converted to unsigned binary (b), 
             unsigned octal (o), unsigned number of specified base (u or U), or 
             unsigned hexadecimal notation (x or X) in the style dddd. For 
             bases greater than decimal the letters 'A' to 'Z' are used for X 
             and U conversions, and 'a' to 'z' for x and u conversions. The 
             precision specifies the minimum number of digits to appear; if the 
             value being converted can be represented in fewer digits, it is 
             expanded with leading zeros.  The default precision is 1. The 
             result of converting a zero value with a precision of zero is no
             characters.

c            The int argument is converted to an unsigned char, and the
             resulting character is written.

C            The character immediately following the conversion specifier is 
             written.  The precision specifies how many times the character is
             written.  The default and minimum precision is 1.

s            The argument is a pointer to the initial element of an array of 
             character type. Characters from the array are written up to (but 
             not including) the terminating null character. If the precision is 
             specified, no more than that many bytes are written. If the 
             precision is not specified or is greater than the size of the 
             array, the array must contain a null character.  A NULL argument is
             treated as a pointer to the string "(null)".

p            The argument is a pointer to void. The value of the pointer is 
             converted to a sequence of printing characters using the conversion
             specification %#!N.NX, where N is determined by the size of pointer 
             to int on the target machine. 

n            The argument is a pointer to a signed integer into which is written
             the number of characters passed to the consumer function so far by 
             this call to "format".  No argument is converted, but one is 
             consumed. Any flags, a field width, or a precision will be ignored.
             A NULL argument is silently ignored.

%            A "%" character is written. No argument is converted. The complete 
             conversion specification is %%. 
           
"            The argument is treated as a continuation of the format
             specification.  Any flags, width, precision or length will be 
             ignored.

If a conversion specification is invalid, "format" returns an error code. If any 
argument is not the correct type for the corresponding conversion specification, 
the behaviour is undefined. 

In no case does a nonexistent or small field width cause truncation of a field;
if the result of a conversion is wider than the field width, the field is 
expanded to contain the conversion result. 


RETURN VALUE

The "format" function returns the number of characters sent to the consumer
function, or the negative value EXBADFORMAT if an output or encoding error
occurred. 


LIMITS

The maximum width and precision are 500.  It is an error if values larger than
this are specified.

The largest number base is 36.  The smallest is 2.  A base of 0 (the default) is
treated as decimal (base 10).  It is an error to specify a base of 1 or a base
greater than 36.
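
The base limits can be illustrated with a minimal sketch of an arbitrary-base
conversion.  This is not the library's implementation: "to_base" and its
parameters are hypothetical, and the sketch handles only the digits, not the
sign, precision or padding.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: writes value in the given base into out as a
 * null-terminated string.  upper selects 'A'-'Z' instead of 'a'-'z'.
 * Returns the number of digits written, or -1 if the base is invalid
 * or the buffer is too small. */
int to_base( unsigned long value, unsigned base, int upper,
             char *out, size_t size )
{
    const char *digits = upper ? "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                               : "0123456789abcdefghijklmnopqrstuvwxyz";
    char   tmp[ sizeof value * CHAR_BIT ];   /* enough digits for base 2 */
    size_t n = 0, i;

    if ( base == 0 )
        base = 10;                 /* a base of 0 means decimal */
    if ( base < 2 || base > 36 )
        return -1;                 /* base 1 and bases above 36 are errors */

    do {                           /* least significant digit first */
        tmp[n++] = digits[ value % base ];
        value /= base;
    } while ( value != 0 );

    if ( n + 1 > size )
        return -1;

    for ( i = 0; i < n; i++ )      /* reverse into the caller's buffer */
        out[i] = tmp[ n - 1 - i ];
    out[n] = '\0';

    return (int)n;
}
```

For instance, the value 255 in base 16 yields "ff" (or "FF" with upper set),
and the value 35 in base 36 yields the single digit "z" (or "Z").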


EXAMPLES

The first example implements the same behaviour as the standard C library
function printf.  First, the consumer function:

void * outfunc( void * op, const char * buf, size_t n )
{
    while ( n-- )
        putchar( *buf++ );

    return (void *)( !NULL );
}

In this case the opaque pointer is not used, and a non-NULL value is returned.
Second, the implementation of the printf function:

int printf ( const char *fmt, ... )
{
    va_list arg;
    int done;
    
    va_start ( arg, fmt );
    done = format( outfunc, NULL, fmt, arg );
    va_end ( arg );
    
    return done;
}

Because the opaque pointer is not used, and the consumer function ignores it,
a NULL is passed to "format".

The second example illustrates how the opaque pointer is used to implement the
standard C library function sprintf.  In this example the consumer function
returns the address of the next location to receive any following characters:

void * bufwrite( void * memptr, const char * buf, size_t n )
{
    /* Cast to char * before adding: arithmetic on void * is not standard C. */
    return ( (char *)memcpy( memptr, buf, n ) + n );
}

The implementation of sprintf is shown below, with an additional step to
append a null character to the end of the string written into buf:

int sprintf( char *buf, const char *fmt, ... )
{
    va_list arg;
    int done;
    
    va_start ( arg, fmt );
    done = format( bufwrite, buf, fmt, arg );
    if ( 0 <= done )
        buf[done] = '\0';
    va_end ( arg );
    
    return done;
}

The final example illustrates the use of the opaque pointer to support an
LCD display with (x,y) positioning.  In this example a struct data type
describes where the consumer function is to place its output:

struct coord {
	short x, y;
};

Also assume the LCD is 80 characters wide, and follows the usual convention of
top-left being (0,0).

The consumer function uses the coord to set the position for each character
sent to the LCD by calling a driver function lcd_putc:

void * lcd_putat( void * ap, const char *s, size_t n )
{
	struct coord *pc = (struct coord *)ap;
	
	while ( n-- )
	{
		lcd_putc( pc->x++, pc->y, *s++ );
		if ( pc->x >= 80 )
		{
			pc->x = 0;
			pc->y++;
		}
	}
	
	return (void *)pc;
}

The implementation of the lcd_printf function itself:

int lcd_printf( struct coord loc, const char *fmt, ... )
{
    va_list arg;
    int done;
    
    va_start ( arg, fmt );
    done = format( lcd_putat, &loc, fmt, arg );
    va_end ( arg );
    
    return done;
}

And an example call to this function might look like this:

    struct coord loc;
    int temperature;
    int status;
    
    temperature = 32;
    loc.x = 5;
    loc.y = 2;
    status = lcd_printf( loc, "Boiler temp = %+d Celsius", temperature );
    if ( status < 0 )
    {
    	/* error handler */
    }

--