Qfplib-M0-tiny for RT-Thread

A free ARM Cortex-M0 floating-point library in 1 kbyte

Introduction

Qfplib-M0-tiny is a library of IEEE 754 single-precision floating-point arithmetic routines for microcontrollers based on the ARM Cortex-M0 core (ARMv6-M architecture). It should also run on Cortex-M3 and Cortex-M4 microcontrollers and will give reasonable performance, but it is not optimised for these devices.

Many Cortex-M0 microcontrollers have very little program memory available, and so the primary design goal was to minimise the code size of the library without sacrificing too much in speed or in usefulness. To that end it provides correctly rounded (to nearest, even-on-tie) addition, subtraction, multiplication and division operations, and sine, cosine, tangent, arctangent, logarithm, exponential and square root functions that give a high degree of accuracy. There are also conversion functions between floating-point values and signed or unsigned integer or fixed-point values. The library fits in 1 kbyte of program memory.

If you can afford the luxury of an additional 200 bytes in code size, fast divide and square root functions (that do not guarantee correctly rounded results) are also available.

How to Obtain

 RT-Thread online packages  --->
    system packages  --->
        acceleration: Assembly language or algorithmic acceleration packages  --->
            [*] Qfplib-M0-tiny: a free ARM Cortex-M0 floating-point library in 1 kbyte

Licence

Qfplib-M0-tiny is open source, licensed under version 2 of the GNU GPL. Use at your own risk. If you wish to enquire about alternative licensing please use the e-mail address on the home page.

Builds

Qfplib-M0-tiny can be built in different ways depending on which functions you need. The functions included are controlled by the symbols include_faster, include_conversions and include_scientific in the source code.

The build always includes the four basic arithmetic operations qfp_fadd, qfp_fsub, qfp_fmul and qfp_fdiv, plus the comparison function qfp_fcmp.

If the symbol include_conversions is set to 1 then conversion routines between floating-point values and integers or fixed-point values are included. These are qfp_float2int, qfp_float2fix, qfp_int2float, qfp_fix2float, qfp_float2uint, qfp_float2ufix, qfp_uint2float and qfp_ufix2float.

If the symbol include_scientific is set to 1 (which implies setting include_conversions to 1), then the following functions are included: qfp_fcos, qfp_fsin, qfp_ftan, qfp_fatan2, qfp_fexp, qfp_fln and qfp_fsqrt.

If the symbol include_faster is set to 1 then a faster but less accurate floating-point division routine qfp_fdiv_fast is also included, as is a fast square root routine qfp_fsqrt_fast.
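
As a rough illustration of how the basic routines might be called from C, the sketch below computes a mean and a maximum using only library calls. The prototypes are assumed to match those declared in qfplib.h. The QFP_HOST_STUBS macro and the stand-in definitions are hypothetical and are not part of Qfplib-M0-tiny; they exist only so that the sketch also compiles off-target.

```c
/* Set QFP_HOST_STUBS to 0 when building for the target with qfplib.s linked in.
   This macro is hypothetical and not part of Qfplib-M0-tiny. */
#ifndef QFP_HOST_STUBS
#define QFP_HOST_STUBS 1
#endif

#if QFP_HOST_STUBS
/* Host stand-ins that use the compiler's own float arithmetic. */
static float qfp_fadd(float x, float y) { return x + y; }
static float qfp_fdiv(float x, float y) { return x / y; }
static int   qfp_fcmp(float x, float y) { return (x > y) - (x < y); }
#else
#include "qfplib.h"
#endif

/* Mean of three values using only library calls. */
float mean3(float a, float b, float c)
{
    return qfp_fdiv(qfp_fadd(qfp_fadd(a, b), c), 3.0f);
}

/* Larger of two values; qfp_fcmp returns +1 when its first argument is greater. */
float max2(float a, float b)
{
    return qfp_fcmp(a, b) > 0 ? a : b;
}
```

Only the always-included functions are used here, so the sketch should not depend on how the include_ symbols are set.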

Code size

Variant	Directives	Functions provided	Code size
Basic	`.equ include_faster,0` `.equ include_conversions,0` `.equ include_scientific,0`	`qfp_fadd` `qfp_fsub` `qfp_fmul` `qfp_fdiv` `qfp_fcmp`	0x17e=382 bytes
Basic plus fast division and square root	`.equ include_faster,1` `.equ include_conversions,0` `.equ include_scientific,0`	`qfp_fadd` `qfp_fsub` `qfp_fmul` `qfp_fdiv` `qfp_fdiv_fast` `qfp_fcmp` `qfp_fsqrt_fast`	0x244=580 bytes
Basic plus conversions	`.equ include_faster,0` `.equ include_conversions,1` `.equ include_scientific,0`	`qfp_fadd` `qfp_fsub` `qfp_fmul` `qfp_fdiv` `qfp_fcmp` `qfp_float2int` `qfp_float2fix` `qfp_int2float` `qfp_fix2float` `qfp_float2uint` `qfp_float2ufix` `qfp_uint2float` `qfp_ufix2float`	0x1e2=482 bytes
All functions	`.equ include_faster,0` `.equ include_conversions,1` `.equ include_scientific,1`	`qfp_fadd` `qfp_fsub` `qfp_fmul` `qfp_fdiv` `qfp_fcmp` `qfp_float2int` `qfp_float2fix` `qfp_int2float` `qfp_fix2float` `qfp_float2uint` `qfp_float2ufix` `qfp_uint2float` `qfp_ufix2float` `qfp_fcos` `qfp_fsin` `qfp_ftan` `qfp_fatan2` `qfp_fexp` `qfp_fln` `qfp_fsqrt`	0x3f8=1016 bytes
De luxe	`.equ include_faster,1` `.equ include_conversions,1` `.equ include_scientific,1`	`qfp_fadd` `qfp_fsub` `qfp_fmul` `qfp_fdiv` `qfp_fdiv_fast` `qfp_fcmp` `qfp_float2int` `qfp_float2fix` `qfp_int2float` `qfp_fix2float` `qfp_float2uint` `qfp_float2ufix` `qfp_uint2float` `qfp_ufix2float` `qfp_fcos` `qfp_fsin` `qfp_ftan` `qfp_fatan2` `qfp_fexp` `qfp_fln` `qfp_fsqrt` `qfp_fsqrt_fast`	0x4c0=1216 bytes

Qfplib-M0-tiny does not use any static storage. Stack use is parsimonious and statically analysable; recursion is not used.

Code size comparison against other embedded libraries

The standard floating-point library routines that come with the GCC cross-compiler for the Cortex-M0 core occupy about 2700 bytes for the four basic arithmetic functions alone; a trivial program that does nothing but call cosf compiles to 7.5 kbyte of code.

Texas Instruments has released information giving the code size for a number of simple benchmarks compiled for a range of microcontrollers in Table A-4 of document SLAA205C. It appears that the sizes given do not include start-up code (look for example at the compiled size of the ‘8-bit 2-dim matrix.c’ benchmark, which includes a 64 byte constant array).

The ‘floating-point math.c’ benchmark exercises floating-point addition, multiplication and division with very little overhead, and this allows us to make a meaningful comparison with the ‘Basic’ version of Qfplib-M0-tiny.

GCC compiles the body of this benchmark to a rather profligate 54 bytes and the total code size when linked with the ‘Basic’ version of Qfplib-M0-tiny is 382+54=436 bytes.

On page 26 of a presentation on LPC1100 series microcontrollers NXP claims that using the ARM ‘microlib’ library the same benchmark compiles to approximately 620 bytes. So even though microlib makes no attempt at IEEE 754 compliance, it is nevertheless about 40% larger than Qfplib-M0-tiny (620 bytes against 436 bytes).

Information from the above sources is summarised in the following table.

Processor/library	Benchmark code size in bytes
Texas Instruments MSP430F5438	1102
Microchip dsPIC	2020
Microchip PIC24	2020
Renesas H8/300H	1104
Maxim MAXQ20	1172
Freescale HCS12	2082
Atmel ATxmega64A1	1080
Generic ARM7TDMI (Thumb)	1832
Intel MCS-51	2190
Microchip PIC18F242	1400
Atmel ATmega8	1088
NXP LPC1100 series ARM Cortex-M0 (microlib)	620
NXP LPC1100 series ARM Cortex-M0 (Qfplib-M0-tiny)	436

The cross-platform fixed-point arithmetic library libfixmath can run on the Cortex-M0 core. According to this page its implementation of the atan2 function is about four times larger than the whole of Qfplib-M0-tiny.

ARM provides a range of floating-point arithmetic routines as part of its CMSIS library. Unfortunately, at least based on an inspection of the part that has been released under a non-proprietary licence (see here), the implementations are poor and do not appear to have been tested thoroughly.

ARM's floating-point cosine routine includes a table of constants, not shared with any other functions, that is already larger than the whole of Qfplib-M0-tiny. The routine produces results about half as accurate as those of Qfplib-M0-tiny.

Speed

The following table compares cycle counts for Qfplib-M0-tiny against other libraries. Qfplib-M0-tiny and GCC library results are approximate average values for non-exceptional arguments to the functions and include calling overhead. They were measured using an LPC11U68 microcontroller with single-cycle flash memory.

Results for the Micro Digital ‘GoFast’ library (presumably optimised for speed rather than size, judging by its name) are inferred from the timings given on this page for an ARM7TDMI-based processor. The comparison here may not be strictly fair to Qfplib-M0-tiny, as it is not clear from their description whether Micro Digital’s library exploits features available on that processor but not on the Cortex-M0: for example, ARM mode is considerably faster and more flexible than Thumb mode, and the long multiply instructions can be used to advantage in several of the routines. Micro Digital do not appear to provide public information on the code size of their library. Its implementation of the basic functions does not appear to be IEEE 754 compliant with regard to rounding.

Function	Qfplib-M0-tiny cycles	GCC library cycles	‘GoFast’ library cycles
`qfp_fadd`	150	102	182
`qfp_fsub`	151	108	181
`qfp_fmul`	165	166	144
`qfp_fdiv`	323	475	799
`qfp_fdiv_fast`	187	-	-
`qfp_fcmp`	27	-	103
`qfp_fcos`	579	3350	393
`qfp_fsin`	567	3300	394
`qfp_ftan`	748	6140	1090
`qfp_fatan2`	703	4930	2041
`qfp_fexp`	536	1930	372
`qfp_fln`	808	3960	1321
`qfp_fsqrt`	717	460	1590
`qfp_fsqrt_fast`	161	-	-

The ARM CMSIS implementations of the scientific functions, despite their name ‘FastMath’, appear to be many times slower than Qfplib-M0-tiny. For example, the average execution time for ARM's cosine function (compiled using GCC) is about 3880 cycles, virtually independent of the optimisation flags used.

The libfixmath implementations of the basic arithmetic functions are much faster than the floating-point implementations in Qfplib-M0-tiny; the implementation of square root is of comparable speed; and the scientific functions appear to be much slower.

Limitations and deviations from the IEEE 754 standard

Except as noted below, on input and output, NaNs are converted to infinities, denormals are flushed to zero, and negative zero is converted to positive zero. The result of the square root function is not always correctly rounded according to IEEE 754; see the next section for more on function accuracy.
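
The rules above can be pictured as a mapping on single-precision bit patterns. The following is a host-side sketch only; model_canonical_bits is a hypothetical helper, not a library function. Two details are assumptions rather than statements from the text: that a flushed negative denormal becomes positive zero, and that a NaN becomes an infinity of the same sign.

```c
#include <stdint.h>

/* Host-side model of how an IEEE 754 single-precision bit pattern is
   treated under the rules above (hypothetical helper). */
uint32_t model_canonical_bits(uint32_t bits)
{
    uint32_t sign     = bits & 0x80000000u;
    uint32_t exponent = (bits >> 23) & 0xffu;

    if (exponent == 0u)       /* zero or denormal: becomes +0 (sign handling assumed) */
        return 0u;
    if (exponent == 0xffu)    /* infinity or NaN: becomes infinity, sign kept (assumed) */
        return sign | 0x7f800000u;
    return bits;              /* normal values pass through unchanged */
}
```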

Function ranges and accuracy

Subject to the limitations and deviations mentioned above, the functions qfp_fadd, qfp_fsub, qfp_fmul and qfp_fdiv all produce correctly rounded (to nearest, even-on-tie) results. This has been verified on real hardware against the default library supplied with the GCC cross-compiler using the Berkeley TestFloat suite, plus a further billion or so test cases, both random and contrived.

In the following table, ‘ulp’ means ‘unit in last place’. Where a relative accuracy is quoted (‘(R)’), this means the error in units of the least significant bit of the mantissa of the result. Where an absolute accuracy is quoted (‘(A)’), it means the error in units of 2^–24.

Function	Valid argument range	Test	Mean signed (systematic) error	Mean unsigned error	RMS error	Remarks
`qfp_fdiv_fast`	Any	10000000 random pairs x, y where 1≤x<2; 1≤y<2	0.001244 ulp (R)	0.2518 ulp (R)	0.2917 ulp (R)	Relative accuracy is independent of the exponents of the arguments; result is exact when divisor is a power of 2
`qfp_fcos`	–128<x<+128	All values from –π to +π in steps of 2^–22	0.004518 ulp (A)	0.3905 ulp (A)	0.5054 ulp (A)	Relative accuracy is poor where the result is near zero; argument is clipped to valid range; `qfp_fcos(0)==1`
`qfp_fcos`	–128<x<+128	All values from –128 to +128 in steps of 2^–17	0.007372 ulp (A)	0.6243 ulp (A)	0.7517 ulp (A)	As above
`qfp_fsin`	–128<x<+128	All values from –π to +π in steps of 2^–22	0.2654 ulp (A)	0.4541 ulp (A)	0.5713 ulp (A)	Relative accuracy is poor where the result is near zero; argument is clipped to valid range; `qfp_fsin(0)==0`
`qfp_fsin`	–128<x<+128	All values from –128 to +128 in steps of 2^–17	0.2647 ulp (A)	0.6531 ulp (A)	0.7935 ulp (A)	As above
`qfp_ftan`	–128<x<+128	All values from –1 to +1 in steps of 2^–24	0.1730 ulp (A)	0.4314 ulp (A)	0.6083 ulp (A)	Relative accuracy is poor where the result is near zero; absolute accuracy is poor where the result approaches infinity; `qfp_ftan` is calculated by dividing the results of `qfp_fsin` and `qfp_fcos` using `qfp_fdiv_fast` (if available; otherwise `qfp_fdiv` is used); `qfp_ftan(0)==0`
`qfp_ftan`	–128<x<+128	All values from –1.5 to +1.5 in steps of 2^–23	–0.3961 ulp (A)	1.543 ulp (A)	4.259 ulp (A)	As above
`qfp_fatan2`	Any	10000000 random pairs x, y where –2≤x<2; –2≤y<2, not both less than 0.25 in absolute value	0.1705 ulp (A)	0.6925 ulp (A)	0.8570 ulp (A)	Result is independent of any overall offset added to the exponents of both arguments; `qfp_fatan2(0,1)==0`
`qfp_fexp`	Any	All values from –87 to +88 in steps of 2^–17	–0.1207 ulp (R)	0.2936 ulp (R)	0.3571 ulp (R)	Returns zero for x≤–87.33655 and +infinity for x≥88.72284; `qfp_fexp(0)==1`
`qfp_fln`	x>0	All values from 2^–4 to 2^4 in steps of 2^–20	0.03885 ulp (A)	0.8292 ulp (A)	1.053 ulp (A)	Returns –infinity for x≤0
`qfp_fsqrt`	x≥0	All representable values from 1 to 4	0.3376 ulp (A)	0.5889 ulp (A)	0.7152 ulp (A)	Relative accuracy is independent of even offsets to exponent; returns –infinity for x<0; `qfp_fsqrt(0)==0; qfp_fsqrt_fast(0)==0; qfp_fsqrt(1)==1; qfp_fsqrt_fast(1)==1`
`qfp_fsqrt_fast`	x≥0	All representable values from 1 to 4	–0.04438 ulp (A)	0.6969 ulp (A)	0.8676 ulp (A)	As for `qfp_fsqrt`
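
The two error measures used in the table can be written out as follows. This is a minimal host-side sketch, assuming the exact value is available in double precision and the computed result is a normal, non-zero single-precision value. The function names are hypothetical.

```c
#include <math.h>

/* Error in units of 2^-24: the '(A)' measure. */
double ulp_error_absolute(double computed, double exact)
{
    return ldexp(computed - exact, 24);
}

/* Error in units of the least significant mantissa bit of the
   single-precision result: the '(R)' measure. */
double ulp_error_relative(double computed, double exact)
{
    int e;
    frexp(computed, &e);    /* |computed| = m * 2^e with 0.5 <= m < 1 */
    /* The last bit of a 24-bit mantissa is worth 2^(e-24). */
    return ldexp(computed - exact, 24 - e);
}
```

For a result near 1.0 the last mantissa bit is worth 2^-23, so an error of one unit by the relative measure corresponds to two units by the absolute measure.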

qfp_fcmp returns zero if its arguments are equal (negative zero is equal to positive zero), or plus or minus one if its first argument is respectively greater than or less than its second. Input denormals are not flushed to zero, and NaNs are compared respecting their signs and treating them as values beyond ±infinity.
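
One way to picture this ordering is as a sign-magnitude comparison of the raw bit patterns. The sketch below is a host-side model of the documented results, not the library's implementation, and the helper names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Map a float's bit pattern to an integer key that increases with the value:
   sign-magnitude order, with -0 and +0 sharing key 0. NaN patterns have a
   larger magnitude than infinity, so they sort beyond it on either side. */
static int64_t model_order_key(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    int64_t magnitude = (int64_t)(bits & 0x7fffffffu);
    return (bits & 0x80000000u) ? -magnitude : magnitude;
}

/* Host-side model of the documented qfp_fcmp results: 0, +1 or -1. */
int model_fcmp(float x, float y)
{
    int64_t kx = model_order_key(x);
    int64_t ky = model_order_key(y);
    return (kx > ky) - (kx < ky);
}
```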

qfp_float2int(x) is equivalent to qfp_float2fix(x,0).

qfp_float2fix(x,y) converts a floating-point value x to a signed fixed point value, with y bits after the binary point. The result is rounded towards –infinity. y can be from –256 to +256. The result is clamped to the available (signed) output range.
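
The conversion described above amounts to computing floor(x × 2^y) and clamping the result to the signed 32-bit range. The following is a host-side reference model, not the library code. model_float2fix is a hypothetical name. The treatment of NaN as an infinity of the same sign is an assumption based on the limitations section, and the flushing of denormal inputs is not modelled.

```c
#include <math.h>
#include <stdint.h>

/* Host-side model of the documented qfp_float2fix behaviour (hypothetical helper). */
int32_t model_float2fix(float x, int y)
{
    if (isnan(x))                         /* NaN treated as infinity of its sign (assumed) */
        return signbit(x) ? INT32_MIN : INT32_MAX;

    double scaled = ldexp((double)x, y);  /* x * 2^y, exact in double for |y| <= 256 */
    if (scaled >= 2147483648.0)  return INT32_MAX;   /* clamp high */
    if (scaled <= -2147483648.0) return INT32_MIN;   /* clamp low */

    int32_t t = (int32_t)scaled;          /* the cast truncates towards zero */
    if ((double)t > scaled)
        t--;                              /* adjust so rounding is towards -infinity */
    return t;
}
```

For example, 1.5 with y = 8 gives 384 in this model, and –0.5 with y = 0 gives –1 rather than 0, because rounding is towards –infinity.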

qfp_int2float(x) is equivalent to qfp_fix2float(x,0).

qfp_fix2float(x,y) converts a signed fixed point value x with y bits after the binary point to a floating-point value, correctly rounded (to nearest, even-on-tie). y can be from –256 to +256. If the result is outside the representable range, ±infinity is returned as appropriate.
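
In the same spirit, the reverse conversion can be modelled on a host as a correctly rounded integer-to-float conversion followed by an exact scaling by 2^–y. This is a sketch under those assumptions, and model_fix2float is a hypothetical name. Results that would fall in the denormal range are not modelled faithfully.

```c
#include <math.h>
#include <stdint.h>

/* Host-side model of the documented qfp_fix2float behaviour (hypothetical helper). */
float model_fix2float(int32_t x, int y)
{
    /* (float)x rounds to nearest, ties to even, under the default rounding
       mode. Scaling by a power of two is then exact unless the result
       overflows to infinity or falls below the normal range. */
    return ldexpf((float)x, -y);
}
```

The value 16777217 (2^24 + 1) with y = 0 lies exactly halfway between two representable values and rounds to 16777216 under the even-on-tie rule.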

qfp_float2uint, qfp_float2ufix, qfp_uint2float and qfp_ufix2float are the same as qfp_float2int, qfp_float2fix, qfp_int2float and qfp_fix2float, but work with unsigned fixed-point and integer values.

Qfpio: string conversion functions

Qfpio, part of the Qfplib-M0-tiny download from release 20151029, includes two functions for converting between floating-point values and ASCII strings. The functions are qfp_float2str(float f,char*s,unsigned int fmt), which converts a float to a string with flexible control of formatting, and qfp_str2float(float*f,char*p,char**endptr), which performs the reverse conversion. Again, the emphasis is on compactness, without compromising too much in speed or accuracy: the total code size for the two functions is just over 800 bytes. Qfpio does not call any of the other functions in Qfplib-M0-tiny and so can be compiled independently.

Files

qfplib.s, the source code to qfplib. The GNU assembler syntax is used.
qfplib.h, a C header file giving prototypes for the qfplib functions.
qfpio.s, the source code to qfpio, routines for converting between strings and floating-point values.
qfpio.h, a C header file giving prototypes for the qfpio functions.

Visit http://www.quinapalus.com/qfplib.html for more information.
