/fastnumbers

Super-fast and clean conversions to numbers for Python.

Primary LanguageC++MIT LicenseMIT

fastnumbers

Super-fast and clean conversions to numbers.

fastnumbers is a module with the following three objectives (in order of decreasing importance as to why the module was created):

  1. Provide a set of convenience functions that wrap calls to int and float and provides easy, concise, powerful, fast and flexible error handling.
  2. Provide a set of functions that can be used to rapidly identify if an input could be converted to int or float.
  3. Provide drop-in replacements for the Python built-in int and float that are on par or faster with the Python equivalents (see the Timing section for details). These functions should behave identically to the Python built-ins except for a few specific corner-cases as mentioned in the API documentation for those functions.
    • PLEASE read the quick start for these functions to fully understand the caveats before using them.

What kind of speedups can you expect? Here are some highlights, but please see the Timing section for the raw data if you want details.

  • Up to 2x faster conversion of strings to integers than the built-in int() function
  • Up to 5x faster conversion of strings to floats than the built-in float() function (possibly greater for very long strings)
  • Up to 10x faster handling of errors during conversion than using user-side error handling
  • On top of the above, operations to convert a list of strings (with the map option or try_array function) is 2x faster than the equivalent list comprehension.

NOTICE: As of fastnumbers version 4.0.0, only Python >= 3.7 is supported.

NOTICE: As of fastnumbers version 4.0.0, the functions fast_real, fast_float, fast_int, fast_forceint, isreal, isfloat, isint, and isintlike have been deprecated and are replaced with try_real, try_float, try_int, try_forceint, check_real, check_float, check_int, and check_intlike, respectively. These new functions have more flexible APIs and have names that better reflect the intent of the functions. The old functions can still be used (they will never be removed from fastnumbers), but the new ones should be preferred for new development.

NOTICE: As of fastnumbers version 4.0.0, query_type now sets allow_underscores to False by default instead of True.

Quick Start

There are three broad categories of functions exposed by fastnumbers. The below quick start will demonstrate each of these categories. The quick start is "by example", and will show a sample interactive session using the fastnumbers API.

Error-Handling Functions

try_float will be used to demonstrate the functionality of the try_* functions.

>>> from fastnumbers import RAISE, try_float
>>> # Convert string to a float
>>> try_float('56.07')
56.07
>>> # Integers are converted to floats
>>> try_float(54)
54.0
>>>
>>> # Unconvertable string returned as-is by default
>>> try_float('bad input')
'bad input'
>>> # Unconvertable strings can trigger a default value
>>> try_float('bad input', on_fail=0)
0
>>>
>>> # One can ask inf or nan to be substituted with another value
>>> try_float('nan')
nan
>>> try_float('nan', nan=0.0)
0.0
>>> try_float(float('nan'), nan=0.0)
0.0
>>> try_float('56.07', nan=0.0)
56.07
>>>
>>> # The default built-in float behavior can be triggered with
>>> # RAISE given to "on_fail".
>>> try_float('bad input', on_fail=RAISE) #doctest: +IGNORE_EXCEPTION_DETAIL
Traceback (most recent call last):
  ...
ValueError: invalid literal for float(): bad input
>>>
>>> # A function can be used to return an alternate value for invalid input
>>> try_float('bad input', on_fail=len)
9
>>> try_float(54, on_fail=len)
54.0
>>>
>>> # Single unicode characters can be converted.
>>> try_float('\u2164')  # Roman numeral 5 (V)
5.0
>>> try_float('\u2466')  # 7 enclosed in a circle
7.0

try_int behaves the same as try_float, but for integers.

>>> from fastnumbers import try_int
>>> try_int('1234')
1234
>>> try_int('\u2466')
7

try_real is like try_float or try_int depending on if there is any fractional component of thi return value.

>>> from fastnumbers import try_real
>>> try_real('56')
56
>>> try_real('56.0')
56
>>> try_real('56.0', coerce=False)
56.0
>>> try_real('56.07')
56.07
>>> try_real(56.07)
56.07
>>> try_real(56.0)
56
>>> try_real(56.0, coerce=False)
56.0

try_forceint always returns an integer.

>>> from fastnumbers import try_forceint
>>> try_forceint('56')
56
>>> try_forceint('56.0')
56
>>> try_forceint('56.07')
56
>>> try_forceint(56.07)
56

Fast operations on lists and other iterables

Each of the try_* functions have a map option causes the function to accept an iterable of items to convert and returns a list. Using try_float as an example, the following are all functionally equivalent.

>>> from fastnumbers import try_float
>>> iterable = ["5", "4.5", "34567.6", "32"]
>>> try_float(iterable, map=list) == list(map(try_float, iterable))
True
>>> try_float(iterable, map=list) == [try_float(x) for x in iterable]
True
>>> try_float(iterable, map=list) == list(try_float(iterable, map=True))
True

The difference is that the map option is 2x the speed of the list comprehension method, and 1.5x the speed of the map method. The reason is that it avoids Python function call overhead on each iteration. Note that True causes the function to return an iterator, and list causes it to return a list. In practice the performance of these are similar (see Timing for raw data).

If you need to store your output in a numpy array, you can use try_array to do this conversion directly. This function has some additional handling for overflow that is not present in the other fastnumbers functions that may come in handy when dealing with numpy arrays.

>>> from fastnumbers import try_array
>>> import numpy as np
>>> iterable = ["5", "4.5", "34567.6", "32"]
>>> np.array_equal(np.array(try_float(iterable, map=list), dtype=np.float64), try_array(iterable))
True

You will see about a 2x speedup of doing this in one step over converting to a list then converting that list to an array.

About the on_fail option

The on_fail option is a way for you to do anything in the event that the given input cannot be converted to a number. It can

  • return given object as-is if set to fastnumbers.INPUT (this is the default)
  • raise a ValueError if set to fastnumbers.RAISE
  • return a default value if given any non-callable object
  • call a function with the given object if given a single-argument callable

Below are a couple of ideas to get you thinking.

NOTE:: There is also an on_type_error option that behaves the same as on_fail except that a) it is triggered when the given object is of an invalid type and b) the default value is fastnumbers.RAISE, not fastnumbers.INPUT.

>>> from fastnumbers import INPUT, RAISE, try_float
>>> # You want to convert strings that can be converted to numbers, but
>>> # leave the rest as strings. Use fastnumbers.INPUT (the default)
>>> try_float('45.6')
45.6
>>> try_float('invalid input')
'invalid input'
>>> try_float('invalid input', on_fail=INPUT)
'invalid input'
>>>
>>>
>>>
>>> # You want to convert any invalid string to NaN
>>> try_float('45.6', on_fail=float('nan'))
45.6
>>> try_float('invalid input', on_fail=float('nan'))
nan
>>>
>>>
>>>
>>> # Simple callable case, send the input through some function to generate a number.
>>> try_float('invalid input', on_fail=lambda x: float(x.count('i')))  # count the 'i's
3.0
>>>
>>>
>>>
>>> # Suppose we know that our input could either be a number, or if not
>>> # then we know we just have to strip off parens to get to the number
>>> # e.g. the input could be '45' or '(45)'. Also, suppose that if it
>>> # still cannot be converted to a number we want to raise an exception.
>>> def strip_parens_and_try_again(x):
...     return try_float(x.strip('()'), on_fail=RAISE)
...
>>> try_float('45', on_fail=strip_parens_and_try_again)
45.0
>>> try_float('(45)', on_fail=strip_parens_and_try_again)
45.0
>>> try_float('invalid input', on_fail=strip_parens_and_try_again) #doctest: +IGNORE_EXCEPTION_DETAIL
Traceback (most recent call last):
  ...
ValueError: invalid literal for float(): invalid input
>>>
>>>
>>>
>>> # Suppose that whenever an invalid input is given, it needs to be
>>> # logged and then a default value is returned.
>>> def log_and_default(x, log_method=print, default=0.0):
...     log_method("The input {!r} is not valid!".format(x))
...     return default
...
>>> try_float('45', on_fail=log_and_default)
45.0
>>> try_float('invalid input', on_fail=log_and_default)
The input 'invalid input' is not valid!
0.0
>>> try_float('invalid input', on_fail=lambda x: log_and_default(x, default=float('nan')))
The input 'invalid input' is not valid!
nan

About the denoise option

The denoise option is available on the try_real and try_forceint options. To best understand its usage, consider the following native Python behavior:

>>> int(3.453e21)
3452999999999999737856
>>> int(float("3.453e21"))
3452999999999999737856
>>> # Most users would likely expect this result from decimal.Decimal
>>> import decimal
>>> int(decimal.Decimal("3.453e21"))
3453000000000000000000
>>> # But watch out, even decimal.Decimal doesn't help for float input
>>> import decimal
>>> int(decimal.Decimal(3.453e21))
3452999999999999737856

Because the conversion of a float to an int goes through the C double data type which has inherent limitations on accuracy (See this Stack Overflow question for examples) the resulting int result has "noise" digits that are not part of the original float representation.

For functions where this makes sense, fastnumbers provides the denoise option to give you the results that decimal.Decimal would give for strings containing floats.

>>> from fastnumbers import try_real
>>> try_real(3.453e21)
3452999999999999737856
>>> try_real("3.453e21")
3452999999999999737856
>>> try_real(3.453e21, denoise=True)
3453000000000000000000
>>> try_real("3.453e21", denoise=True)
3453000000000000000000

Two things to keep in mind:

  1. The denoise option adds additional overhead to the conversion calculation, so please consider the trade-offs between speed and accuracy when determining whether or not to use it. It is significantly faster than using decimal.Decimal, but much slower than not using it at all.
  2. For string input, denoise will return results identical to decimal.Decimal. For float input, denoise will return results that are accurate to about 15 digits (C double can only store 16 decimal digits, so this means that only the last possible digit may not be accurate).

Checking Functions

check_float will be used to demonstrate the functionality of the check_* functions. There is also the query_type function.

>>> from fastnumbers import check_float
>>> from fastnumbers import ALLOWED, DISALLOWED, NUMBER_ONLY, STRING_ONLY
>>> # Check that a string can be converted to a float
>>> check_float('56')
True
>>> check_float('56', strict=True)
False
>>> check_float('56.07')
True
>>> check_float('56.07 lb')
False
>>>
>>> # Check if a given number is a float
>>> check_float(56.07)
True
>>> check_float(56)
False
>>>
>>> # Specify if only strings or only numbers are allowed
>>> check_float(56.07, consider=STRING_ONLY)
False
>>> check_float('56.07', consider=NUMBER_ONLY)
False
>>>
>>> # Customize handling for nan or inf (see API for more details)
>>> check_float('nan')
False
>>> check_float('nan', nan=ALLOWED)
True
>>> check_float(float('nan'))
True
>>> check_float(float('nan'), nan=DISALLOWED)
False

check_int works the same as check_float, but for integers.

>>> from fastnumbers import check_int
>>> check_int('56')
True
>>> check_int(56)
True
>>> check_int('56.0')
False
>>> check_int(56.0)
False

check_real is very permissive - any float or integer is accepted.

>>> from fastnumbers import check_real
>>> check_real('56.0')
True
>>> check_real('56')
True
>>> check_real(56.0)
True
>>> check_real(56)
True

check_intlike checks if a number is "int-like", if it has no fractional component.

>>> from fastnumbers import check_intlike
>>> check_intlike('56.0')
True
>>> check_intlike('56.7')
False
>>> check_intlike(56.0)
True
>>> check_intlike(56.7)
False

The query_type function can be used if you need to determine if a value is one of many types, rather than whether or not it is one specific type.

>>> from fastnumbers import query_type
>>> query_type('56.0')
<class 'float'>
>>> query_type('56')
<class 'int'>
>>> query_type(56.0)
<class 'float'>
>>> query_type(56)
<class 'int'>
>>> query_type(56.0, coerce=True)
<class 'int'>
>>> query_type('56.0', allowed_types=(float, int))
<class 'float'>
>>> query_type('hey')
<class 'str'>
>>> query_type('hey', allowed_types=(float, int))  # returns None

Drop-in Replacement Functions

PLEASE do not take it for granted that these functions will provide you with a speedup - they may not. Every platform, compiler, and data-set is different, and you should perform a timing test on your system with your data to evaluate if you will see a benefit. As you can see from the data linked in the Timing section, the amount of speedup you will get is particularly data-dependent. In general you will see a performance boost for floats (and this boost increases as the size of the float increases), but for integers it is largely dependent on the length of the integer. You will likely not see a performance boost if the input are already numbers instead of strings.

NOTE: in the below examples, we use from fastnumbers import int instead of import fastnumbers. This is because calling fastnumbers.int() is a bit slower than just int() because Python has to first find fastnumbers in your namespace, then find int in the fastnumbers namespace, instead of just finding int in your namespace - this will slow down the function call and defeat the purpose of using fastnumbers. If you do not want to actually shadow the built-in int function, you can do from fastnumbers import int as fn_int or something like that.

>>> # Use is identical to the built-in functions
>>> from fastnumbers import float, int
>>> float('10')
10.0
>>> int('10')
10
>>> float('bad input') #doctest: +IGNORE_EXCEPTION_DETAIL
Traceback (most recent call last):
  ...
ValueError: invalid literal for float(): bad input

real is provided to give a float or int depending on the fractional component of the input.

>>> from fastnumbers import real
>>> real('56.0')
56
>>> real('56.7')
56.7
>>> real('56.0', coerce=False)
56.0

Timing

Just how much faster is fastnumbers than a pure python implementation? Please look https://github.com/SethMMorton/fastnumbers/tree/main/profiling.

High-Level Algorithm

For integers, CPython goes to great lengths to ensure that your string input is converted to a number correctly and losslessly (you can prove this to yourself by examining the source code for integer conversions). This extra effort is only needed for integers that cannot fit into a 64-bit integer data type - for those that can, a naive algorithm of < 10 lines of C code is sufficient and significantly faster. fastnumbers uses a heuristic to determine if the input can be safely converted with the much faster naive algorithm, and if so it does so, falling back on the CPython implementation for longer input strings. Most real-world numbers pass the heuristic and so you should generally see improved performance with fastnumbers for integers.

For floats, fastnumbers utilizes the ultra-fast fast_float::from_chars function to convert strings representing floats into a C double both quickly and safely - the conversion provides the same accuracy as the CPython float conversion function but instead of scaling linearly with length of the input string it seems to have roughly constant performance. By completely bypassing the CPython converter we get significant performance gains with no penalty, so you should always see improved performance with fastnumbers for floats.

Installation

Use pip!

$ pip install fastnumbers

How to Run Tests

Please note that fastnumbers is NOT set-up to support python setup.py test.

The recommended way to run tests is with tox. Suppose you want to run tests for Python 3.8 - you can run tests by simply executing the following:

$ tox run -e py38

tox will create virtual a virtual environment for your tests and install all the needed testing requirements for you.

If you want to run testing on all supported Python versions you can simply execute

$ tox run

You can change the how much "random" input your tests will try with

# Run fewer tests with "random" input - much faster
$ tox run -- --hypothesis-profile fast

# Run more tests with "random" input - takes much longer but is more thorough
$ tox run -- --hypothesis-profile thorough

If you want to run the performce analysis yourself, you can execute

# This assumes Python 3.9 - adjust for the version you want to profile
$ tox run -e py39-prof

If you do not wish to use tox, you can install the testing dependencies with the dev-requirements.txt file and then run the tests manually using pytest.

$ pip install -r dev/requirements.txt
$ pytest

Author

Seth M. Morton

History

Please visit the changelog on GitHub.