ztane/python-Levenshtein

Integer overflow in lev_edit_distance()

Closed this issue · 4 comments

An integer overflow in lev_edit_distance() leads to a heap based buffer overflow.

row = (size_t*)malloc(len2*sizeof(size_t));

When len2 is greater than 1/4 of size_t max, the multiplication will overflow. This causes a smaller than expected allocation to occur. In a 32bit python interpreter with a len2 of 1073741825 (0x40000001) the call to malloc will end up allocating 4 bytes (0x40000001 * 0x4 = 0x100000004 which wraps a 32bit size_t to 0x4).

C:\Users\test>"c:\Program Files (x86)\Python38-32\python.exe"
Python 3.8.7 (tags/v3.8.7:6503f05, Dec 21 2020, 17:43:54) [MSC v.1928 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.

import Levenshtein
s = "A" * 1073741825
Levenshtein.ratio("BBBB", s)

C:\Users\test>

Additionally, throughout the code the return value of calls to PyString_GET_SIZE() are not checked for Py_INVALID_SIZE ((Py_ssize_t)-1). If an object is passed which forces size to be invalid, the resulting Py_ssize_t error code is cast to size_t. This will result in the string sizes operated on throughout all of the subsequent operations to effectively be the entire addressable memory space. exploitation of this one would likely be pretty tricky, but it would be easy to cause a crash if you can get an object with an invalid size into this function. I'm not entirely sure if this is possible with string or unicode objects, but it seems likely.

Here's an example:

len1 = PyString_GET_SIZE(arg1);

ztane commented

Reopening this since the PyString_GET_SIZE is not yet quite handled.

ztane commented

@cris-spring PyString_GET_SIZE reads the slotted size from the string structures, it should be safe and is not affected by for example a subclass that has overwritten __len__

ztane commented

... and even if did it would not be a security problem because it is "crashing CPython by writing code" for which there are easier alternatives that do not require any extensions.

Thanks for the quick response and fix. I think you are correct about the get size macro. I'm still investigating similar issues with the functions that have error checking, but haven't found a way to get the macros to return -1. Fixes LGTM. Thanks again!