lithops-cloud/lithops

AWS Lambda invoker's performance depends on the Python interpreter

gfinol opened this issue · 10 comments

gfinol commented

I've noticed a performance issue when invoking AWS Lambda functions: the invocation performance changes depending on the Python interpreter used.

For example, when using the system Python 3.10 interpreter of an Ubuntu 22.04 VM in AWS EC2, the start of some AWS Lambda functions is delayed by 5 to 10 seconds, as can be seen in this plot:

python31012-system1702566715_timeline

But using the same Python version (3.10.12) from Conda on the same VM, with the same OS and AWS account, I obtained much better performance:
python31012-conda1702567642_timeline

Despite the performance improvement when using Conda, almost 50% of the functions still take 1 second longer to start, even in a warmed-up state (see the last two map stages in the previous plot). This behavior is the same for Python 3.8, 3.9, 3.10 and 3.11.

Click to see: Python 3.8 plot (using conda)

python38-conda1702566938_timeline

Python 3.9 plot (using conda)

python39-conda1702567023_timeline

Python 3.10 plot (using conda)

python31013-conda1702567068_timeline

Python 3.11 plot (using conda)

python311-conda1702567151_timeline

But with Python 3.7 the performance is what one would expect (almost perfect):
python37-conda1702566853_timeline

All the previous plots were generated by running 3 maps of 100 functions that sleep for 5 seconds, executed from a t2.large VM with Ubuntu 22.04 in us-east-1, with all the Lithops default configurations except for invoke_pool_threads, which was set to 128. I have also run the same test on the same VM with Amazon Linux 2023, and the results are similar to the ones above using the Conda interpreter (I can upload the plots if requested). I used the current master branch of Lithops for this test, but the issue can be reproduced with versions 3.0.0, 3.0.1, 2.9, and also 2.7.1.

Here is the code used:

import time

import lithops


def count_cold_starts(futures):
    # Classify each invocation as a cold or warm start using its stats
    cold = warm = 0
    for future in futures:
        if future.stats['worker_cold_start']:
            cold += 1
        else:
            warm += 1
    return cold, warm


def my_sleep(x):
    time.sleep(x)
    return x


num_fun = 100
fexec = lithops.FunctionExecutor()
for _ in range(3):
    # Each map invokes num_fun functions that just sleep for 5 seconds
    futures = fexec.map(my_sleep, [5] * num_fun)
    fexec.get_result()

    cold, warm = count_cold_starts(futures)
    print(f"cold: {cold}, warm: {warm}")

fexec.plot()

Hi @gfinol, just to make sure whether this is an issue with Lithops itself or with its dependencies, could you check the following?

  • Test the different Python versions using fresh virtual environments
  • Install only Lithops, using pip install -U --no-cache-dir lithops
  • Then run pip freeze and send the results

Thanks

gfinol commented

Also, notice that "Boto3 and Botocore ended support for Python 3.7 on December 13, 2023". So, the best performance is achieved with a Python version that is no longer supported.

Just to make sure, maybe you could create a 3.11 venv and do pip install -U --no-cache-dir -r conda_py37.txt so it has the same library versions as the 3.7 venv. But it mostly seems that something related to the Python threads used by Lithops or boto3/botocore/urllib3 changed from 3.8 onwards.

gfinol commented

@aitorarjona I tried to do what you suggested with a 3.11 env, but it failed due to version incompatibilities between the pinned library versions and that Python version.

But I managed to get it working with 3.10. The results look like the previous ones:

1702902863_timeline

(Note that the certifi requirement in conda_py37.txt points to a local file; that line was removed in order to install the requirements on Python 3.10.)

I agree with you that, at first glance, this looks like a problem with the thread pool used. I am not sure how that could be confirmed, though...
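One way to narrow this down, independently of boto3 and Lambda, might be to time how quickly a bare ThreadPoolExecutor dispatches concurrent calls under each interpreter. This is only a sketch, not Lithops code: the time.sleep stand-in would have to be replaced with a real urllib3/boto3 call to also exercise the network stack.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def timed_call(i):
    # Stand-in for an HTTPS invoke; swap in a real call to test the network stack
    start = time.monotonic()
    time.sleep(0.05)
    return i, start, time.monotonic()


def dispatch_spread(pool_size=128, n_calls=100):
    # With pool_size >= n_calls, all calls should start almost simultaneously;
    # a large spread would point at the interpreter's thread machinery rather
    # than at boto3 or Lambda
    with ThreadPoolExecutor(max_workers=pool_size) as executor:
        results = list(executor.map(timed_call, range(n_calls)))
    starts = [s for _, s, _ in results]
    return max(starts) - min(starts)


print(f"dispatch spread: {dispatch_spread():.3f}s")
```

Running this under the system interpreter and the Conda one would show whether the delay already appears with pure threads.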

JosepSampe commented

I remember that some years ago I changed the invoke method of the Lambda backend in order to improve the invocation performance. It was working well back then (I think I did it with Python 3.6), but maybe that solution is not working properly with newer versions of Python (or boto3).

In aws_lambda.py, can you try commenting out lines 630-653 and uncommenting lines 655-670? This way we will see how the boto3 lib performs when invoking functions, and whether this is the cause of the issue you are experiencing.
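For reference, the boto3 path being re-enabled boils down to a plain lambda_client.invoke() call. Below is a minimal sketch of the asynchronous variant; the function name and payload are hypothetical, and the actual client call is left commented out because it needs AWS credentials.

```python
import json
# import boto3  # required for the real call

def build_async_invoke_kwargs(function_name, payload):
    # InvocationType='Event' asks Lambda to queue the invocation and return
    # immediately, instead of blocking until the function finishes
    return {
        'FunctionName': function_name,
        'InvocationType': 'Event',
        'Payload': json.dumps(payload, default=str),
    }

# hypothetical function name and payload
kwargs = build_async_invoke_kwargs('lithops-runtime-example', {'job_id': 1})
# lambda_client = boto3.client('lambda')
# response = lambda_client.invoke(**kwargs)  # async invokes return status 202
print(kwargs['InvocationType'])  # → Event
```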

gfinol commented

@JosepSampe, I've been running the tests you suggested. I executed them twice, because the results are worse. Here are the resulting plots:

With the system Python 3.10 of Ubuntu 22.04, from the official AMI in AWS EC2:

pythons-3 10-sys-1704709941_timeline

Using the interpreter from conda, Python 3.10:

conda-3 10-1704710366_timeline

And using Python 3.7 with conda:

conda-3 7-1704710116_timeline

In general, the performance is worse. For example, look at the invocations using Python 3.7: in this recent plot, the invocations in the second and third map are delayed by 1 to 1.5 seconds, whereas in the original plots the invocations were almost perfect.

I leave here the plots for the other python versions with conda:

Python 3.8 conda

conda-3 8-1704710307_timeline

Python 3.9 conda

conda-3 9-1704710215_timeline

Python 3.11 conda

conda-3 11-1704710461_timeline

So, in summary: is this something related to Lithops, to Python itself, or to AWS Lambda?

Python Interpreter

I'm currently using the Python 3.11 interpreter of an Ubuntu 22.04 VM in AWS EC2.
I'm currently working on a modified runtime of aws_lambda.
Lithops normally serializes the code, dependencies and parameters and uploads them to S3; the function then downloads and deserializes them. I did some experiments to avoid the round trip through S3: my invoke just calls the function directly, passing the parameters as the payload.
This is part of my aws_lambda.py:

self.lambda_client = self.aws_session.client(
    'lambda', region_name=self.region_name,
    config=botocore.client.Config(
        max_pool_connections=5000,  # allow many concurrent invocations
        read_timeout=900,
        connect_timeout=900,
        user_agent_extra=self.user_agent
    )
)
...
def invoke(self, runtime_name, runtime_memory, payload):
    # Synchronous invocation: the parameters travel in the payload
    # instead of going through S3
    response = self.lambda_client.invoke(
        FunctionName=function_name,  # derived from runtime_name and runtime_memory elsewhere
        Payload=json.dumps(payload, default=str)
    )
    return json.loads(response['Payload'].read().decode('utf-8'))

And this is how I use invoke:

def invocator(payload, number):
    start = time.time()
    result = self.compute_handler.invoke(payload)
    end = time.time()
    starttimes[number] = start
    endtimes[number] = end
    return result

def general_executor(payloads):
    # payloads is a list of (payload, number) tuples
    with ThreadPoolExecutor(max_workers=len(payloads)) as executor:
        results = list(executor.map(lambda p: invocator(*p), payloads))
    return results

With this code, which is different from the way Lithops originally works, I get the same problem described in this issue. This is why I think it is not related to Lithops.

I have a containerized runtime with many dependencies. For this experiment, every Lambda just returns the string "Hello World":

return {
    'statusCode': 200,
    'body': "Hello World"
}

As you can see in the invocator code, I measure the start and end times of every invocation. I invoked 100 functions in both cold and warm state, and with those times I can build a plot.
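Such a plot can be derived from the starttimes and endtimes dictionaries recorded by invocator. A minimal sketch of the underlying computation, using hypothetical measurements and no plotting library:

```python
def timeline(starttimes, endtimes):
    # Express every invocation relative to the earliest start, which is
    # what the timeline plots in this thread show on the x axis
    t0 = min(starttimes.values())
    return [(n, starttimes[n] - t0, endtimes[n] - t0) for n in sorted(starttimes)]


# Hypothetical measurements: invocation 2 exhibits the ~5 s delayed start
starts = {0: 100.0, 1: 100.1, 2: 105.2}
ends = {0: 101.0, 1: 101.1, 2: 106.2}
for n, s, e in timeline(starts, ends):
    print(f"invocation {n}: start +{s:.1f}s, end +{e:.1f}s")
```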

image
image

As you can see, there is barely any difference between cold and warm starts. This is because of the added delay described in this thread.

Conda Python Interpreter

If I install Miniconda, create a Python 3.11 env on my Ubuntu 22.04 VM in AWS EC2, and execute the same code, I get:

image
image

The behavior using the Conda environment looks more like what Lithops would do. Warm functions take less than 1 second, and cold ones take half the time they used to.

I don't know why Conda solved the problem...
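One guess, not a confirmed diagnosis: Conda ships its own OpenSSL build, while the system Python links against the distribution's, so the HTTPS stack used by boto3 may differ between the two interpreters. A cheap check is to print the SSL backend of each interpreter and compare:

```python
import ssl
import sys

# Run this under both the system and the Conda interpreter and compare:
# a different OpenSSL build could explain different HTTPS (boto3) behavior
print(sys.version.split()[0])
print(ssl.OPENSSL_VERSION)
```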