lithops-cloud/lithops

FileNotFoundError: func.pickle not found in customized_runtime for AWS Lambda

ZikBurns opened this issue · 10 comments

I tried the new code with the update in invoke.py in AWS Lambda for Python 3.11.

I used the Dockerfile template:

# Python 3.11
FROM python:3.11-slim-buster

RUN apt-get update \
    # Install aws-lambda-cpp build dependencies
    && apt-get install -y \
      g++ \
      make \
      cmake \
      unzip \
    # cleanup package lists, they are not used anymore in this image
    && rm -rf /var/lib/apt/lists/* \
    && apt-cache search linux-headers-generic

ARG FUNCTION_DIR="/function"

# Copy function code
RUN mkdir -p ${FUNCTION_DIR}

# Update pip
RUN pip install --upgrade --ignore-installed pip wheel six setuptools \
    && pip install --upgrade --no-cache-dir --ignore-installed \
        psutil \
        awslambdaric \
        boto3 \
        redis \
        httplib2 \
        requests \
        numpy \
        scipy \
        pandas \
        pika \
        kafka-python \
        cloudpickle \
        ps-mem \
        tblib \
        psutil

# Set working directory to function root directory
WORKDIR ${FUNCTION_DIR}

# Add Lithops
COPY lithops_lambda.zip ${FUNCTION_DIR}
RUN unzip lithops_lambda.zip \
    && rm lithops_lambda.zip \
    && mkdir handler \
    && touch handler/__init__.py \
    && mv entry_point.py handler/

# Put your dependencies here, using RUN pip install... or RUN apt install...

ENTRYPOINT [ "/usr/local/bin/python", "-m", "awslambdaric" ]
CMD [ "handler.entry_point.lambda_handler" ]

Then I do:
lithops runtime build -f Dockerfile -b aws_lambda helloworld
lithops runtime deploy helloworld --memory 128

If I do a simple call_async:

from lithops import FunctionExecutor

def hello(name):
    return 'Hello {}!'.format(name)

with FunctionExecutor(runtime='off_sample_311',runtime_memory=128) as fexec:
    fut = fexec.call_async(hello, 'World')
    print(fut.result())

Output:
Hello World!

Now, if I change the .lithops_config file:

lithops:
    backend: aws_lambda
    storage: aws_s3
    customized_runtime: True

If I re-execute the call_async, it gets stuck:

2024-02-16 13:56:02,721 [INFO] config.py:139 -- Lithops v3.1.2.dev0 - Python3.11
2024-02-16 13:56:02,810 [INFO] aws_s3.py:68 -- S3 client created - Region: eu-west-1
2024-02-16 13:56:03,861 [INFO] aws_lambda.py:106 -- AWS Lambda client created - Region: eu-west-1
2024-02-16 13:56:03,862 [INFO] invokers.py:107 -- ExecutorID fee179-0 | JobID A000 - Selected Runtime: helloworld - 128MB
2024-02-16 13:56:03,921 [INFO] invokers.py:489 -- Creating runtime: helloworld:ea93b8ff9667cee2d751713988987e93, memory: 128MB
2024-02-16 13:56:03,921 [INFO] aws_lambda.py:344 -- Building runtime helloworld:ea93b8ff9667cee2d751713988987e93 from /tmp/lithops-pepe/custom-runtime/ea93b8ff9667cee2d751713988987e93/Dockerfile
WARNING! Your password will be stored unencrypted in /home/pepe/.docker/config.json.
Configure a credential helper to remove this warning. See
https://docs.docker.com/engine/reference/commandline/login/#credentials-store

2024-02-16 13:56:08,748 [INFO] aws_lambda.py:457 -- Deploying runtime: helloworld:ea93b8ff9667cee2d751713988987e93 - Memory: 128 Timeout: 180
2024-02-16 13:56:36,889 [INFO] invokers.py:172 -- ExecutorID fee179-0 | JobID A000 - Starting function invocation: hello() - Total: 1 activations
2024-02-16 13:56:36,890 [INFO] invokers.py:208 -- ExecutorID fee179-0 | JobID A000 - View execution logs at /tmp/lithops-pepe/logs/fee179-0-A000.log
2024-02-16 13:56:36,918 [INFO] aws_s3.py:68 -- S3 client created - Region: eu-west-1

And this is the problem that I see in Cloudwatch:

LAMBDA_WARNING: Unhandled exception. The most likely cause is an issue in the function code. However, in rare cases, a Lambda runtime update can cause unexpected function behavior. For functions using managed runtimes, runtime updates can be triggered by a function change, or can be applied automatically. To determine if the runtime has been updated, check the runtime version in the INIT_START log entry. If this error correlates with a change in the runtime version, you may be able to mitigate this error by temporarily rolling back to the previous runtime version. For more information, see https://docs.aws.amazon.com/lambda/latest/dg/runtimes-update.html
[ERROR] FileNotFoundError: [Errno 2] No such file or directory: '/tmp/lithops-root/func.pickle'
Traceback (most recent call last):
  File "/function/handler/entry_point.py", line 42, in lambda_handler
    function_handler(event)
  File "/function/lithops/worker/handler.py", line 71, in function_handler
    job = create_job(payload)
  File "/function/lithops/worker/handler.py", line 61, in create_job
    job.func = get_function_and_modules(job, internal_storage)
  File "/function/lithops/worker/utils.py", line 50, in get_function_and_modules
    with open(func_path, "rb") as f:

The func.pickle is not being found after extend_runtime. I haven't tried in other runtimes, but at least in AWS Lambda, it's not working.

I tried debugging and everything seems to work properly. Somehow, the func.pickle doesn't end up in the new re-deployed image.

The customized runtime feature is not yet working. Do not use the customized_runtime: True parameter. This feature is useful only in a limited number of edge cases; in general, you don't need it.

I need to use it for a runtime that has many dependencies. Eliminating the cost of serializing the code and uploading it to S3 would make my calls faster.

For this you can create your own custom runtime in advance following the instructions here: https://github.com/lithops-cloud/lithops/tree/master/runtime

That is what I followed to create the runtime. I need to add the customized_runtime: True on top of it to avoid the communication costs of serializing the code.

However, if you tell me that customized_runtime: True is not yet working, I'll have to wait then.

Exactly, what I need is the runtime to include the function.

I just tested the feature and it works properly with the master branch. Make sure your local fork is up to date.

Note that with this feature you currently save only the time needed to download the function from the storage backend. You do not save the time spent pickling and unpickling the function, since it is always pickled.
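
The distinction can be sketched as follows. This is a simplified illustration, not Lithops's worker code: `load_function`, the in-memory `storage` dict, and the key name are hypothetical, and the standard-library `pickle` with `math.sqrt` stands in for cloudpickle and a user function. Both paths end in the same unpickling step; only the download is skipped when the pickle already sits on local disk.

```python
import math
import os
import pickle
import tempfile


def load_function(local_path, storage, key):
    """Return (function, source); hypothetical helper for illustration only."""
    if local_path is not None and os.path.isfile(local_path):
        # Function bundled with the runtime: no storage round trip.
        with open(local_path, "rb") as f:
            data = f.read()
        source = "local"
    else:
        # Default path: fetch the pickled bytes from the storage backend.
        data = storage[key]
        source = "storage"
    # Unpickling happens in both cases.
    return pickle.loads(data), source


pickled = pickle.dumps(math.sqrt)
storage = {"func.pickle": pickled}  # dict standing in for an object store

with tempfile.TemporaryDirectory() as workdir:
    bundled = os.path.join(workdir, "func.pickle")
    with open(bundled, "wb") as f:
        f.write(pickled)
    func_local, source_local = load_function(bundled, storage, "func.pickle")

func_remote, source_remote = load_function(None, storage, "func.pickle")
print(source_local, func_local(9.0))
print(source_remote, func_remote(9.0))
```

In both calls the returned object is the same callable; the only difference is where the bytes came from.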

I pulled the latest update and changed the lithops_config:

aws_lambda:
    runtime_include_function: True
    region_name: eu-west-1
    ...

I followed the same steps described in my first comment, but I keep getting the same error.
Lithops output:

2024-02-19 09:37:44,246 [INFO] config.py:139 -- Lithops v3.1.2.dev0 - Python3.11
2024-02-19 09:37:44,592 [INFO] aws_s3.py:68 -- S3 client created - Region: eu-west-1
2024-02-19 09:37:46,986 [INFO] aws_lambda.py:106 -- AWS Lambda client created - Region: eu-west-1
2024-02-19 09:37:51,952 [INFO] invokers.py:107 -- ExecutorID 53eba5-0 | JobID A000 - Selected Runtime: helloworld - 128MB
2024-02-19 09:37:57,306 [INFO] invokers.py:489 -- Creating runtime: helloworld:80150c1ce8976752d751713988987e93, memory: 128MB
2024-02-19 09:38:42,962 [INFO] aws_lambda.py:344 -- Building runtime helloworld:80150c1ce8976752d751713988987e93 from /tmp/lithops-pepe/custom-runtime/80150c1ce8976752d751713988987e93/Dockerfile
WARNING! Your password will be stored unencrypted in /home/pepe/.docker/config.json.
Configure a credential helper to remove this warning. See
https://docs.docker.com/engine/reference/commandline/login/#credentials-store
2024-02-19 09:40:00,738 [INFO] aws_lambda.py:457 -- Deploying runtime: helloworld:80150c1ce8976752d751713988987e93 - Memory: 128 Timeout: 180
2024-02-19 09:41:32,363 [INFO] invokers.py:172 -- ExecutorID 53eba5-0 | JobID A000 - Starting function invocation: hello() - Total: 1 activations
2024-02-19 09:41:40,757 [INFO] invokers.py:208 -- ExecutorID 53eba5-0 | JobID A000 - View execution logs at /tmp/lithops-pepe/logs/53eba5-0-A000.log
2024-02-19 09:41:40,786 [INFO] aws_s3.py:68 -- S3 client created - Region: eu-west-1

Cloudwatch logs:

[ERROR] FileNotFoundError: [Errno 2] No such file or directory: '/tmp/lithops-root/func.pickle'
Traceback (most recent call last):
  File "/function/handler/entry_point.py", line 42, in lambda_handler
    function_handler(event)
  File "/function/lithops/worker/handler.py", line 71, in function_handler
    job = create_job(payload)
  File "/function/lithops/worker/handler.py", line 61, in create_job
    job.func = get_function_and_modules(job, internal_storage)
  File "/function/lithops/worker/utils.py", line 50, in get_function_and_modules
    with open(func_path, "rb") as f:

How did you test it?

Can you verify that the image contains func.pickle by running this on your local machine:
docker run -it --entrypoint bash helloworld:80150c1ce8976752d751713988987e93

If so, can you verify in cloudwatch that the runtime being invoked uses the correct tag (80150c1ce8976752d751713988987e93)?

Can you post complete lithops debug logs?
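
A non-interactive variant of the image check can be scripted. The sketch below only builds the command; `image_check_command` is a hypothetical helper, the image name is copied from the log above and may differ locally (for example, if the image is tagged with a registry URI), and the path is the one reported in the traceback. It overrides the entrypoint because the Dockerfile sets it to awslambdaric, which would otherwise receive the extra arguments.

```python
import shlex


def image_check_command(image, path):
    """Build a `docker run` argv that lists `path` inside `image`."""
    return [
        "docker", "run", "--rm",
        "--entrypoint", "ls",  # bypass the awslambdaric entrypoint
        image,
        "-l", path,
    ]


cmd = image_check_command(
    "helloworld:80150c1ce8976752d751713988987e93",
    "/tmp/lithops-root/func.pickle",
)
print(shlex.join(cmd))
```

Passing `cmd` to `subprocess.run(cmd, check=True)` would execute it; a non-zero exit from `ls` then indicates that the file is missing from the image.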

With this last change, it works. Lambda was probably clearing the /tmp directory when starting the container, which is why func.pickle wasn't there. Good catch!
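
The failure mode described above can be reproduced locally in miniature. The sketch below is an illustration under stated assumptions, not the actual Lithops fix: `find_bundled_function` and the directory layout are hypothetical. It simulates a file baked under a scratch directory that comes up empty when the container starts, and shows a lookup that also checks a directory that persists with the image.

```python
import os
import shutil
import tempfile


def find_bundled_function(candidate_dirs, filename="func.pickle"):
    """Return the first existing path to `filename`, or None (hypothetical helper)."""
    for directory in candidate_dirs:
        path = os.path.join(directory, filename)
        if os.path.isfile(path):
            return path
    return None


root = tempfile.mkdtemp()
scratch_dir = os.path.join(root, "tmp", "lithops-root")  # stands in for /tmp/lithops-root
image_dir = os.path.join(root, "function")  # stands in for /function

# "Build time": the pickled function is written to both locations.
for directory in (scratch_dir, image_dir):
    os.makedirs(directory)
    with open(os.path.join(directory, "func.pickle"), "wb") as f:
        f.write(b"placeholder bytes")

# "Container start": the scratch directory comes up empty.
shutil.rmtree(scratch_dir)
os.makedirs(scratch_dir)

only_scratch = find_bundled_function([scratch_dir])
with_fallback = find_bundled_function([scratch_dir, image_dir])
print(only_scratch)
print(with_fallback)

shutil.rmtree(root)  # clean up the simulated file system
```

A lookup restricted to the scratch directory finds nothing, which mirrors the FileNotFoundError in the CloudWatch logs, while the lookup that includes the image directory still resolves the file.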

Great!