CloudBytes-Academy/CloudBytes.dev

Chrome not reachable after the first execution in Lambda

karthiks3000 opened this issue · 2 comments

First off, thank you for the great article on how to run Selenium in AWS Lambda. I followed the instructions outlined here and was able to get the Lambda to run once.
After the first execution, each subsequent execution of the Lambda fails with the error -

errorMessage": "Message: chrome not reachable

If I try again after 20 minutes or so, I'm able to get another successful run, which is then followed by more unsuccessful runs.

Here is my code -

import json
import time

from selenium import webdriver
from selenium.webdriver.common.by import By


def handler(event=None, context=None):
    chrome_options = webdriver.ChromeOptions()
    chrome_options.binary_location = "/opt/chrome/chrome"
    chrome_options.add_argument("--headless")
    chrome_options.add_argument("--no-sandbox")
    chrome_options.add_argument("--disable-dev-shm-usage")
    chrome_options.add_argument("--disable-gpu")
    chrome_options.add_argument("--disable-dev-tools")
    chrome_options.add_argument("--no-zygote")
    chrome_options.add_argument("--single-process")
    chrome_options.add_argument("window-size=2560x1440")
    chrome_options.add_argument("--user-data-dir=/tmp/chrome-user-data")
    chrome_options.add_argument("--remote-debugging-port=9222")
    #chrome_options.add_argument("--data-path=/tmp/chrome-user-data")
    #chrome_options.add_argument("--disk-cache-dir=/tmp/chrome-user-data")
    chrome = webdriver.Chrome("/opt/chromedriver", options=chrome_options)

    chrome.get("https://www.google.ca/maps/place/Sushi+Maki/@49.2807611,-123.1270736,17z/data=!3m1!5s0x5486717f0a6c35a5:0x2cba7a9c7cdc2aeb!4m7!3m6!1s0x548673d541b27b4d:0x8cdeffd73083c283!8m2!3d49.2807611!4d-123.1248849!9m1!1b1")
    total_reviews = 0
    try:
        time.sleep(1)
        review_count_ele = chrome.find_element(By.XPATH, "//div[@class='jANrlb']")
        total_reviews = int(review_count_ele.text.split('\n')[1].split(' reviews')[0].replace(',', ''))
        print(f'Total reviews to be found: {total_reviews}')
    except Exception as e:
        print(e)
    
    MAX_TIME_OUT = 5
    timeout = 0
    while timeout < MAX_TIME_OUT:
        try:
            menu_bt = chrome.find_element(By.XPATH, "//button[@data-value='Sort']")
            menu_bt.click()
            print('sort menu clicked')
            break
        except Exception:
            print("can't find menu button")
            timeout += 1
            time.sleep(1)
    
    timeout = 0
    time.sleep(1)
    SORT_MENU_ITEM_XPATH = "//li[@role='menuitemradio']"
    while timeout < MAX_TIME_OUT:
        try:
            xpath = f'{SORT_MENU_ITEM_XPATH}[2]'
            recent_rating_bt = chrome.find_element(By.XPATH, xpath)
            recent_rating_bt.click()
            print('menu item clicked')
            break
        except Exception:
            print("can't find menu item")
            timeout += 1
            time.sleep(1)
    
    time.sleep(1)
    reviews_list = chrome.find_elements(By.XPATH, "//div[@data-review-id][@aria-label]")
    reviews_found = len(reviews_list)
    chrome.close()
    chrome.quit()
    return {
        "statusCode": 200,
        "body": json.dumps(
            {
                "message": reviews_found,
            }
        ),
    }
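
As an aside for anyone adapting this code: the manual time.sleep retry loops can be replaced with Selenium's built-in explicit waits. A minimal sketch of the first loop, reusing the same chrome driver object from above:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Poll for up to 5 seconds until the Sort button is clickable,
# instead of sleeping in a manual retry loop
menu_bt = WebDriverWait(chrome, 5).until(
    EC.element_to_be_clickable((By.XPATH, "//button[@data-value='Sort']"))
)
menu_bt.click()
print('sort menu clicked')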

I've tried increasing the memory size to 3008 MB and that had no impact either.
This works perfectly on my local machine (sam local invoke).

What's also weird is that it fails at different points on different runs. I suspect Chrome is crashing, but I have no clue why or how to get around this.

Any suggestions would be greatly appreciated!

Full error message -

{
"errorMessage": "Message: chrome not reachable\n (Session info: headless chrome=103.0.5060.0)\nStacktrace:\n#0 0x563aa11ab759 \n#1 0x563aa1144cf3 \n#2 0x563aa0f23ca7 \n#3 0x563aa0f14b94 \n#4 0x563aa0f156ef \n#5 0x563aa0f17612 \n#6 0x563aa0f0fa78 \n#7 0x563aa0f252e3 \n#8 0x563aa0f8a1ce \n#9 0x563aa0f77ef3 \n#10 0x563aa0f4e27b \n#11 0x563aa0f4f455 \n#12 0x563aa1173870 \n#13 0x563aa11858b0 \n#14 0x563aa11855bc \n#15 0x563aa1185e32 \n#16 0x563aa1174b9b \n#17 0x563aa11860b6 \n#18 0x563aa11664dd \n#19 0x563aa119d888 \n#20 0x563aa119da12 \n#21 0x563aa11b7c2e \n#22 0x7f6436eec44b \n#23 0x7f6435a0356f \n",
"errorType": "WebDriverException",
"requestId": "a7eb93bc-0740-4d96-b665-d720d3cca6bf",
"stackTrace": [
" File "/var/task/app.py", line 93, in handler\n reviews_list = chrome.find_elements(By.XPATH, "//div[@data-review-id][@aria-label]")\n",
" File "/var/lang/lib/python3.9/site-packages/selenium/webdriver/remote/webdriver.py", line 888, in find_elements\n return self.execute(Command.FIND_ELEMENTS, {\n",
" File "/var/lang/lib/python3.9/site-packages/selenium/webdriver/remote/webdriver.py", line 428, in execute\n self.error_handler.check_response(response)\n",
" File "/var/lang/lib/python3.9/site-packages/selenium/webdriver/remote/errorhandler.py", line 243, in check_response\n raise exception_class(message, screen, stacktrace)\n"
]
}

Finally figured out what was going on. Apparently consecutive Lambda invocations can share a single volume mounted at /tmp, with a default size of 512 MB, and it was getting filled up with the data produced by the crawler. Increasing the size to 3 GB fixed it for me. I also implemented a workaround to clean up the data after each execution. In case this helps anyone, here is the code for the driver initialization -

import os
import uuid

from selenium import webdriver


def __get_driver(self):
    # Give each driver instance its own scratch space under /tmp
    self._tmp_folder = '/tmp/{}'.format(uuid.uuid4())
    os.makedirs(self._tmp_folder + '/chrome-user-data', exist_ok=True)
    os.makedirs(self._tmp_folder + '/data-path', exist_ok=True)
    os.makedirs(self._tmp_folder + '/cache-dir', exist_ok=True)

    chrome_options = webdriver.ChromeOptions()
    chrome_options.binary_location = "/opt/chrome/chrome"
    chrome_options.add_argument("--headless")
    chrome_options.add_argument("--no-sandbox")
    chrome_options.add_argument("--disable-dev-shm-usage")
    chrome_options.add_argument("--disable-gpu")
    chrome_options.add_argument("--disable-dev-tools")
    chrome_options.add_argument("--no-zygote")
    chrome_options.add_argument("--single-process")
    chrome_options.add_argument("window-size=2560x1440")
    # Point all of Chrome's writable paths at the per-run folder
    chrome_options.add_argument(f"--user-data-dir={self._tmp_folder}/chrome-user-data")
    chrome_options.add_argument(f"--data-path={self._tmp_folder}/data-path")
    chrome_options.add_argument(f"--disk-cache-dir={self._tmp_folder}/cache-dir")
    chrome_options.add_argument("--remote-debugging-port=9222")
    return webdriver.Chrome("/opt/chromedriver", options=chrome_options)

And the cleanup code in the exit function -

import shutil


def __exit__(self, exc_type, exc_value, tb):
    self.print_log('Driver Exit called')
    if exc_type is not None:
        print(exc_type, exc_value, tb)

    self.driver.close()
    self.driver.quit()
    # Remove the per-run scratch folder so /tmp doesn't fill up on warm containers
    shutil.rmtree(self._tmp_folder)

    # Returning True suppresses any exception raised inside the with block
    return True
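
For anyone hitting the same limit, the /tmp size (ephemeral storage) is configurable per function. A minimal sketch using boto3's update_function_configuration, with a hypothetical function name:

import boto3

# Raise the function's /tmp (ephemeral storage) from the 512 MB default
# to 3 GB; Lambda accepts sizes between 512 and 10240 MB
lambda_client = boto3.client("lambda")
lambda_client.update_function_configuration(
    FunctionName="selenium-scraper",  # hypothetical function name
    EphemeralStorage={"Size": 3072},
)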


Glad you were able to find a solution. You are correct: sometimes Lambda reuses the container it created for a previous invocation, so it's good practice to clean up the /tmp folder. This is called a warm start, where Lambda reuses an existing container to reduce invocation latency and avoid the several-second delay experienced when a Lambda function is invoked for the first time (a cold start).
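
One way to observe warm starts from inside the function is a module-level flag, since code outside the handler runs only when a new execution environment is created. A minimal illustrative sketch:

import time

# Module-level code runs once per execution environment, i.e. on a cold start
ENV_CREATED_AT = time.time()
is_cold_start = True

def lambda_handler(event, context):
    global is_cold_start
    start_type = "cold" if is_cold_start else "warm"
    is_cold_start = False  # subsequent invocations in this container are warm
    print(f"{start_type} start; environment created {time.time() - ENV_CREATED_AT:.0f}s ago")
    return {"statusCode": 200, "body": start_type}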

**You can confirm this with the experiment below.**

Testing multiple consecutive invocations

I ran a simple Python program that writes the current timestamp to a file in /tmp -

import json
import os
import time

def lambda_handler(event, context):
    time_stamp = time.strftime("%H%M%S")

    # Write a new file named after the current timestamp
    with open(f"/tmp/{time_stamp}.txt", "w") as f:
        f.write(f"time_stamp = {time_stamp}")

    # Append to a shared log; on a warm start this file already exists
    with open("/tmp/timestamps.txt", "a") as f:
        f.write(time_stamp)
        f.write("\n")

    print("Contents of tmp folder")
    os.system('ls -la /tmp')

    print("Contents of file")
    os.system("cat /tmp/timestamps.txt")
    
    return {
        'statusCode': 200,
        'body': json.dumps('Hello from Lambda!')
    }

This resulted in the following -

[Screenshot: ls -la /tmp output showing timestamp files from earlier invocations still present on subsequent runs]