The base Azure Functions image does not contain the Chromium packages needed to run Selenium WebDriver. This project builds a custom Docker image with the required libraries so that Selenium can run inside an Azure Function.
For more details, see the blog post: https://towardsdatascience.com/how-to-create-a-selenium-web-scraper-in-azure-functions-f156fd074503
The following tools are required:
- Docker Desktop
- Azure Container Registry
- Azure CLI
- Azure Functions Core Tools version 2.x
- (optional) Visual Studio Code
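Before continuing, it can help to confirm that the command-line tools are available on your PATH. The sketch below is one way to do that in Python. It assumes the executables are named `docker`, `az`, and `func` (the usual names for Docker, the Azure CLI, and Azure Functions Core Tools); the helper name `check_tools` is made up for this example.

```python
import shutil

# Assumed executable names: Docker, Azure CLI, Azure Functions Core Tools.
REQUIRED_TOOLS = ("docker", "az", "func")


def check_tools(tools=REQUIRED_TOOLS):
    """Return a dict mapping each tool name to True if it is found on PATH."""
    return {tool: shutil.which(tool) is not None for tool in tools}


if __name__ == "__main__":
    for tool, found in check_tools().items():
        print(f"{tool}: {'found' if found else 'MISSING'}")
```

This only checks that the executables exist; it does not verify their versions or that Docker is running.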
Run the following commands to build an image that installs Chromium, ChromeDriver, and Selenium on top of the Azure Functions base image, and to push it to your Azure Container Registry:
$acr_id = "<<your acr>>.azurecr.io"
docker login $acr_id -u <<your username>> -p <<your password>>
docker build --tag $acr_id/selenium .
docker push $acr_id/selenium:latest
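Run the following commands to create the resource group, storage account, App Service plan, and Function App that runs the image: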
$rg = "<<your resource group name>>"
$loc = "<<your location>>"
$plan = "<<your azure function plan P1v2>>"
$stor = "<<your storage account adhering to function>>"
$fun = "<<your azure function name>>"
$acr_id = "<<your acr>>.azurecr.io"
az group create -n $rg -l $loc
az storage account create -n $stor -g $rg --sku Standard_LRS
az appservice plan create --name $plan --resource-group $rg --sku P1v2 --is-linux
az functionapp create --resource-group $rg --os-type Linux --plan $plan --deployment-container-image-name $acr_id/selenium:latest --name $fun --storage-account $stor
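The `az storage account create` step fails if `$stor` is not a valid name. Azure storage account names must be 3 to 24 characters long and may contain only lowercase letters and digits. A minimal sketch of that check follows; the helper name `is_valid_storage_name` is hypothetical and not part of this project.

```python
import re

# Storage account names: 3-24 characters, lowercase letters and digits only.
_STORAGE_NAME = re.compile(r"[a-z0-9]{3,24}")


def is_valid_storage_name(name: str) -> bool:
    """Return True if `name` satisfies the storage account naming rules."""
    return _STORAGE_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("seleniumstor01", "Selenium-Stor", "ab"):
        print(candidate, is_valid_storage_name(candidate))
```

The name must also be globally unique across Azure, which a local check like this cannot verify.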
Test the Function in the portal or in your browser. The following code in `__init__.py` returns all URLs found on the following web page:
import azure.functions as func
from selenium import webdriver
def main(req: func.HttpRequest) -> func.HttpResponse:
    # Run Chrome headless; the container has no display and limited /dev/shm
    chrome_options = webdriver.ChromeOptions()
    chrome_options.add_argument('--headless')
    chrome_options.add_argument('--no-sandbox')
    chrome_options.add_argument('--disable-dev-shm-usage')
    # Selenium 3 style call; the path points to chromedriver inside the image
    driver = webdriver.Chrome("/usr/local/bin/chromedriver", chrome_options=chrome_options)
    driver.get('http://www.ubuntu.com/')
    # Collect every anchor element on the page
    links = driver.find_elements_by_tag_name("a")