```
docker run -e URL1=https://f1.com -e URL2=https://f1.com erelbi/web-similarity-for-soar
[WDM] - Downloading: 19.9kB [00:00, 15.2MB/s]
[WDM] - Downloading: 100%|██████████| 3.10M/3.10M [00:01<00:00, 2.96MB/s]
[WDM] - Downloading: 19.9kB [00:00, 390kB/s]
Similarity: 100.00%
```

```
docker run erelbi/web-similarity-for-soar https://f1.com https://f1.com
[WDM] - Downloading: 100%|██████████| 3.10M/3.10M [00:01<00:00, 2.79MB/s]
[WDM] - Downloading: 19.9kB [00:00, 11.9MB/s]
Similarity: 100.00%
```

# Screenshot Comparison Script

This script takes screenshots of two web pages and compares the images to determine their similarity using the Structural Similarity Index (SSIM).

## Requirements

The script requires the following Python libraries:

- selenium
- webdriver_manager
- pyvirtualdisplay
- opencv-python
- scikit-image

## Code Explanation

### Imports and Setup

```python
from selenium import webdriver
from selenium.webdriver.firefox.service import Service as FirefoxService
from webdriver_manager.firefox import GeckoDriverManager
from pyvirtualdisplay import Display
import time
import cv2
import random
import string
import sys
import os
from skimage.metrics import structural_similarity as ssim
```

- selenium: For automating web browser interactions.
- webdriver_manager: To automatically manage and install the correct WebDriver.
- pyvirtualdisplay: To create a virtual display so the browser can run on a machine without a physical screen.
- time: For adding delays.
- cv2 (OpenCV): For image processing.
- random and string: For generating random strings.
- sys and os: For command-line arguments and environment variables.
- ssim from skimage.metrics: For calculating the Structural Similarity Index between images.

### Display Setup

```python
display = Display(visible=0, size=(1024, 768))
display.start()
```

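Initializes and starts an invisible 1024×768 virtual display for the browser to render into.

### WebDriver Options

```python
options = webdriver.FirefoxOptions()
options.add_argument('--headless')
# The next two are Chromium-style switches; Firefox may ignore them.
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
```
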
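Configures the Firefox WebDriver to run in headless mode. The `--no-sandbox` and `--disable-dev-shm-usage` arguments are intended to work around resource limits in containers, but they are Chromium switches and Firefox may ignore them. The options only take effect when they are passed to the driver with `options=options`, as in the function below.

### Screenshot Function

```python
def take_screenshot(url):
    # Pass the options defined above; without options=options they have no effect.
    driver = webdriver.Firefox(
        service=FirefoxService(GeckoDriverManager().install()),
        options=options,
    )
    try:
        driver.get(url)
        time.sleep(5)  # give the page time to finish rendering before capturing
        screenshot_path = f"screenshot_{generate_random_string(6)}.png"
        driver.save_screenshot(screenshot_path)
    finally:
        driver.quit()  # always close the browser, even if loading fails
    return screenshot_path
```
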
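`take_screenshot` opens the given URL in Firefox, saves a screenshot as a PNG file in the current working directory, and returns the file's path. The filename includes a random six-letter string so that the two screenshots do not overwrite each other.

### Random String Generator

```python
def generate_random_string(length):
    letters = string.ascii_lowercase
    return ''.join(random.choice(letters) for _ in range(length))
```
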
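Random names make collisions unlikely but do not rule them out. As a hedged alternative, the sketch below asks the standard library's `tempfile` module for a file name that is guaranteed to be new. `make_screenshot_path` is a hypothetical helper, not part of the original script, and the `screenshot_` prefix is only chosen to match the naming used above.

```python
import os
import tempfile


def make_screenshot_path(directory=None):
    """Return the path of a new, empty .png file with a unique name.

    Hypothetical helper for illustration; the original script does not use it.
    """
    # mkstemp creates the file itself, so two calls never return the same path.
    fd, path = tempfile.mkstemp(prefix="screenshot_", suffix=".png", dir=directory)
    os.close(fd)  # only the path is needed; the screenshot is written later
    return path


if __name__ == "__main__":
    first = make_screenshot_path()
    second = make_screenshot_path()
    print(first)
    print(second)
    # Remove the files once they are no longer needed.
    os.remove(first)
    os.remove(second)
```

The returned path could be handed to `driver.save_screenshot`, and removing the file after the comparison keeps screenshots from accumulating between runs.
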
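`generate_random_string` returns a random string of lowercase letters of the specified length.

### Image Comparison Function

```python
def compare_images(image1, image2):
    # Load both screenshots from disk (OpenCV returns BGR arrays).
    image1 = cv2.imread(image1)
    image2 = cv2.imread(image2)
    # Resize to a common 300x300 shape; SSIM requires equal dimensions.
    image1 = cv2.resize(image1, (300, 300))
    image2 = cv2.resize(image2, (300, 300))
    # Compare luminance only by converting to single-channel grayscale.
    gray_image1 = cv2.cvtColor(image1, cv2.COLOR_BGR2GRAY)
    gray_image2 = cv2.cvtColor(image2, cv2.COLOR_BGR2GRAY)
    # full=True also returns the per-pixel SSIM map, which is discarded here.
    similarity_index, _ = ssim(gray_image1, gray_image2, full=True)
    return similarity_index
```
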
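The `ssim` function from scikit-image evaluates SSIM over small sliding windows and averages the results. As a rough illustration of what the metric measures, the sketch below applies the SSIM formula once over the whole image instead. This is a simplification and not what `compare_images` computes; `global_ssim` and its flat-list input are assumptions made for the example.

```python
def global_ssim(x, y, data_range=255.0):
    """Single-window SSIM over two equal-length sequences of pixel values.

    Simplified for illustration: scikit-image averages SSIM over local windows.
    """
    if len(x) != len(y) or not x:
        raise ValueError("inputs must be non-empty and the same length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) * (a - mean_x) for a in x) / n
    var_y = sum((b - mean_y) * (b - mean_y) for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    # Small constants that stabilise the division when means or variances are near zero.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    numerator = (2 * mean_x * mean_y + c1) * (2 * cov_xy + c2)
    denominator = (mean_x * mean_x + mean_y * mean_y + c1) * (var_x + var_y + c2)
    return numerator / denominator


if __name__ == "__main__":
    stripes = [0, 255, 0, 255, 0, 255, 0, 255]
    inverted = [255 - p for p in stripes]
    dimmed = [p // 2 for p in stripes]
    print(global_ssim(stripes, stripes))   # identical images
    print(global_ssim(stripes, dimmed))    # same structure, lower brightness
    print(global_ssim(stripes, inverted))  # opposite structure
```

Identical inputs give the maximum score of 1, a brightness change lowers it, and inverted structure drives it negative, which is why the percentage printed by the script can in principle fall below zero.
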
`compare_images` reads both image files, resizes them to 300×300 pixels, converts them to grayscale, and returns their Structural Similarity Index (SSIM). A value of 1.0 means the two images are identical.

### Main Function

```python
def main(url1, url2):
    screenshot1 = take_screenshot(url1)
    screenshot2 = take_screenshot(url2)
    similarity = compare_images(screenshot1, screenshot2)
    print(f"Similarity: {similarity * 100:.2f}%")
```

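Because the result is reported as a single line of the form `Similarity: 100.00%`, a caller that captures the script's output (for example from `docker run`) has to pick that line out from the `[WDM]` download messages. A minimal sketch of one way to do this follows; `parse_similarity` is a hypothetical helper and not part of the script.

```python
import re

# Matches a whole line such as "Similarity: 100.00%" (the value may be negative).
_SIMILARITY_LINE = re.compile(r"^Similarity:\s*(-?\d+(?:\.\d+)?)%\s*$", re.MULTILINE)


def parse_similarity(output):
    """Return the similarity as a fraction (1.0 for "100.00%"), or None if absent."""
    match = _SIMILARITY_LINE.search(output)
    if match is None:
        return None
    return float(match.group(1)) / 100.0


if __name__ == "__main__":
    captured = (
        "[WDM] - Downloading: 19.9kB [00:00, 390kB/s]\n"
        "Similarity: 100.00%\n"
    )
    print(parse_similarity(captured))
```

Returning `None` when no result line is found lets the caller tell a failed run apart from a genuinely low score.
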
`main` takes a screenshot of each of the two URLs, compares the screenshots, and prints the similarity as a percentage.

### Command-Line Execution

```python
if __name__ == "__main__":
    url1 = os.getenv('URL1')
    url2 = os.getenv('URL2')
    if not url1 or not url2:
        if len(sys.argv) != 3:
            print("Usage: python compare_screenshots.py <url1> <url2>")
            sys.exit(1)
        url1 = sys.argv[1]
        url2 = sys.argv[2]
    main(url1, url2)
```

The URLs are read from the environment variables `URL1` and `URL2`. If either variable is unset, the script falls back to two command-line arguments, and it prints a usage message and exits with status 1 unless exactly two are supplied.

This script is useful for comparing the visual similarity of web pages by taking screenshots of them and calculating the SSIM of the two images.
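
The lookup order for the two URLs can be isolated into a small function that is easy to test without launching a browser. This is only a sketch of the same logic; `resolve_urls` is a hypothetical name, and it takes the environment and argument list as parameters so that they can be substituted in tests.

```python
def resolve_urls(environ, argv):
    """Return (url1, url2) from the environment or the arguments, else None.

    Hypothetical helper: environment variables URL1 and URL2 win when both
    are set; otherwise exactly two command-line arguments are required.
    """
    url1 = environ.get("URL1")
    url2 = environ.get("URL2")
    if url1 and url2:
        return url1, url2
    if len(argv) == 3:  # argv[0] is the script name
        return argv[1], argv[2]
    return None


if __name__ == "__main__":
    import os
    import sys

    urls = resolve_urls(os.environ, sys.argv)
    if urls is None:
        print("Usage: python compare_screenshots.py <url1> <url2>")
        sys.exit(1)
    print(urls)
```

With this shape, the `docker run -e URL1=... -e URL2=...` form and the positional form both go through the same code path.
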