/amazon-location-cookies-service

Service for extracting Amazon location cookies based on Amazon Zip-Code

Amazon Location Cookies

Description

This project extracts Amazon location cookies for a specific zip code on country-specific Amazon domains such as .de and .co.uk.

It is useful when you scrape Amazon through proxies with random geolocation, because Amazon otherwise returns content based on the user's IP address.

Locations tested so far:

  • US (30322)
  • ES (28010)
  • UK (E1 6AN)
  • DE (80686)
  • IT (20162)

Developing

Install the pre-commit hooks to run code quality and style checks:

$ make install_hooks

Then see the Configuration section.

You can also use these commands during development:

  • to run mypy checks

    $ make types
    

Configuration

Copy .env.example to .env and replace the placeholder values:

SECRET_KEY=changeme
SCRAPYRT_URL=http://scrapyrt:7800/crawl.json
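
As a rough illustration of how these two values might be consumed, the sketch below reads them from the process environment in Python. The `load_settings` helper, its placeholder check, and its fallback default are assumptions made for this example, not code taken from the project.

```python
import os
from typing import Dict, Mapping, Optional


def load_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Read SECRET_KEY and SCRAPYRT_URL from the environment (hypothetical helper)."""
    source = os.environ if env is None else env
    secret_key = source.get("SECRET_KEY", "")
    # Refuse to run with the placeholder from .env.example left in place.
    if secret_key in ("", "changeme"):
        raise ValueError("SECRET_KEY must be set to a real value")
    return {
        "SECRET_KEY": secret_key,
        # The fallback mirrors the value shown in .env.example.
        "SCRAPYRT_URL": source.get("SCRAPYRT_URL", "http://scrapyrt:7800/crawl.json"),
    }
```

Passing a plain dictionary instead of relying on `os.environ` keeps the helper easy to exercise in tests.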

Local install

Set up and activate a Python 3 virtualenv via your preferred method and install the production requirements, e.g.:

$ make ve

To remove the virtualenv:

$ make clean

Local run

Run spider locally:

$  scrapy crawl amazon:location-session -a country=US -a zip_code=30322

Run using local ScrapyRT service:

$  scrapyrt --ip 0.0.0.0 --port 7800
$  curl -g 'http://0.0.0.0:7800/crawl.json?start_requests=1&spider_name=amazon:location-session&crawl_args={"zip_code":"30332","country":"US"}'
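
The same request URL can be built from Python, which sidesteps shell quoting because `urllib.parse.urlencode` percent-encodes the JSON in `crawl_args`. This is a minimal sketch: `build_crawl_url` is a hypothetical helper, and the base URL assumes the local ScrapyRT instance started with the `scrapyrt` command above.

```python
import json
from urllib.parse import urlencode

# Assumes the local ScrapyRT instance from the scrapyrt command above.
SCRAPYRT_URL = "http://0.0.0.0:7800/crawl.json"


def build_crawl_url(country: str, zip_code: str, base_url: str = SCRAPYRT_URL) -> str:
    """Build the crawl.json URL with percent-encoded query parameters (hypothetical helper)."""
    params = {
        "start_requests": 1,
        "spider_name": "amazon:location-session",
        # crawl_args is a JSON object serialized into a single query parameter.
        "crawl_args": json.dumps({"zip_code": zip_code, "country": country}),
    }
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_crawl_url(country="US", zip_code="30332"))
```

The resulting string can be passed to any HTTP client, for example `requests.get(build_crawl_url("US", "30332"))`.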

Run docker containers:

$  make docker_up

Check the extracted Amazon location cookies from a Python script:

import re
from time import sleep
from typing import Dict

import requests

API_URL = "http://127.0.0.1:8000/api/v1/cookies?zip_code={zip_code}&country_code={country_code}"

HEADERS = {"user-agent": "user-agent"}

COUNTRY_CONFIG = {
    "US": {"zip_code": "30322", "amazon_url": "https://amazon.com"},
    "ES": {"zip_code": "28010", "amazon_url": "https://amazon.es"},
    "UK": {"zip_code": "E1 6AN", "amazon_url": "https://amazon.co.uk"},
    "DE": {"zip_code": "80686", "amazon_url": "https://amazon.de"},
    "IT": {"zip_code": "20162", "amazon_url": "https://amazon.it"},
}

LOCATION_REGEX = r'(?s)glow-ingress-line2">(.+?)<'


def get_location_cookies(country: str, zip_code: str) -> Dict[str, str]:
    """
    Make request to Amazon Location Cookies service for getting location cookies.
    """
    api_url = API_URL.format(zip_code=zip_code, country_code=country)
    api_response = requests.get(url=api_url)
    # Fail early on HTTP errors instead of on a missing key in the payload.
    api_response.raise_for_status()
    json_data = api_response.json()
    cookies = json_data["data"]["cookies"]
    return cookies


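def check_location_cookies(amazon_url: str, cookies: Dict[str, str]) -> str:
    """
    Make request to country specific Amazon url with location cookies.

    Returns the location text from the page, or an empty string if the
    location element is not found in the response.
    """
    amazon_response = requests.get(url=amazon_url, cookies=cookies, headers=HEADERS)
    location = re.search(LOCATION_REGEX, amazon_response.text)
    # re.search returns None when the pattern is absent; avoid an AttributeError.
    return location.group(1).strip() if location else ""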


def main() -> None:
    """
    Project entry point.
    """
    for country in COUNTRY_CONFIG:
        print("Check cookies for country: ", country)
        amazon_url = COUNTRY_CONFIG[country]["amazon_url"]
        zip_code = COUNTRY_CONFIG[country]["zip_code"]

        # Extract cookies via Amazon Location Service.
        cookies = get_location_cookies(country=country, zip_code=zip_code)
        print("Got Amazon cookies: ", cookies)

        # Check response using location cookies.
        response = check_location_cookies(amazon_url=amazon_url, cookies=cookies)
        print("Amazon response: ", response)
        # Pause before moving on to the next country.
        sleep(5)


if __name__ == "__main__":
    main()
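
To see what `LOCATION_REGEX` captures without calling Amazon, the pattern can be tried on a small HTML fragment. The fragment below is a made-up stand-in for the delivery-location element, and `parse_location` is a hypothetical helper; real Amazon markup may differ.

```python
import re
from typing import Optional

LOCATION_REGEX = r'(?s)glow-ingress-line2">(.+?)<'

# Hypothetical fragment, not copied from a real Amazon response.
SAMPLE_HTML = """
<span id="glow-ingress-line2">
    Atlanta 30322
</span>
"""


def parse_location(html: str) -> Optional[str]:
    """Return the captured location text, or None when the element is absent."""
    match = re.search(LOCATION_REGEX, html)
    return match.group(1).strip() if match else None


if __name__ == "__main__":
    print(parse_location(SAMPLE_HTML))      # Atlanta 30322
    print(parse_location("<html></html>"))  # None
```

The `(?s)` flag lets `.` match newlines, so the capture still works when the location text sits on its own line inside the element.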