This project retrieves Amazon location cookies for a specific Amazon ZIP code on country-specific domains such as .de, .co.uk, etc.
It is helpful when you scrape Amazon through random geolocation proxies, because Amazon otherwise returns content based on the user's IP address.
Tested locations at the moment:
- US (30322)
- ES (28010)
- UK (E1 6AN)
- DE (80686)
- IT (20162)
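As a rough illustration of the use case above, the following sketch sends a set of location cookies through a proxy using only the standard library. The helper names `cookie_header` and `build_opener_with_location`, the proxy address, and the cookie name are hypothetical placeholders and are not part of this project.

```python
import urllib.request
from typing import Dict


def cookie_header(cookies: Dict[str, str]) -> str:
    """Serialize a cookie dict into a single Cookie header value."""
    return "; ".join(f"{name}={value}" for name, value in cookies.items())


def build_opener_with_location(
    proxy_url: str, cookies: Dict[str, str]
) -> urllib.request.OpenerDirector:
    """Build an opener that routes requests through proxy_url and sends the cookies."""
    proxy_handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    opener = urllib.request.build_opener(proxy_handler)
    opener.addheaders = [
        ("User-Agent", "user-agent"),  # placeholder, as in the script in this README
        ("Cookie", cookie_header(cookies)),
    ]
    return opener


if __name__ == "__main__":
    # Hypothetical proxy address and cookie; no request is sent here.
    opener = build_opener_with_location(
        "http://127.0.0.1:8080", {"session-id": "000-0000000-0000000"}
    )
    print(dict(opener.addheaders)["Cookie"])
```

Calling `opener.open("https://amazon.de")` would then go out via the proxy while still carrying the location cookies.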
Install pre-commit hooks to run code quality and style checks:
$ make install_hooks
Then see the Configuration section.
You can also use these commands during development:
- Run mypy checks:
$ make types
Create a real .env from .env.example, replacing the placeholders:
SECRET_KEY=changeme
SCRAPYRT_URL=http://scrapyrt:7800/crawl.json
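How the project itself reads these values is not shown here. As a minimal sketch, assuming the variables end up in the process environment, they could be loaded and validated like this. `Settings` and `load_settings` are hypothetical names used only for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    secret_key: str
    scrapyrt_url: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read SECRET_KEY and SCRAPYRT_URL, rejecting the placeholder secret."""
    secret_key = env.get("SECRET_KEY", "")
    if not secret_key or secret_key == "changeme":
        raise ValueError("SECRET_KEY must be set to a real value")
    # Fall back to the URL given in .env.example when the variable is absent.
    scrapyrt_url = env.get("SCRAPYRT_URL", "http://scrapyrt:7800/crawl.json")
    return Settings(secret_key=secret_key, scrapyrt_url=scrapyrt_url)
```

Failing early on the `changeme` placeholder makes a forgotten .env edit visible at startup rather than at the first request.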
Set up and activate a python3 virtualenv via your preferred method and install the production requirements, e.g.:
$ make ve
To remove the virtualenv:
$ make clean
Run spider locally:
$ scrapy crawl amazon:location-session -a country=US -a zip_code=30322
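To run the same spider for every tested location, the command above can be assembled from Python. This is only a sketch: `TESTED_LOCATIONS`, `build_crawl_command`, and `crawl_all` are hypothetical helpers, and the code assumes `scrapy` is on the PATH of the active virtualenv.

```python
import subprocess
from typing import Dict, List

# Country codes and ZIP codes from the tested locations list.
TESTED_LOCATIONS: Dict[str, str] = {
    "US": "30322",
    "ES": "28010",
    "UK": "E1 6AN",
    "DE": "80686",
    "IT": "20162",
}


def build_crawl_command(country: str, zip_code: str) -> List[str]:
    """Return the scrapy command as an argument list (no shell quoting needed)."""
    return [
        "scrapy", "crawl", "amazon:location-session",
        "-a", f"country={country}",
        "-a", f"zip_code={zip_code}",
    ]


def crawl_all() -> None:
    """Run the spider once per tested location, stopping on the first failure."""
    for country, zip_code in TESTED_LOCATIONS.items():
        subprocess.run(build_crawl_command(country, zip_code), check=True)
```

Passing the arguments as a list keeps a ZIP code containing a space, such as `E1 6AN`, as a single argument.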
Run using local ScrapyRT service:
$ scrapyrt --ip 0.0.0.0 --port 7800
$ curl -G 'http://0.0.0.0:7800/crawl.json' \
    --data-urlencode 'start_requests=1' \
    --data-urlencode 'spider_name=amazon:location-session' \
    --data-urlencode 'crawl_args={"zip_code":"30322","country":"US"}'
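The query string has to be URL-encoded, since `crawl_args` carries a JSON object. A minimal sketch of building the same request URL in Python follows. `build_crawl_url` is a hypothetical helper, not part of this project.

```python
import json
from urllib.parse import urlencode


def build_crawl_url(base_url: str, country: str, zip_code: str) -> str:
    """Build a ScrapyRT crawl.json URL with crawl_args encoded as JSON."""
    crawl_args = json.dumps(
        {"zip_code": zip_code, "country": country}, separators=(",", ":")
    )
    query = urlencode(
        {
            "start_requests": 1,
            "spider_name": "amazon:location-session",
            "crawl_args": crawl_args,
        }
    )
    return f"{base_url}?{query}"


if __name__ == "__main__":
    print(build_crawl_url("http://0.0.0.0:7800/crawl.json", "US", "30322"))
```

`urlencode` percent-encodes the braces, quotes, and colons, so the resulting URL can be passed to any HTTP client without further quoting.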
Run docker containers:
$ make docker_up
Check the extracted Amazon location cookies from a Python script:
import re
from time import sleep
from typing import Dict

import requests

# Endpoint of the Amazon Location Cookies service.
API_URL = "http://127.0.0.1:8000/api/v1/cookies?zip_code={zip_code}&country_code={country_code}"
# Placeholder; replace with a real browser user-agent string.
HEADERS = {"user-agent": "user-agent"}
COUNTRY_CONFIG = {
    "US": {"zip_code": "30322", "amazon_url": "https://amazon.com"},
    "ES": {"zip_code": "28010", "amazon_url": "https://amazon.es"},
    "UK": {"zip_code": "E1 6AN", "amazon_url": "https://amazon.co.uk"},
    "DE": {"zip_code": "80686", "amazon_url": "https://amazon.de"},
    "IT": {"zip_code": "20162", "amazon_url": "https://amazon.it"},
}
# Captures the text of the glow-ingress-line2 element (the location shown on the page).
LOCATION_REGEX = r'(?s)glow-ingress-line2">(.+?)<'


def get_location_cookies(country: str, zip_code: str) -> Dict[str, str]:
    """
    Make request to Amazon Location Cookies service for getting location cookies.
    """
    api_url = API_URL.format(zip_code=zip_code, country_code=country)
    json_data = requests.get(url=api_url).json()
    cookies = json_data["data"]["cookies"]
    return cookies
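

def check_location_cookies(amazon_url: str, cookies: Dict[str, str]) -> str:
    """
    Make request to country specific Amazon url with location cookies.
    """
    amazon_response = requests.get(url=amazon_url, cookies=cookies, headers=HEADERS)
    location = re.search(LOCATION_REGEX, amazon_response.text)
    if location is None:
        # The page may not contain the location element (e.g. a different layout).
        raise ValueError(f"Location element not found in response from {amazon_url}")
    return location.group(1).strip()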


def main() -> None:
    """
    Project entry point.
    """
    for country in COUNTRY_CONFIG:
        print("Check cookies for country: ", country)
        amazon_url = COUNTRY_CONFIG[country]["amazon_url"]
        zip_code = COUNTRY_CONFIG[country]["zip_code"]
        # Extract cookies via Amazon Location Service.
        cookies = get_location_cookies(country=country, zip_code=zip_code)
        print("Got Amazon cookies: ", cookies)
        # Check response using location cookies.
        response = check_location_cookies(amazon_url=amazon_url, cookies=cookies)
        print("Amazon response: ", response)
        # Pause between countries to avoid sending requests too quickly.
        sleep(5)


if __name__ == "__main__":
    main()
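To see what the location regex in the script captures without contacting Amazon, it can be applied to a small HTML fragment. The fragment below is a made-up example, not real Amazon markup, and `extract_location` is a hypothetical helper that returns `None` when the element is missing.

```python
import re
from typing import Optional

LOCATION_REGEX = r'(?s)glow-ingress-line2">(.+?)<'

# Hypothetical fragment for illustration only.
SAMPLE_HTML = '<span id="glow-ingress-line2">\n   Atlanta 30322\n</span>'


def extract_location(html: str) -> Optional[str]:
    """Return the stripped location text, or None if the element is absent."""
    match = re.search(LOCATION_REGEX, html)
    return match.group(1).strip() if match else None


if __name__ == "__main__":
    print(extract_location(SAMPLE_HTML))
```

The `(?s)` flag lets `.` match newlines, so the capture still works when the element's text is surrounded by line breaks and indentation.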