dgtlmoon/changedetection.io

Monitoring Issue - Error 403

Closed this issue · 2 comments

v0.46.04

I am trying to monitor a yoox.com page with the "Re-stock & Price detection for single product pages" feature. A few weeks ago, everything was working perfectly. Some time later, I noticed that monitoring had stopped working and I started receiving Error 403 (Access Denied). The error occurs on all pages from yoox.com.

After some investigation, I realized that it sometimes works, but most of the time it doesn't. Below are log snippets for both a successful and a failed attempt. The result is the same with both 'Basic fast Plaintext/HTTP Client' and 'Playwright Chromium/JavaScript via "ws://playwright-chrome:3000/"'.
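To put a number on "sometimes works, mostly doesn't", a rough sketch like the one below could count the status codes returned over repeated plain HTTP requests. It is not part of changedetection.io; `fetch_status` and `tally_statuses` are hypothetical helper names, the `User-Agent` value is an arbitrary placeholder, and connection-level failures (`URLError`) are deliberately left unhandled.

```python
import time
import urllib.error
import urllib.request
from collections import Counter
from typing import Callable


def fetch_status(url: str, timeout: float = 15.0) -> int:
    """Return the HTTP status code for a plain GET of `url`."""
    # Placeholder User-Agent (an assumption, not what changedetection.io sends).
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the code is still what we count.
        return err.code


def tally_statuses(
    url: str,
    attempts: int = 10,
    fetch: Callable[[str], int] = fetch_status,
    pause: float = 0.0,
) -> Counter:
    """Fetch `url` `attempts` times and count how often each status appears."""
    counts: Counter = Counter()
    for attempt in range(attempts):
        counts[fetch(url)] += 1
        # Optional delay between attempts, skipped after the last one.
        if pause and attempt < attempts - 1:
            time.sleep(pause)
    return counts
```

Something like `tally_statuses("https://www.yoox.com/de/17919386NX/item", attempts=10, pause=30)` would then show how many of the ten requests came back as 403. The `fetch` parameter is only there so a stub can be swapped in for the real request.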


Any ideas?

OK (successful attempt):

172.21.0.1 - - [19/Sep/2024 21:31:02] "GET /static/js/toggle-theme.js HTTP/1.1" 200 1152 0.001810
2024-09-19 21:31:02.913 | INFO     | changedetectionio.update_worker:run:255 - Processing watch UUID 61711ceb-f5ae-4a5b-95a7-ff1d5ee545b2 Priority 1 URL https://www.yoox.com/de/17919386NX/item#sts=dreambox80&sizeId=7&cod10=17919386NX
Playwright console: Watch URL: https://www.yoox.com/de/17919386NX/item#sts=dreambox80&sizeId=7&cod10=17919386NX warning: <link rel=preload> has an invalid `href` value []
Playwright console: Watch URL: https://www.yoox.com/de/17919386NX/item#sts=dreambox80&sizeId=7&cod10=17919386NX error: Failed to load resource: net::ERR_FAILED []
Playwright console: Watch URL: https://www.yoox.com/de/17919386NX/item#sts=dreambox80&sizeId=7&cod10=17919386NX error: Failed to load resource: net::ERR_FAILED []
Playwright console: Watch URL: https://www.yoox.com/de/17919386NX/item#sts=dreambox80&sizeId=7&cod10=17919386NX error: Failed to load resource: net::ERR_FAILED []
Playwright console: Watch URL: https://www.yoox.com/de/17919386NX/item#sts=dreambox80&sizeId=7&cod10=17919386NX error: Failed to load resource: net::ERR_FAILED []
Playwright console: Watch URL: https://www.yoox.com/de/17919386NX/item#sts=dreambox80&sizeId=7&cod10=17919386NX error: Failed to load resource: net::ERR_FAILED []
Playwright console: Watch URL: https://www.yoox.com/de/17919386NX/item#sts=dreambox80&sizeId=7&cod10=17919386NX error: Failed to load resource: net::ERR_FAILED []
Playwright console: Watch URL: https://www.yoox.com/de/17919386NX/item#sts=dreambox80&sizeId=7&cod10=17919386NX error: Failed to load resource: net::ERR_FAILED []
Playwright console: Watch URL: https://www.yoox.com/de/17919386NX/item#sts=dreambox80&sizeId=7&cod10=17919386NX warning: [.WebGL-0x1c44031f7800]GL Driver Message (OpenGL, Performance, GL_CLOSE_PATH_NV, High): GPU stall due to ReadPixels []
2024-09-19 21:31:07.540 | DEBUG    | changedetectionio.blueprint.browser_steps.browser_steps:action_goto_url:91 - Time to goto URL 3.54s
Playwright console: Watch URL: https://www.yoox.com/de/17919386NX/item#sts=dreambox80&sizeId=7&cod10=17919386NX info: WebGPU is experimental on this platform. See https://github.com/gpuweb/gpuweb/wiki/Implementation-Status#implementation-status []
Playwright console: Watch URL: https://www.yoox.com/de/17919386NX/item#sts=dreambox80&sizeId=7&cod10=17919386NX warning: Failed to create WebGPU Context Provider []
Playwright console: Watch URL: https://www.yoox.com/de/17919386NX/item#sts=dreambox80&sizeId=7&cod10=17919386NX warning: The resource https://fonts.googleapis.com/css?family=Montserrat:300,500,600,700|Playfair+Display:700&display=swap was preloaded using link preload but not used within a few seconds from the window's load event. Please make sure it has an appropriate `as` value and it is preloaded intentionally. []
Playwright console: Watch URL: https://www.yoox.com/de/17919386NX/item#sts=dreambox80&sizeId=7&cod10=17919386NX warning: The resource https://s.go-mpulse.net/boomerang/NAHXZ-NFM72-8XBWS-RN8VJ-USRC3 was preloaded using link preload but not used within a few seconds from the window's load event. Please make sure it has an appropriate `as` value and it is preloaded intentionally. []
Playwright console: Watch URL: https://www.yoox.com/de/17919386NX/item#sts=dreambox80&sizeId=7&cod10=17919386NX log: Scanning div,span,form,table,tbody,tr,td,a,p,ul,li,h1,h2,h3,h4,header,footer,section,article,aside,details,main,nav,section,summary [<JSHandle preview=Scanning div,span,form,table,tbody,tr,td,a,p,ul,li,h1,h2,h3,h4,header,footer,section,article,aside,details,main,nav,section,summary>]
Playwright console: Watch URL: https://www.yoox.com/de/17919386NX/item#sts=dreambox80&sizeId=7&cod10=17919386NX log: Scanning %ELEMENTS% [<JSHandle preview=Scanning %ELEMENTS%>]
Playwright console: Watch URL: https://www.yoox.com/de/17919386NX/item#sts=dreambox80&sizeId=7&cod10=17919386NX log: Returning 'Possibly in stock' - cant' find any useful matching text [<JSHandle preview=Returning 'Possibly in stock' - cant' find any useful matching text>]
2024-09-19 21:31:22.973 | DEBUG    | changedetectionio.processors.restock_diff.processor:get_itemprop_availability:58 - Using jsonpath to find price/availability/etc
2024-09-19 21:31:23.009 | DEBUG    | changedetectionio.processors.restock_diff.processor:get_itemprop_availability:90 - Alternatively digging through OpenGraph properties for restock/price info..
2024-09-19 21:31:23.019 | DEBUG    | changedetectionio.processors.restock_diff.processor:run_changedetection:213 - Watch UUID 61711ceb-f5ae-4a5b-95a7-ff1d5ee545b2 restock check - Previous MD5: False, Fetched MD5 135393573c5725c37ab3d101900476b2
2024-09-19 21:31:23.020 | DEBUG    | changedetectionio.processors.restock_diff.processor:run_changedetection:236 - 61711ceb-f5ae-4a5b-95a7-ff1d5ee545b2 - Change was detected, 'price_change_max' is '' 'price_change_min' is '', price from website is '183'.
2024-09-19 21:31:23.020 | DEBUG    | changedetectionio.processors.restock_diff.processor:run_changedetection:243 - 61711ceb-f5ae-4a5b-95a7-ff1d5ee545b2 after float conversion - Min limit: 'None' Max limit: 'None' Price: '183.0'
2024-09-19 21:31:23.096 | INFO     | changedetectionio.update_worker:run:508 - UUID: 61711ceb-f5ae-4a5b-95a7-ff1d5ee545b2 Extract <title> updated title to 'TORY BURCH  | Ballerinas Beige Damen | YOOX
2024-09-19 21:31:24.542 | DEBUG    | changedetectionio.model.Watch:history:177 - Reading watch history index for 61711ceb-f5ae-4a5b-95a7-ff1d5ee545b2
2024-09-19 21:31:24.543 | DEBUG    | changedetectionio.update_worker:run:576 - Watch 61711ceb-f5ae-4a5b-95a7-ff1d5ee545b2 done in 21.63s
2024-09-19 21:31:37.878 | INFO     | changedetectionio.store:sync_to_json:383 - Saving JSON..
2024-09-19T21:30:41.214Z browserless:system Chrome launched 4319ms
2024-09-19T21:30:41.250Z browserless:chrome-helper Setting up file:// protocol request rejection
2024-09-19T21:30:41.250Z browserless:chrome-helper Setting up page for ad-blocking
2024-09-19T21:30:41.251Z browserless:chrome-helper Setting up file:// protocol request rejection
2024-09-19T21:30:41.251Z browserless:chrome-helper Setting up page for ad-blocking
2024-09-19T21:30:41.252Z browserless:chrome-helper Setting up file:// protocol request rejection
2024-09-19T21:30:41.253Z browserless:chrome-helper Setting up page for ad-blocking
2024-09-19T21:31:03.817Z browserless:job JFB0KNSUG21JOHVC6S1EMLBZY3799TZB: /: Inbound WebSocket request.
2024-09-19T21:31:03.838Z browserless:hardware Checking overload status: CPU 2% Memory 26%
2024-09-19T21:31:03.841Z browserless:job JFB0KNSUG21JOHVC6S1EMLBZY3799TZB: Adding new job to queue.
2024-09-19T21:31:03.841Z browserless:server Starting new job
2024-09-19T21:31:03.842Z browserless:system Waiting pre-booted chrome instance
2024-09-19T21:31:03.842Z browserless:job JFB0KNSUG21JOHVC6S1EMLBZY3799TZB: Getting browser.
2024-09-19T21:31:03.842Z browserless:system Got chrome instance
2024-09-19T21:31:03.842Z browserless:job JFB0KNSUG21JOHVC6S1EMLBZY3799TZB: Starting session.
2024-09-19T21:31:03.843Z browserless:job JFB0KNSUG21JOHVC6S1EMLBZY3799TZB: Proxying request to /devtools/browser route: ws://127.0.0.1:35312/devtools/browser/6429440b-e2db-47fb-b1e9-5f091ab2f15f.
2024-09-19T21:31:04.026Z browserless:chrome-helper Setting up page Unknown
2024-09-19T21:31:04.026Z browserless:chrome-helper Injecting download dir "/usr/src/app/workspace"
2024-09-19T21:31:04.030Z browserless:chrome-helper Setting up file:// protocol request rejection
2024-09-19T21:31:04.030Z browserless:chrome-helper Setting up page for ad-blocking
2024-09-19T21:31:22.663Z browserless:server JFB0KNSUG21JOHVC6S1EMLBZY3799TZB: Recording successful stat and cleaning up.
2024-09-19T21:31:22.668Z browserless:job JFB0KNSUG21JOHVC6S1EMLBZY3799TZB: Cleaning up job
2024-09-19T21:31:22.668Z browserless:job JFB0KNSUG21JOHVC6S1EMLBZY3799TZB: Browser not needed, closing
2024-09-19T21:31:22.669Z browserless:chrome-helper Shutting down browser with close command
2024-09-19T21:31:22.669Z browserless:system Adding back Chrome swarm
2024-09-19T21:31:22.670Z browserless:job JFB0KNSUG21JOHVC6S1EMLBZY3799TZB: Browser cleanup complete.
2024-09-19T21:31:22.670Z browserless:server Current workload complete.
2024-09-19T21:31:22.670Z browserless:chrome-helper Sending SIGKILL signal to browser process 21
2024-09-19T21:31:22.692Z browserless:chrome-helper Removing temp data-dir /tmp/browserless-data-dir-UdaUTw
2024-09-19T21:31:22.706Z browserless:chrome-helper Launching Chrome with args: {
  "args": [
    "--no-sandbox",
    "--enable-logging",
    "--v1=1",
    "--disable-dev-shm-usage",
    "--no-first-run",
    "--remote-debugging-port=34855",
    "--user-data-dir=/tmp/browserless-data-dir-SU0NEz"
  ],
  "blockAds": true,
  "headless": "new",
  "ignoreDefaultArgs": false,
  "ignoreHTTPSErrors": false,
  "pauseOnConnect": false,
  "userDataDir": "/tmp/browserless-data-dir-SU0NEz",
  "playwright": false,
  "stealth": true,
  "meta": null,
  "executablePath": "/usr/bin/google-chrome",
  "handleSIGINT": false,
  "handleSIGTERM": false,
  "handleSIGHUP": false
}
2024-09-19T21:31:22.795Z browserless:chrome-helper Browser process 6429440b-e2db-47fb-b1e9-5f091ab2f15f has closed, cleaning up.
2024-09-19T21:31:22.796Z browserless:chrome-helper Garbage collecting and removing listeners
2024-09-19T21:31:22.936Z browserless:chrome-helper Temp dir /tmp/browserless-data-dir-UdaUTw removed successfully

NOK (failed attempt, 403):

2024-09-19 21:35:20.732 | INFO     | changedetectionio.update_worker:run:255 - Processing watch UUID 61711ceb-f5ae-4a5b-95a7-ff1d5ee545b2 Priority 1 URL https://www.yoox.com/de/17919386NX/item#sts=dreambox80&sizeId=7&cod10=17919386NX
Playwright console: Watch URL: https://www.yoox.com/de/17919386NX/item#sts=dreambox80&sizeId=7&cod10=17919386NX error: Failed to load resource: the server responded with a status of 403 () []
2024-09-19 21:35:21.516 | DEBUG    | changedetectionio.blueprint.browser_steps.browser_steps:action_goto_url:91 - Time to goto URL 0.19s
2024-09-19 21:35:28.673 | DEBUG    | changedetectionio.update_worker:run:576 - Watch 61711ceb-f5ae-4a5b-95a7-ff1d5ee545b2 done in 7.94s
2024-09-19 21:35:37.993 | INFO     | changedetectionio.store:sync_to_json:383 - Saving JSON..
2024-09-19T21:35:21.166Z browserless:job J9TBX69RQ9F2ARADDK6SRUMMB9L4905A: /: Inbound WebSocket request.
2024-09-19T21:35:21.173Z browserless:hardware Checking overload status: CPU 4% Memory 26%
2024-09-19T21:35:21.173Z browserless:job J9TBX69RQ9F2ARADDK6SRUMMB9L4905A: Adding new job to queue.
2024-09-19T21:35:21.173Z browserless:server Starting new job
2024-09-19T21:35:21.174Z browserless:system Waiting pre-booted chrome instance
2024-09-19T21:35:21.174Z browserless:job J9TBX69RQ9F2ARADDK6SRUMMB9L4905A: Getting browser.
2024-09-19T21:35:21.174Z browserless:system Got chrome instance
2024-09-19T21:35:21.174Z browserless:job J9TBX69RQ9F2ARADDK6SRUMMB9L4905A: Starting session.
2024-09-19T21:35:21.174Z browserless:job J9TBX69RQ9F2ARADDK6SRUMMB9L4905A: Proxying request to /devtools/browser route: ws://127.0.0.1:45700/devtools/browser/18da7ac0-c485-4769-a70c-f2abc082382c.
2024-09-19T21:35:21.340Z browserless:chrome-helper Setting up page Unknown
2024-09-19T21:35:21.340Z browserless:chrome-helper Injecting download dir "/usr/src/app/workspace"
2024-09-19T21:35:21.345Z browserless:chrome-helper Setting up file:// protocol request rejection
2024-09-19T21:35:21.345Z browserless:chrome-helper Setting up page for ad-blocking
2024-09-19T21:35:28.671Z browserless:server Error with inbound socket Error: read ECONNRESET
Error: read ECONNRESET
    at TCP.onStreamRead (node:internal/stream_base_commons:217:20)
2024-09-19T21:35:28.672Z browserless:server J9TBX69RQ9F2ARADDK6SRUMMB9L4905A: Recording failed stat, cleaning up: "undefined"
2024-09-19T21:35:28.672Z browserless:job J9TBX69RQ9F2ARADDK6SRUMMB9L4905A: Cleaning up job
2024-09-19T21:35:28.673Z browserless:job J9TBX69RQ9F2ARADDK6SRUMMB9L4905A: Browser not needed, closing
2024-09-19T21:35:28.673Z browserless:chrome-helper Shutting down browser with close command
2024-09-19T21:35:28.673Z browserless:system Adding back Chrome swarm
2024-09-19T21:35:28.673Z browserless:job J9TBX69RQ9F2ARADDK6SRUMMB9L4905A: Browser cleanup complete.
2024-09-19T21:35:28.673Z browserless:server Current workload complete.
2024-09-19T21:35:28.675Z browserless:chrome-helper Sending SIGKILL signal to browser process 22
2024-09-19T21:35:28.679Z browserless:chrome-helper Removing temp data-dir /tmp/browserless-data-dir-g8Sh7M
2024-09-19T21:35:28.681Z browserless:chrome-helper Launching Chrome with args: {
  "args": [
    "--no-sandbox",
    "--enable-logging",
    "--v1=1",
    "--disable-dev-shm-usage",
    "--no-first-run",
    "--remote-debugging-port=43031",
    "--user-data-dir=/tmp/browserless-data-dir-AcciQR"
  ],
  "blockAds": true,
  "headless": "new",
  "ignoreDefaultArgs": false,
  "ignoreHTTPSErrors": false,
  "pauseOnConnect": false,
  "userDataDir": "/tmp/browserless-data-dir-AcciQR",
  "playwright": false,
  "stealth": true,
  "handleSIGHUP": false
}
2024-09-19T21:35:28.792Z browserless:chrome-helper Browser process 18da7ac0-c485-4769-a70c-f2abc082382c has closed, cleaning up.
2024-09-19T21:35:28.792Z browserless:chrome-helper Garbage collecting and removing listeners
2024-09-19T21:35:29.072Z browserless:chrome-helper Chrome PID: 1007
2024-09-19T21:35:29.078Z browserless:chrome-helper Finding prior pages
2024-09-19T21:35:29.096Z browserless:chrome-helper Found 1 pages
2024-09-19T21:35:29.096Z browserless:chrome-helper Setting up page Unknown
2024-09-19T21:35:29.096Z browserless:chrome-helper Injecting download dir "/usr/src/app/workspace"
2024-09-19T21:35:29.097Z browserless:system Chrome launched 424ms
2024-09-19T21:35:29.098Z browserless:chrome-helper Setting up file:// protocol request rejection
2024-09-19T21:35:29.098Z browserless:chrome-helper Setting up page for ad-blocking
2024-09-19T21:35:29.100Z browserless:chrome-helper Temp dir /tmp/browserless-data-dir-g8Sh7M removed successfully
2024-09-19T21:35:36.885Z browserless:server Health check stats: CPU 5%, MEM: 26%,
2024-09-19T21:35:36.885Z browserless:server Current period usage: {"date":1726781436877,"error":1,"rejected":0,"successful":1,"timedout":0,"totalTime":26322,"maxTime":18823,"minTime":7499,"meanTime":13161,"maxConcurrent":1,"units":2}
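The only line in the failed run that actually names the 403 is the Playwright console message. As a minimal sketch, assuming the log has been saved as text, the status codes could be pulled out of those console lines like this (`console_statuses` is a hypothetical helper, and the pattern is based only on the message wording seen in the log above):

```python
import re

# Matches the Chromium console message seen in the failed run, e.g.
# "Failed to load resource: the server responded with a status of 403 ()".
STATUS_RE = re.compile(r"the server responded with a status of (\d{3})")


def console_statuses(log_text: str) -> list[int]:
    """Return every HTTP status code reported in the console error lines."""
    return [int(match.group(1)) for match in STATUS_RE.finditer(log_text)]
```

Run over the successful log it should return an empty list, since the `net::ERR_FAILED` lines there carry no status code.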

Yeah, sites often step up their anti-robot detection. Since we are using an automated browser, it will get detected. It's very difficult to work around, and there's not much we can do about it right now.

Duplicate #2198