Lissy93/web-check

Error on Screenshot, Location and Quality

Opened this issue · 7 comments

Thanks for the help on the Docker setup. It was working, and I was able to get so much information on the site using this tool. It was awesome!

Today, I saw an added API entry for urlscan.io, which has a "screenshot" capability.
So far, out of the box, I loaded all API keys except for GOOGLE_CLOUD_API_KEY, but I got this error on "screenshot":

Error Details for screenshot
The screenshot job failed with an error state after 16405 ms. The server responded with the following error:
Failed to launch the browser process!
[146:146:0827/000910.219594:ERROR:browser_main_loop.cc(536)] Failed to open an X11 connection.
[146:146:0827/000910.221268:ERROR:browser_main_loop.cc(1386)] Unable to open X display.
TROUBLESHOOTING: https://pptr.dev/troubleshooting

Also, I went to the urlscan.io API dashboard to see if the API had been used - so far I did not see any usage.

Another inquiry about the urlscan.io API: does it run a public or a private scan? It would be super awesome if the default behavior were a PRIVATE scan, to avoid posting the URL on the urlscan.io page itself.

Error Details for location
The location job failed with an error state after 9 ms. The server responded with the following error:
Failed to fetch

Error Details for Quality
The quality job failed with an error state after 345 ms. The server responded with the following error:
No Data

Hmm, let me look into this. The URL Scan API isn't actually being used - I can push an update that will use it as a fallback, but I took it out a while ago, as their API pricing was quite steep (for the hosted instance; it would be fine if you were self-hosting).

Quality Error

For the Quality job, that uses Lighthouse + Page Speed Insights API. This is free, but needs to be enabled from the Cloud console.

  1. Go to console.cloud.google.com, and login
  2. Make sure the right Project (in the top left) is selected, then search for "PageSpeed Insights API"
  3. Click Enable API.
  4. If you then click Manage, you should see a graph showing the usage
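Once the API is enabled, the request itself is a plain GET against the PageSpeed v5 endpoint. A minimal sketch of building that request (the function name is just for illustration; the key comes from the Cloud console):

```javascript
// Build the PageSpeed Insights v5 endpoint URL for a given page and API key.
// (Sketch only - buildPageSpeedUrl is a hypothetical helper, not part of web-check.)
const buildPageSpeedUrl = (pageUrl, apiKey) =>
  `https://www.googleapis.com/pagespeedonline/v5/runPagespeed` +
  `?url=${encodeURIComponent(pageUrl)}&key=${apiKey}`;

// Usage (makes a real network request):
// fetch(buildPageSpeedUrl('https://example.com', process.env.GOOGLE_CLOUD_API_KEY))
//   .then((res) => res.json())
//   .then((report) => console.log(report.lighthouseResult));
```

If the key or project is misconfigured, the JSON response will contain an `error` object describing why, which is often more informative than the UI's "No Data".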

Server Location Error

This task uses IP API, which is free and doesn't require any auth. If you send a GET request to https://ipapi.co/${ipAddress}/json/ (replace ${ipAddress} with an IP) do you see any results?

I've not actually seen this job fail before, so would be interested to learn more.


Screenshot Error

This works by spinning up a headless instance of Chromium locally and using Puppeteer to control it. It looks like it couldn't find the Chromium executable in your VM/system.

If you've got it installed in a non-standard location, you can try setting the CHROME_PATH env var. For example, I'm on Arch, and setting it to /usr/bin/chromium works for me.


More debugging:
If you pop open the browser console (usually F12) and click the Console tab, you should see some debug info. You can also check the Network tab; if you filter by XHR, you'll be able to inspect the raw responses from each job. That might give a bit more info than is available via the UI.

Server Location Error
Thanks, this one is confirmed to be working now.
I saw that the request was being blocked on my end, which was the main reason it failed. Thanks!


Quality Error

Thanks, let me try this one out.

For the Screenshot,
I have deployed this image on a Synology NAS via portainer using the following Stack details:

```yaml
version: "3"

services:
  webcheck:
    image: lissy93/web-check
    ports:
      - 8888:3000
    environment:
      CHROME_PATH: /usr/bin/chromium
    restart: on-failure:5
```

I can see from the console that Chromium seems to be there, but I'm not sure why it's failing for me.


Does this mean I need to set up a Chromium path when I deploy this on the stack?

Thanks again.

Hmmm, I'll need to look into this. Chromium should be installed in the Dockerfile. I thought it was maybe a permissions thing, but that doesn't seem to be the case.

If anyone has a bit more insight as to why the jobs that use Puppeteer cannot find Chromium when running in Docker, that'd be helpful :)

@Lissy93 While Chromium is installed, it does not allow for a sandbox within Docker. screenshot.js needs to be adjusted accordingly:

```javascript
try {
  browser = await puppeteer.launch({
    args: [...chromium.args, '--no-sandbox'], // Add --no-sandbox flag
    defaultViewport: { width: 800, height: 600 },
    executablePath: process.env.CHROME_PATH || await chromium.executablePath,
    headless: chromium.headless,
    ignoreHTTPSErrors: true,
    ignoreDefaultArgs: ['--disable-extensions'],
  });
```

Just added PR #51, however it probably needs a check to only set the flag if run inside docker.

GWnbsp commented


Change the Dockerfile as follows:

```dockerfile
# Build argument for the Node.js version to use
ARG NODE_VERSION=16

# Build argument for the Debian version to use, defaulting to "bullseye"
ARG DEBIAN_VERSION=bullseye

# Use the official Node.js image as the base, for the given Node.js and Debian versions
FROM docker.io/library/node:${NODE_VERSION}-${DEBIAN_VERSION}

# Use Bash as the default shell, with strict options enabled
SHELL ["/bin/bash", "-euo", "pipefail", "-c"]

# Install the Chromium browser
RUN apt-get update -qq && \
    apt-get -qqy install gnupg wget && \
    # Download and trust Google Chrome's signing key
    wget --quiet --output-document=- https://dl-ssl.google.com/linux/linux_signing_key.pub | gpg --dearmor > /etc/apt/trusted.gpg.d/google-archive.gpg && \
    # Add the Google Chrome repository to the apt sources list
    sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list' && \
    # Install Chromium (and traceroute), then clean up the apt cache
    apt-get -qqy --no-install-recommends install chromium traceroute && \
    rm -f -r /var/lib/apt/lists/*

# Record the Chromium version by redirecting its output to /etc/chromium-version
RUN /usr/bin/chromium --no-sandbox --version > /etc/chromium-version

# Set the working directory to /app
WORKDIR /app

# Copy package.json and yarn.lock into the working directory
COPY package.json yarn.lock ./

# Install dependencies with yarn, then clear the yarn cache
RUN yarn install && \
    rm -rf /app/node_modules/.cache

# Copy everything else into the working directory
COPY . .
RUN mkdir /app/data

# Build the app
RUN yarn build

# Expose the container port (3002 by default, overridable via the PORT env var)
EXPOSE ${PORT:-3002}

# Point CHROME_PATH at the Chromium binary
ENV CHROME_PATH='/usr/bin/chromium'

# Start the Node.js app's server.js when the container launches
CMD [ "node", "server.js" ]
```

GWnbsp commented

quality.js has problems when making the request with axios; switching to the built-in https module fixes it:

```javascript
const https = require('https');
const middleware = require('./_common/middleware'); // Import the custom middleware
const { setupProxy } = require('./_common/setupProxy'); // Import the proxy setup helper

// Handler that fetches performance analysis data for the given page
const handler = async (url, event, context) => {
  const apiKey = process.env.GOOGLE_CLOUD_API_KEY; // Read the Google Cloud API key from the environment
  if (!apiKey) {
    throw new Error('API key (GOOGLE_CLOUD_API_KEY) not set');
  }
  const getGooglePageSpeedInsights = (url) => new Promise((resolve, reject) => {
    const requestOptions = {
      agent: setupProxy(), // Use the proxy
    };
    const endpoint = `https://www.googleapis.com/pagespeedonline/v5/runPagespeed?url=${encodeURIComponent(url)}&key=${apiKey}`;
    https.get(endpoint, requestOptions, res => {
      let data = '';
      res.on('data', chunk => {
        data += chunk;
      });
      res.on('end', () => {
        resolve(JSON.parse(data));
      });
    }).on('error', reject);
  });

  const response = await getGooglePageSpeedInsights(url);

  return response; // Return the fetched performance analysis data
};

// Export the middleware-wrapped handler
module.exports = middleware(handler);

// Also export the handler so other modules can use it directly
module.exports.handler = middleware(handler);
```

Kf637 commented

I know this is an old issue, but I'm experiencing the same problems. I've tried using the Docker image as well as building my own image from source. This also seems to be happening on the demo page: https://web-check.xyz/.