Scrapyd running on a remote machine causes UI links to be broken.
Opened this issue · 13 comments
Describe the bug
If the Scrapyd server(s) are running on a remote host (on the same VPN) and ScrapydWeb is running on a separate node, then the links to Logs and Items are broken by design.
Example:
The Scrapyd server is running at 189.09.09.90:6800 on an AWS EC2 instance, and ScrapydWeb is running at 89.09.09.80:80.
In the config I provide 189.09.09.90:6800 as the Scrapyd server location, which causes the links to be rendered as 189.09.09.90:6800/logs/proj/job/2019-09-30T11_01_23.log, which is inaccessible from the browser.
However, 189.09.09.90:6800 can be exposed via a reverse proxy at abc.domain.com, and then abc.domain.com/logs/proj/job/2019-09-30T11_01_23.log will be accessible.
A possible solution would be to allow an alias for each server, which would be used when generating the links.
What about adding abc.domain.com:80 as the Scrapyd server?
@my8100
Sorry for the late reply. Adding abc.domain.com:80 would work, but each request would then go through the web while it could have been reached locally.
Let me explain my setup:
- We have a cluster of scrapyd.
- We have scrapydweb instance connected to all the servers locally.
- Everything is running locally and for security, only scrapydweb is exposed to the internet.
Ideally, scrapydweb should fetch and display all the URLs from scrapyd, regardless of the type of data.
So instead of directly opening scrapyd-server-001.local:6800/logs/proj/job/2019-09-30T11_01_23.log,
it would open scrapydweb.domain.com/scrapyd-server-001.local/logs/proj/job/2019-09-30T11_01_23.log.
Basically, it would internally redirect the requests to the scrapyd servers.
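The link-rewriting idea described above can be sketched in a few lines of Python. Everything here is illustrative (the public base URL, the path layout, and the rewrite_link helper are assumptions, not ScrapydWeb's actual code):

```python
# Hypothetical sketch: rewrite an internal Scrapyd log URL into a public
# ScrapydWeb URL, so browsers never need direct access to the Scrapyd hosts.
# PUBLIC_BASE and the path layout are assumptions for illustration only.
from urllib.parse import urlsplit

PUBLIC_BASE = "https://scrapydweb.domain.com"

def rewrite_link(internal_url: str) -> str:
    """Map e.g. http://scrapyd-server-001.local:6800/logs/proj/job.log
    to https://scrapydweb.domain.com/scrapyd-server-001.local/logs/proj/job.log
    """
    parts = urlsplit(internal_url)
    host = parts.hostname  # drop the port; the proxy knows where to forward
    return f"{PUBLIC_BASE}/{host}{parts.path}"
```

With a scheme like this, ScrapydWeb itself would then proxy any request under /scrapyd-server-001.local/... to the matching internal Scrapyd server.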
OK, it may be supported in a future release.
But for now, you can view the first 100 and last 100 lines of the log in the Stats page.
Hi, I would like to promote this feature as well. It would also help with redirecting links in cloud-based setups.
@Rohithzr @sergiigladchuk
The requested feature is supported in PR #128
Please give it a try and offer your feedback, thanks.
- Stop Scrapydweb.
- Execute pip install --upgrade git+https://github.com/my8100/scrapydweb.git to get the latest code.
- Add the content below to the existing file scrapydweb_settings_v10.py:
  scrapydweb/scrapydweb/default_settings.py
  Lines 106 to 115 in 12c4892
- Update the option SCRAPYD_SERVERS_PUBLIC_URLS accordingly.
- Restart Scrapydweb.
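Since the referenced default_settings.py snippet is not reproduced in this thread, here is a hedged sketch of what the addition to scrapydweb_settings_v10.py might look like. The values are placeholders taken from the example above; the exact format should be copied from the referenced lines of default_settings.py:

```python
# Sketch of the new option in scrapydweb_settings_v10.py (values are
# placeholders; see scrapydweb's default_settings.py for the exact format).
SCRAPYD_SERVERS = [
    '189.09.09.90:6800',  # internal address ScrapydWeb uses to reach Scrapyd
]

# One public URL per entry in SCRAPYD_SERVERS; links in the UI would be
# generated from these instead of the internal addresses.
SCRAPYD_SERVERS_PUBLIC_URLS = [
    'http://abc.domain.com',
]
```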
For anyone trying to make this work with nginx, a "subfolder" config (mydomain.com/scrapy) didn't work for me for some reason.
I had success with a subdomain config like this (scrapy.mydomain.com):
server {
    listen 443 ssl http2;
    listen [::]:443 ssl http2;
    server_name scrapy.*;
    include /config/nginx/ssl.conf;
    client_max_body_size 0;

    location / {
        include /config/nginx/proxy.conf;
        include /config/nginx/resolver.conf;
        set $upstream_app scrapydweb;
        set $upstream_port 5000;
        set $upstream_proto http;
        proxy_pass $upstream_proto://$upstream_app:$upstream_port;
    }
}
It's not working for me. I do not have a custom domain, so I am using the server's IPv4 address.
I have deployed Scrapyd and Scrapydweb on the same server and am running the web UI with no reverse proxy, i.e. the port Scrapydweb runs on is exposed directly.
When accessing the URL (server_ipv4:port) from another computer, everything seems to work fine; however, when I go to the Items section, I get an "Oops! Something went wrong." error. Also, the "pip install logparser" hint is still showing even after configuring logparser to run with Scrapydweb.
@ritikkumarsahu
Have you set up items_dir for Scrapyd first?
It is disabled by default.
You may need to check the Scrapyd web UI, as it is returning a 404 status code for the items page.
https://scrapyd.readthedocs.io/en/stable/config.html#items-dir
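For reference, items_dir is enabled in Scrapyd's own config file (see the docs link above); the directory name below is just an example:

```ini
[scrapyd]
# Enable the /items/ page by telling Scrapyd where to store item feeds.
# "items" is an example path; any writable directory works.
items_dir = items
```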
I have used MongoDB for the items pipeline. I am not sure how to set up MongoDB for ScrapydWeb.
You only need to update the config of Scrapyd if you want to visit the items page.
Can you visit these URLs directly?
http://your-scrapyd-server:port/
http://your-scrapyd-server:port/items/
Maybe the items page is available only when you are using the simple JSON feed instead of a database?
I can visit http://your-scrapyd-server:port/ but not the http://your-scrapyd-server:port/items/ page.
Can this be a feature request then?
Scrapydweb can show the job stats efficiently when working with logparser.
It makes more sense to check your data with a modern database client with a GUI.
BTW, a PR is welcome.
https://github.com/my8100/logparser?tab=readme-ov-file#to-work-with-scrapydweb-for-visualization