How do I enable the proxy configured in setting when using a custom downloader?
suyin-long opened this issue · 3 comments
suyin-long commented
Environment
- Windows 11
- feapder 1.8.6

Description

Testing shows that without a custom downloader, the code goes through the proxy configured in setting. The test code and setting configuration are as follows:

- Test code
```python
import feapder


class AirSpiderDemo(feapder.AirSpider):
    def start_requests(self):
        url = "http://myip.ipip.net/"
        yield feapder.Request(url, verify=False, method="GET")

    def download_midware(self, request):
        request.headers = {
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
            "Accept-Language": "zh-CN,zh;q=0.9",
            "Cache-Control": "max-age=0",
            "Proxy-Connection": "keep-alive",
            "Upgrade-Insecure-Requests": "1",
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36"
        }
        return request

    def parse(self, request, response):
        print(response.text)


if __name__ == "__main__":
    AirSpiderDemo(thread_count=1).start()
```
- setting configuration

```python
# Proxy settings
PROXY_EXTRACT_API = "your own proxy-extraction API"  # extraction API; proxies in the response are separated by \r\n
PROXY_ENABLE = True
```
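As a side note, a minimal sketch of how a `\r\n`-separated response from such an extraction API could be turned into the proxies-dict shape that requests-style clients expect. The sample response text and the helper name are hypothetical, not part of feapder:

```python
def parse_proxy_response(body: str) -> list:
    """Split a CRLF-separated proxy list (hypothetical API response)
    into requests-style proxies dicts."""
    proxies = []
    for line in body.split("\r\n"):
        line = line.strip()
        if not line:
            continue
        # One dict per proxy endpoint, covering both schemes
        proxies.append({"http": f"http://{line}", "https": f"http://{line}"})
    return proxies


# Hypothetical sample response from the extraction API:
sample = "1.2.3.4:8000\r\n5.6.7.8:9000"
print(parse_proxy_response(sample))
```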
Running this code, the printed output is:

Current IP: 111.127.99.43, location: ** Inner Mongolia, Xing'an League, China Telecom

My local IP is in Shenzhen, so the output shows the proxy took effect.

I then tested whether the proxy configured in setting still takes effect with a custom downloader. Test code:
```python
import feapder
import cloudscraper


class AirSpiderDemo(feapder.AirSpider):
    def start_requests(self):
        url = "http://myip.ipip.net/"
        yield feapder.Request(url, verify=False, method="GET")

    def download_midware(self, request):
        headers = {
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
            "Accept-Language": "zh-CN,zh;q=0.9",
            "Cache-Control": "max-age=0",
            "Proxy-Connection": "keep-alive",
            "Upgrade-Insecure-Requests": "1",
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36"
        }
        # Bypass some sites' anti-bot checks
        scraper = cloudscraper.create_scraper()
        resp = scraper.get(request.url, headers=headers)
        response = feapder.Response(resp)
        return request, response

    def parse(self, request, response):
        print(response.text)


if __name__ == "__main__":
    AirSpiderDemo(thread_count=1).start()
```
With the setting configuration unchanged, the output is:

Current IP: 113.81.233.20, location: ** Guangdong, Shenzhen, China Telecom

This is my local IP, which means that with a custom downloader the proxy configured in setting does not take effect. How can I make it take effect?
Boris-code commented
That output clearly shows the proxy didn't take effect: you are issuing the request yourself, so the framework never applies it.

To use the framework's proxy pool, fetch a proxy from it like this:
```python
from feapder.network.proxy_pool import ProxyPool

proxy_pool = ProxyPool()
proxy = proxy_pool.get_proxy()
```
This requires feapder 1.8.8.
suyin-long commented
Confirmed: this works on feapder v1.8.8.
```python
import feapder
from feapder.network.proxy_pool import ProxyPool
from curl_cffi import requests


class AirSpiderDemo(feapder.AirSpider):
    def start_requests(self):
        url = "http://myip.ipip.net/"
        yield feapder.Request(url, verify=False, method="GET")

    def download_midware(self, request):
        headers = {
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
            "Accept-Language": "zh-CN,zh;q=0.9",
            "Cache-Control": "max-age=0",
            "Proxy-Connection": "keep-alive",
            "Upgrade-Insecure-Requests": "1",
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36"
        }
        # Fetch a proxy from the framework's pool
        proxy_pool = ProxyPool()
        proxy = proxy_pool.get_proxy()
        print(proxy)
        # Bypass some sites' anti-bot checks
        response = requests.get(request.url, headers=headers, proxies=proxy, impersonate="chrome110")
        return request, response

    def parse(self, request, response):
        print(response.text)


if __name__ == "__main__":
    AirSpiderDemo(thread_count=1).start()
```
The printed output:
{'http': 'http://223.109.206.190:8976', 'https': 'http://223.109.206.190:8976'}
Current IP: 1.196.233.223, location: ** Henan, Xinyang, China Telecom
Note: I'm using a tunnel proxy here, so the two printed IPs differ. With a dedicated IP, the two printed IPs would be identical.
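That remark can be checked programmatically by comparing the proxy endpoint's host with the exit IP the target site reports. A small sketch; the helper name is mine, and the sample values are the ones printed above:

```python
from urllib.parse import urlparse


def is_dedicated(proxies: dict, reported_ip: str) -> bool:
    """True when the proxy host equals the exit IP the target site
    reports, i.e. a dedicated proxy rather than a tunnel."""
    return urlparse(proxies["http"]).hostname == reported_ip


# Values printed above: a tunnel proxy, so the two IPs differ
proxies = {"http": "http://223.109.206.190:8976",
           "https": "http://223.109.206.190:8976"}
print(is_dedicated(proxies, "1.196.233.223"))  # False: tunnel proxy
```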
suyin-long commented
Addendum: usage on feapder v1.8.6
```python
import feapder
from feapder.network.proxy_pool import ProxyPool
from curl_cffi import requests


class AirSpiderDemo(feapder.AirSpider):
    def start_requests(self):
        url = "http://myip.ipip.net/"
        yield feapder.Request(url, verify=False, method="GET")

    def download_midware(self, request):
        headers = {
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
            "Accept-Language": "zh-CN,zh;q=0.9",
            "Cache-Control": "max-age=0",
            "Proxy-Connection": "keep-alive",
            "Upgrade-Insecure-Requests": "1",
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36"
        }
        # Fetch a proxy from the framework's pool (on v1.8.6 the method is get())
        proxies = ProxyPool().get()
        print(proxies)
        # Bypass some sites' anti-bot checks
        response = requests.get(request.url, headers=headers, impersonate="chrome110", proxies=proxies)
        return request, response

    def parse(self, request, response):
        print(response.text)


if __name__ == "__main__":
    AirSpiderDemo(thread_count=1).start()
```