
Scrapy Proxies: Setting Up a Proxy in Scrapy


In the Scrapy source code, find the file http11.py, at this relative path:
Lib/site-packages/scrapy/core/downloader/handlers/http11.py

Locate the following lines and comment them out:

if isinstance(agent, self._TunnelingAgent):
    headers.removeHeader(b'Proxy-Authorization')

Otherwise the Proxy-Authorization header is stripped from proxied requests and dynamic forwarding fails.
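After the edit, the relevant spot looks roughly like this (a sketch only; the method name and surrounding code can differ between Scrapy versions):

# Lib/site-packages/scrapy/core/downloader/handlers/http11.py,
# inside ScrapyAgent.download_request() -- exact location varies by version:
# if isinstance(agent, self._TunnelingAgent):
#     headers.removeHeader(b'Proxy-Authorization')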

Define a custom downloader middleware:
import hashlib
import time


class ProxyIPMiddleware(object):
    '''
    Rotate proxy IPs via the xdaili dynamic-forwarding endpoint.
    '''

    def __init__(self):
        self.orderno = "xxxxxxxxxxxx"  # order number
        self.secret = "xxxxxxxxxxx"    # secret key

    def process_request(self, request, spider):
        print('====ProxyIPMiddleware====')
        # Scheme of the outgoing request: 'http' or 'https'
        protocol = request.url.split(':')[0].strip().lower()
        print(request.url, 'protocol:', protocol)
        ip = "forward.xdaili.cn"  # proxy host
        port = "80"               # proxy port
        ip_port = ip + ":" + port
        proxy = {"http": "http://" + ip_port, "https": "https://" + ip_port}
        timestamp = str(int(time.time()))  # current Unix timestamp
        string = "orderno=" + self.orderno + "," + "secret=" + self.secret + "," + "timestamp=" + timestamp
        md5_string = hashlib.md5(string.encode()).hexdigest()  # MD5 digest: a fixed-length hex string
        sign = md5_string.upper()  # the signature must be upper-case
        # Authentication info sent to the proxy
        auth = "sign=" + sign + "&" + "orderno=" + self.orderno + "&" + "timestamp=" + timestamp
        print('auth:', auth)
        request.headers['Proxy-Authorization'] = auth
        # An HTTP proxy URL is used for HTTP sites, an HTTPS one for HTTPS sites
        request.meta['proxy'] = proxy[protocol]
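To make the signing scheme easier to inspect, here is the same computation as a standalone snippet (orderno and secret are placeholders; the concatenation format follows the middleware above):

import hashlib
import time

orderno = "xxxxxxxxxxxx"  # placeholder order number
secret = "xxxxxxxxxxx"    # placeholder secret key

timestamp = str(int(time.time()))
plain = "orderno=" + orderno + ",secret=" + secret + ",timestamp=" + timestamp
sign = hashlib.md5(plain.encode()).hexdigest().upper()
auth = "sign=" + sign + "&orderno=" + orderno + "&timestamp=" + timestamp
print(auth)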

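Finally, enable the middleware in settings.py so Scrapy actually calls it. The module path below is a placeholder; use the real path of the file that defines ProxyIPMiddleware:

# settings.py
DOWNLOADER_MIDDLEWARES = {
    'myproject.middlewares.ProxyIPMiddleware': 543,  # 'myproject.middlewares' is a placeholder
}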