
Scrapy crawlers in Python 3: handling the Proxy-Authorization header in Scrapy requests

Date: 2023-05-04 18:37:36 Author: 2281

1. Prerequisites

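In the Scrapy source file at the relative path Lib/site-packages/scrapy/core/downloader/handlers/http11.py, comment out the following two lines:

    if isinstance(agent, self._TunnelingAgent):
        headers.removeHeader(b'Proxy-Authorization')

This step is required. If these lines stay active, the Proxy-Authorization header is removed from the request and dynamic forwarding stops working.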
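The location of http11.py depends on where Scrapy is installed in the active environment. As a minimal sketch, the standard library's `importlib` can report a module's source path. The helper name `locate_module_file` is hypothetical and not part of Scrapy.

```python
import importlib.util


def locate_module_file(module_name):
    """Return the source file path of a module, or None if it cannot be found.

    Hypothetical helper for illustration; it is not part of Scrapy.
    """
    try:
        # For a dotted name, find_spec imports the parent packages first.
        spec = importlib.util.find_spec(module_name)
    except (ImportError, ValueError):
        # The parent package is not installed in this environment.
        return None
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    # Prints the path of http11.py if Scrapy is installed, otherwise None.
    print(locate_module_file("scrapy.core.downloader.handlers.http11"))
```

Run it with the same interpreter that runs the crawler, so the reported path belongs to the Scrapy installation that is actually used.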

2. Example middleware

    import hashlib
    import time


    class EpDownloaderMiddleware(object):

        def __init__(self):
            self.orderno = "XXXXXXXXXXXXXXXXXXXXXXX"  # placeholder: fill in your own value
            self.secret = "XXXXXXXXXXXXXXXXXXXXXXX"  # placeholder: fill in your own value

        def process_request(self, request, spider):
            # Route the request through the forwarding proxy.
            request.meta['proxy'] = 'http://forward.xdaili.cn:80'
            timestamp = str(int(time.time()))  # Unix timestamp in seconds
            string = "orderno=" + self.orderno + "," + "secret=" + self.secret + "," + "timestamp=" + timestamp
            md5_string = hashlib.md5(string.encode('utf-8')).hexdigest()
            sign = md5_string.upper()  # signature: upper-case MD5 hex digest
            auth = "sign=" + sign + "&" + "orderno=" + self.orderno + "&" + "timestamp=" + timestamp
            request.headers["Proxy-Authorization"] = auth
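The signing step in `process_request` can be pulled out into a pure function, which makes it possible to check the header value without running a crawl. This is a sketch under assumptions: the function name `build_proxy_auth` and the placeholder credentials are hypothetical, and the string layout only mirrors the middleware in this article, not any official specification of the proxy service.

```python
import hashlib
import time


def build_proxy_auth(orderno, secret, timestamp=None):
    """Build a Proxy-Authorization value with the same layout as the middleware.

    Hypothetical helper: the name and signature are assumptions for illustration.
    """
    if timestamp is None:
        # Default to the current Unix time in seconds, as the middleware does.
        timestamp = str(int(time.time()))
    plain = "orderno=" + orderno + ",secret=" + secret + ",timestamp=" + timestamp
    # The signature is the upper-case MD5 hex digest of the plain string.
    sign = hashlib.md5(plain.encode("utf-8")).hexdigest().upper()
    return "sign=" + sign + "&orderno=" + orderno + "&timestamp=" + timestamp


if __name__ == "__main__":
    # Placeholder credentials; replace them with real values before use.
    print(build_proxy_auth("ORDER_PLACEHOLDER", "SECRET_PLACEHOLDER"))
```

For the middleware itself to take effect in a Scrapy project, it also has to be enabled through the `DOWNLOADER_MIDDLEWARES` setting in settings.py.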
