
IP Masquerading: Hiding Your Own IP

Date: 2023-05-04 10:00:11 Views: 254417 Author: 1849


Contents
Disguising the crawler
Dynamic IP integration guide
Writing the IP proxy middleware
Configuring the middleware in settings

Disguising the crawler

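Without any disguise, a crawler that fetches data from the same IP every time is likely to be identified by the target server's firewall. There are two ways to disguise it: configuring a proxy IP and writing a User-Agent middleware. The proxy approach requires registering with Abuyun first (the proxy service whose host, http-dyn.abuyun.com, appears in the configuration code).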
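The article does not show the User-Agent middleware itself, so here is a minimal sketch of what one might look like. The class name matches the UserAgentMiddleware entry registered in the settings, but the body and the USER_AGENTS list are assumptions made for illustration, not the author's code.

```python
import random

# Hypothetical sample list; replace with the User-Agent strings you want to send.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/16.4 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/112.0",
]


class UserAgentMiddleware:
    """Downloader middleware that sets a random User-Agent on each request."""

    def process_request(self, request, spider):
        # Overwrite whatever User-Agent the request currently carries.
        request.headers["User-Agent"] = random.choice(USER_AGENTS)
        # Returning None lets Scrapy continue processing the request.
        return None
```

Picking a new value per request means consecutive requests do not all carry the same browser signature. A longer, regularly refreshed list would probably work better in practice.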

Dynamic IP integration guide

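After registering with Abuyun, you can buy one hour of service for 1 yuan to test dynamic IPs. Once the purchase succeeds, the corresponding integration guide shows the relevant Scrapy configuration: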

import base64

# Proxy server
proxyServer = "http://http-dyn.abuyun.com:9020"

# Proxy tunnel credentials (placeholder values from the guide)
proxyUser = "H01234567890123D"
proxyPass = "0123456789012345"

# for Python2
# proxyAuth = "Basic " + base64.b64encode(proxyUser + ":" + proxyPass)

# for Python3: b64encode needs bytes and returns bytes, so encode first and decode after
proxyAuth = "Basic " + base64.b64encode((proxyUser + ":" + proxyPass).encode("ascii")).decode("ascii")


class ProxyMiddleware(object):
    def process_request(self, request, spider):
        # Route the request through the proxy tunnel and attach the credentials
        request.meta["proxy"] = proxyServer
        request.headers["Proxy-Authorization"] = proxyAuth

Writing the IP proxy middleware

Following the integration guide above, create a ProxyMiddleware and fill in the relevant details to complete the dynamic IP setup.
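The value sent in the Proxy-Authorization header is the only part of the middleware that behaves differently on Python 2 and Python 3. As a rough sketch, it can be built and checked on its own. The helper name build_proxy_auth and the "user"/"pass" credentials are hypothetical and chosen only for illustration.

```python
import base64


def build_proxy_auth(user: str, password: str) -> str:
    """Return a Proxy-Authorization header value using the HTTP Basic scheme."""
    credentials = f"{user}:{password}".encode("ascii")
    # b64encode returns bytes in Python 3, so decode before joining with a str.
    return "Basic " + base64.b64encode(credentials).decode("ascii")


print(build_proxy_auth("user", "pass"))  # Basic dXNlcjpwYXNz
```

Checking the header this way helps catch the common Python 3 mistake of concatenating str and bytes before the middleware ever runs inside Scrapy.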

import base64

# Proxy server
proxyServer = "http://http-dyn.abuyun.com:9020"

# Proxy tunnel credentials, as bytes in "user:password" form
proxy_name_pass = b"HH59908195O5720D:4B4748D2DBD1B53D"

# b64encode takes bytes and returns bytes in Python 3
proxyAuth = base64.b64encode(proxy_name_pass)


class ProxyMiddleware(object):
    def process_request(self, request, spider):
        # Send the request through the proxy and attach the Basic credentials
        request.meta["proxy"] = proxyServer
        request.headers["Proxy-Authorization"] = "Basic " + proxyAuth.decode()

Configuring the middleware in settings

DOWNLOADER_MIDDLEWARES = {
    # Reserved for handling AJAX-loaded content later
    'douban.middlewares.DoubanDownloaderMiddleware': 544,
    # IP disguise
    'douban.proxymiddlewares.ProxyMiddleware': 542,
    # User-Agent disguise
    'douban.user_agent_middlewares.UserAgentMiddleware': 543,
}
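The middleware above keeps the proxy credentials hard-coded in its module. One possible alternative, sketched here under assumptions, is to read them from the project settings through Scrapy's from_crawler hook. The setting names PROXY_SERVER, PROXY_USER and PROXY_PASS and the class name SettingsProxyMiddleware are hypothetical. They are not built-in Scrapy settings and do not come from the original article.

```python
import base64


class SettingsProxyMiddleware:
    """Proxy middleware that reads its configuration from Scrapy settings.

    PROXY_SERVER, PROXY_USER and PROXY_PASS are hypothetical setting names
    chosen for this sketch.
    """

    def __init__(self, server, user, password):
        self.server = server
        token = base64.b64encode(f"{user}:{password}".encode("ascii"))
        self.auth = "Basic " + token.decode("ascii")

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls from_crawler, when defined, to construct the middleware.
        s = crawler.settings
        return cls(s.get("PROXY_SERVER"), s.get("PROXY_USER"), s.get("PROXY_PASS"))

    def process_request(self, request, spider):
        # Same behaviour as ProxyMiddleware, with values taken from settings.
        request.meta["proxy"] = self.server
        request.headers["Proxy-Authorization"] = self.auth
```

With this variant, settings.py would define the three values and register the class in DOWNLOADER_MIDDLEWARES in place of ProxyMiddleware. This keeps the credentials out of the middleware source.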
