Today I bring you article 41 of the "100 Practical Web Crawler Examples" series. The road of web crawling is a long one.

Crawl target

Website: not named here; it is a bit too much to handle.

Results preview

Tool preparation

Development tool: PyCharm
Development environment: Python 3.7, Windows 11
Library used: requests

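Start by collecting the link addresses on the current page. The current page only holds the main listing data, while the data we actually need sits behind those links. So fetch the page, then extract every link address from the `a` tags in the source. The page is loaded as static data, so requesting the page address directly is enough.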

    url = 'https://www.xxxx.com/'

Extract all the link addresses from the page source.
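As a rough illustration of pulling every link out of a static page's source, here is a minimal sketch built on the standard-library `html.parser`. The names `LinkCollector` and `extract_links` and the sample markup are made up for this example and do not come from the target site.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in an HTML document."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        href = dict(attrs).get('href')
        if href:
            # Resolve relative links against the page address.
            self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url):
    """Return the absolute address of every link found in html."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


# Hypothetical markup standing in for the real page source.
sample_html = (
    '<ul>'
    '<li><a href="/a/1.html">one</a></li>'
    '<li><a href="https://www.xxxx.com/a/2.html">two</a></li>'
    '<li><a>no link</a></li>'
    '</ul>'
)
links = extract_links(sample_html, 'https://www.xxxx.com/')
print(links)
```

In the crawler itself, the `html` argument would presumably be the decoded response body, for example `requests.get(url, headers=headers).content.decode('gbk')`.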

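You can pick whatever extraction method you like. Here I use regular expressions to extract each detail page's address and title; the title is used to name the saved images. Once we have an address, we send a request to it to enter the detail page. The data on a detail page is also split across several pages, each holding a few images, so we need to splice the URL to build the address of each sub-page.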
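The regular-expression step might be sketched as follows. The pattern assumes list-page markup with `href`, `class="pic"`, `target` and `alt` attributes in exactly that order, and the sample HTML is invented for the demonstration. `safe_name` is a hypothetical helper, added because titles end up in file names.

```python
import re

# Pattern modelled on the list-page markup described in this article; the
# attribute order and quoting are assumptions about that markup.
ITEM_PATTERN = re.compile(
    r'<a href="(.*?)" class="pic" target="_Blank" alt="(.*?)"'
)


def extract_items(html):
    """Return (detail_url, title) pairs found in a list page."""
    return ITEM_PATTERN.findall(html)


def safe_name(title):
    """Replace characters that are not allowed in Windows file names."""
    return re.sub(r'[\\/:*?"<>|]', '_', title).strip()


# Hypothetical markup standing in for the real list page.
sample_html = (
    '<li><a href="https://www.xxxx.com/a/101.html" class="pic" '
    'target="_Blank" alt="First set"></a></li>'
    '<li><a href="https://www.xxxx.com/a/102.html" class="pic" '
    'target="_Blank" alt="Second: set?"></a></li>'
)
items = extract_items(sample_html)
for detail_url, title in items:
    print(detail_url, safe_name(title))
```

Because the pattern has two capture groups, `findall` returns a list of tuples, which is why the loop can unpack each result into an address and a title.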

    for i in range(1, int(page_num[0]) + 1):
        new_url = info_url.replace('.html', f'_{i}.html')
        jpg_data = requests.get(new_url, headers=headers).content.decode('gbk')
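The URL splicing can be tried offline with a small sketch like the one below. `parse_page_count` and `build_page_urls` are hypothetical helper names, and the `共N页` marker is an assumption about how the detail page states its page total.

```python
import re


def parse_page_count(html, pattern=r'共(\d+)页'):
    """Read the total page count from a detail page; fall back to 1."""
    match = re.search(pattern, html)
    return int(match.group(1)) if match else 1


def build_page_urls(info_url, page_count):
    """Build one URL per sub-page by inserting _<n> before the .html suffix."""
    if not info_url.endswith('.html'):
        raise ValueError(f'expected a .html address, got {info_url!r}')
    stem = info_url[:-len('.html')]
    return [f'{stem}_{n}.html' for n in range(1, page_count + 1)]


# Hypothetical detail-page fragment and address.
count = parse_page_count('<li><a>共3页: </a></li>')
page_urls = build_page_urls('https://www.xxxx.com/a/101.html', count)
print(page_urls)
```

Unlike a plain `str.replace`, this version only touches the trailing `.html`, so an address that happens to contain `.html` elsewhere is not rewritten twice.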

After the request, extract all the image addresses, send a request to each one, save the data, and you are done!
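The saving step could look roughly like this. `image_path` and `save_image` are hypothetical helpers, the `title-index.jpg` naming scheme is an assumption, and the demonstration writes a few placeholder bytes to a temporary folder instead of downloading anything.

```python
import tempfile
from pathlib import Path


def image_path(folder, title, index):
    """Build the target path for one image: <folder>/<title>-<index>.jpg."""
    return Path(folder) / f'{title}-{index}.jpg'


def save_image(data, folder, title, index):
    """Write raw image bytes to disk, creating the folder if needed."""
    path = image_path(folder, title, index)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    return path


# Offline demonstration with placeholder bytes instead of a real download.
with tempfile.TemporaryDirectory() as tmp:
    saved = save_image(b'\xff\xd8\xff', Path(tmp) / 'gallery', 'First set', 1)
    saved_name = saved.name
    saved_size = saved.stat().st_size

print(saved_name, saved_size)
```

In the crawler, `data` would presumably be the `content` of the response returned for each image address.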

Source code

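    import os
    import re

    import requests

    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                      '(KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36'
    }

    os.makedirs('1000画廊', exist_ok=True)
    # Running image counter; its definition is missing from the original listing, so this is assumed.
    num = 0

    # The list-page range is not given in the original; a single page is assumed here.
    for page in range(1, 2):
        url = 'https://www.xxxx.com/guoneimeinv/list_5_{}.html'.format(page)
        response = requests.get(url, headers=headers)
        # Detail-page address and title of every entry on the list page.
        # The tag delimiters in these patterns were lost in the original listing and are reconstructed.
        data_list = re.findall(
            '</a></li><li><a href="(.*?)" class="pic" target="_Blank" alt="(.*?)"',
            response.content.decode('gbk'),
        )
        for info_url, title in data_list:
            res = requests.get(info_url, headers=headers).content.decode('gbk')
            # Total number of sub-pages in this detail page
            page_num = re.findall('<li><a>共(.*?)页: </a></li><li>', res)
            for i in range(1, int(page_num[0]) + 1):
                new_url = info_url.replace('.html', f'_{i}.html')
                jpg_data = requests.get(new_url, headers=headers).content.decode('gbk')
                jpg_url_list = re.findall('<p align="center"><img src="(.*?)" /></p><br/>', jpg_data)
                for jpg_url in jpg_url_list:
                    result = requests.get(jpg_url, headers=headers).content
                    num += 1
                    # The original listing is cut off inside this open() call; the mode and the write are assumed.
                    with open(f'1000画廊/{title}-{num}.jpg', 'wb') as f:
                        f.write(result)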
