
A Simple Web Scraping Experiment

Posted: 2023-05-03 18:43:54 · Views: 273313 · Author: 680


The code below scrapes the phb123.com real-time Forbes rich list page by page with requests and BeautifulSoup, then writes the results to an Excel file with xlwt:

```python
# Scrape one page of the ranking
def loaddata(url):
    from bs4 import BeautifulSoup
    import requests
    headers = {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_3) AppleWebKit/537.36 (KHTML, like Gecko) '
                      'Chrome/72.0.3626.121 Safari/537.36'
    }
    f = requests.get(url, headers=headers)   # GET the page to obtain its HTML
    soup = BeautifulSoup(f.content, "lxml")  # parse the page content with the lxml parser
    # print(f.content.decode())  # uncomment to print the page and verify the fetch succeeded
    ranktable = soup.find_all('table', class_="rank-table")[0]  # the ranking table
    trlist = ranktable.find_all('tr')  # every <tr> row in the table
    trlist.pop(0)                      # drop the header row
    personlist = []
    for tr in trlist:
        tds = tr.find_all('td')
        person = {}
        person['num'] = tds[0].string        # rank
        person['name'] = tds[1].p.string     # name
        person['money'] = tds[2].string      # wealth
        person['company'] = tds[3].string    # company
        person['country'] = tds[4].a.string  # country
        personlist.append(person)
    print("Scraped page: " + url)
    return personlist

# Scrape every page of the Forbes rich list
def loadalldata():
    alldata = []
    for i in range(1, 16):
        url = "https://www.phb123.com/renwu/fuhao/shishi_" + str(i) + ".html"
        data = loaddata(url)
        alldata = alldata + data
    return alldata

# Save the scraped data to an Excel file
def savedata(path, personlist):
    import xlwt
    workbook = xlwt.Workbook()
    worksheet = workbook.add_sheet('test')
    worksheet.write(0, 0, 'Rank')
    worksheet.write(0, 1, 'Name')
    worksheet.write(0, 2, 'Wealth')
    worksheet.write(0, 3, 'Company')
    worksheet.write(0, 4, 'Country')
    for i in range(1, len(personlist) + 1):
        worksheet.write(i, 0, personlist[i - 1]['num'])
        worksheet.write(i, 1, personlist[i - 1]['name'])
        worksheet.write(i, 2, personlist[i - 1]['money'])
        worksheet.write(i, 3, personlist[i - 1]['company'])
        worksheet.write(i, 4, personlist[i - 1]['country'])
    workbook.save(path)
    print("Data saved to: " + path)

if __name__ == '__main__':
    # Scrape the data
    data = loadalldata()
    # Save the data; rank.xls is created next to this .py file
    savedata("rank.xls", data)
```
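Since the live site's layout can change (and network access may not be available), the row-parsing logic can be exercised offline against a small hand-written snippet. The HTML below is a hypothetical miniature of the `rank-table` markup, not the real page; it uses the stdlib `html.parser` backend so no lxml install is needed:

```python
from bs4 import BeautifulSoup

# Hypothetical miniature of the rank-table markup assumed by the scraper
html = """
<table class="rank-table">
  <tr><th>Rank</th><th>Name</th><th>Wealth</th><th>Company</th><th>Country</th></tr>
  <tr>
    <td>1</td>
    <td><p>Example Person</p></td>
    <td>$100 billion</td>
    <td>ExampleCorp</td>
    <td><a href="#">USA</a></td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")  # stdlib parser; no lxml dependency
ranktable = soup.find_all('table', class_="rank-table")[0]
trlist = ranktable.find_all('tr')
trlist.pop(0)  # drop the header row, as in the scraper

personlist = []
for tr in trlist:
    tds = tr.find_all('td')
    personlist.append({
        'num': tds[0].string,
        'name': tds[1].p.string,     # name sits inside a <p> tag
        'money': tds[2].string,
        'company': tds[3].string,
        'country': tds[4].a.string,  # country sits inside an <a> tag
    })

print(personlist[0])
```

Testing the extraction this way makes it easy to tell a parsing bug apart from a site-layout change before running the full 15-page scrape.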
