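How do you write a Ctrip crawler in Python?

This article walks through writing a Ctrip crawler in Python: installing the required libraries, writing the crawler code, parsing the page content, and persisting the scraped results.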

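1. Installing the required Python libraries

Before writing the Ctrip crawler, we need to install a few Python libraries.

First, install requests, a third-party Python library for sending HTTP requests.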

pip install requests

Next, install BeautifulSoup, a Python library for parsing HTML and XML documents.

pip install beautifulsoup4
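
After both commands have run, it can help to confirm that the packages are visible to the interpreter you plan to use. The following is a minimal sketch that relies only on the standard library's importlib.metadata; the helper name installed_version is an illustrative choice and not part of either library.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a pip package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# These are the distribution names passed to pip, not the import names
# (beautifulsoup4 is imported as bs4).
for package in ("requests", "beautifulsoup4"):
    version = installed_version(package)
    print(f"{package}: {version if version else 'not installed'}")
```

If either line reports "not installed", the pip command was probably run against a different Python environment.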

2. Writing the crawler code

With the libraries installed, we can start on the crawler itself.

First, import requests and BeautifulSoup:

import requests
from bs4 import BeautifulSoup

Then specify the URL of the page to request and send an HTTP request to fetch its content:

url = 'https://hotels.ctrip.com/hotel/shanghai2#ctm_ref=hod_hp_sb_lst'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}
r = requests.get(url, headers=headers)
html = r.content
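
The request above has no timeout and does not check the response status, so a stalled connection or an error page would go unnoticed. One possible way to tighten it is sketched below. The function name fetch_html, the 10-second default, and the shortened User-Agent value are assumptions made for illustration; requests.Session, the timeout argument, and raise_for_status() are standard parts of requests.

```python
import requests

# Same idea as the headers above; the shortened User-Agent value is illustrative.
DEFAULT_HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}


def fetch_html(url, session=None, timeout=10):
    """Return the raw bytes of a page, raising if the request fails."""
    if session is None:
        session = requests.Session()
    # Without a timeout, requests may wait indefinitely for a response.
    response = session.get(url, headers=DEFAULT_HEADERS, timeout=timeout)
    response.raise_for_status()  # turn 4xx/5xx responses into requests.HTTPError
    return response.content
```

Calling html = fetch_html(url) would then stand in for the last two lines of the snippet above.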

Next, parse the fetched content with BeautifulSoup:

soup = BeautifulSoup(html, 'html.parser')

Once the page is parsed, the methods BeautifulSoup provides, such as find() and find_all(), can be used to pull out the information we need.
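
To make those methods concrete, here is a small sketch that runs against an inline HTML string instead of a live page. The markup, including the second link, is made up for the demonstration; find(), find_all(), select(), and get_text() are regular BeautifulSoup calls.

```python
from bs4 import BeautifulSoup

# Made-up markup used only to demonstrate the lookup methods.
sample_html = """
<ul id="cities">
  <li class="city"><a href="/hotel/shanghai2">Shanghai</a></li>
  <li class="city"><a href="/hotel/beijing1">Beijing</a></li>
</ul>
"""

demo_soup = BeautifulSoup(sample_html, "html.parser")

first_item = demo_soup.find("li", class_="city")     # first match, or None
all_items = demo_soup.find_all("li", class_="city")  # list of every match
links = demo_soup.select("ul#cities a")              # lookup by CSS selector

names = [item.get_text(strip=True) for item in all_items]
hrefs = [link["href"] for link in links]
print(names)
print(hrefs)
```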

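3. Parsing the page content

When parsing with BeautifulSoup, we locate the information we need through its search methods.

For example, find_all() returns every element that matches a tag name and class; here it collects all hotel entries on the page: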

hotels = soup.find_all('div', class_='hotel_new_list')  # collect every hotel entry

We can then iterate over the hotels list to extract the details of each hotel, such as its name, rating, and price.

for hotel in hotels:
    hotel_name = hotel.find('h2', class_='hotel_name').a.text  # hotel name
    hotel_score = hotel.find('span', {'itemprop': 'ratingValue'}).text  # hotel rating
    hotel_price = hotel.find('span', class_='J_price_lowList').em.text  # hotel price
    print(f'{hotel_name}: rating {hotel_score}, price {hotel_price} yuan.')
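
The loop above raises AttributeError as soon as one entry lacks a rating or a price, because find() returns None when nothing matches. A more defensive variant might look like the sketch below. It runs against hypothetical markup that reuses the class names from the loop; the real page structure may differ, and the helpers text_or_none and parse_hotel are illustrative names.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that mirrors the class names used above;
# the second entry deliberately has no rating or price.
sample_html = """
<div class="hotel_new_list">
  <h2 class="hotel_name"><a>Sample Hotel A</a></h2>
  <span itemprop="ratingValue">4.6</span>
  <span class="J_price_lowList"><em>388</em></span>
</div>
<div class="hotel_new_list">
  <h2 class="hotel_name"><a>Sample Hotel B</a></h2>
</div>
"""


def text_or_none(tag):
    """Return the stripped text of a tag, or None when the tag was not found."""
    return tag.get_text(strip=True) if tag is not None else None


def parse_hotel(hotel):
    """Extract name, score and price from one hotel entry without raising."""
    return {
        "name": text_or_none(hotel.select_one("h2.hotel_name a")),
        "score": text_or_none(hotel.find("span", {"itemprop": "ratingValue"})),
        "price": text_or_none(hotel.select_one("span.J_price_lowList em")),
    }


sample_soup = BeautifulSoup(sample_html, "html.parser")
records = [
    parse_hotel(hotel)
    for hotel in sample_soup.find_all("div", class_="hotel_new_list")
]
print(records)
```

Returning dictionaries rather than printing directly also makes the results easier to hand to a storage step later.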

4. Persisting the scraped results

Finally, we can persist the scraped results for later use.

One option is to write them to a text file:

with open('hotels.txt', 'w', encoding='utf-8') as f:
    for hotel in hotels:
        hotel_name = hotel.find('h2', class_='hotel_name').a.text  # hotel name
        hotel_score = hotel.find('span', {'itemprop': 'ratingValue'}).text  # hotel rating
        hotel_price = hotel.find('span', class_='J_price_lowList').em.text  # hotel price
        f.write(f'{hotel_name}: rating {hotel_score}, price {hotel_price} yuan.\n')
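
Free-form sentences are awkward to load back for analysis, so a structured format such as CSV may be a better fit. The sketch below uses the standard csv module. The records list is made-up sample data, and the function name save_to_csv and the column names are assumptions chosen for this example.

```python
import csv

# Made-up records standing in for the parsed hotel data.
records = [
    {"name": "Sample Hotel A", "score": "4.6", "price": "388"},
    {"name": "Sample Hotel B", "score": "4.2", "price": "259"},
]


def save_to_csv(rows, path):
    """Write one CSV row per hotel, with a header line."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", encoding="utf-8", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["name", "score", "price"])
        writer.writeheader()
        writer.writerows(rows)
```

Calling save_to_csv(records, 'hotels.csv') would produce a file that spreadsheet tools and csv.DictReader can read directly.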

The results can also be stored in a database for later analysis.
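
The article does not name a database, so the following sketch assumes SQLite through the standard sqlite3 module as one lightweight option. The table layout, the function name save_hotels, and the sample rows are all illustrative.

```python
import sqlite3

# Made-up rows: (name, score, price).
records = [("Sample Hotel A", 4.6, 388), ("Sample Hotel B", 4.2, 259)]


def save_hotels(conn, rows):
    """Create the hotels table if needed and insert the given rows."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "CREATE TABLE IF NOT EXISTS hotels (name TEXT, score REAL, price INTEGER)"
        )
        # Placeholders keep scraped text from being interpreted as SQL.
        conn.executemany(
            "INSERT INTO hotels (name, score, price) VALUES (?, ?, ?)", rows
        )


# An in-memory database keeps the demo self-contained;
# pass a file path such as "hotels.db" to persist the data.
conn = sqlite3.connect(":memory:")
save_hotels(conn, records)
rows = conn.execute("SELECT name, score, price FROM hotels ORDER BY price").fetchall()
print(rows)
```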

The complete Ctrip crawler code is as follows:

import requests
from bs4 import BeautifulSoup

url = 'https://hotels.ctrip.com/hotel/shanghai2#ctm_ref=hod_hp_sb_lst'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}
r = requests.get(url, headers=headers)
html = r.content

soup = BeautifulSoup(html, 'html.parser')

hotels = soup.find_all('div', class_='hotel_new_list')
# Open the file once, outside the loop, so that each hotel adds a line
# instead of overwriting the previous one
with open('hotels.txt', 'w', encoding='utf-8') as f:
    for hotel in hotels:
        hotel_name = hotel.find('h2', class_='hotel_name').a.text  # hotel name
        hotel_score = hotel.find('span', {'itemprop': 'ratingValue'}).text  # hotel rating
        hotel_price = hotel.find('span', class_='J_price_lowList').em.text  # hotel price
        line = f'{hotel_name}: rating {hotel_score}, price {hotel_price} yuan.'
        print(line)
        f.write(line + '\n')  # store the same line in the text file