CSS Selector Examples in Python

This article walks through examples of using CSS selectors in Python, based on the select() method of BeautifulSoup.

I. Basic Selectors

1. Tag Selector

The tag selector is one of the most common selectors. It matches elements by their tag name.

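from bs4 import BeautifulSoup

html_doc = """
<html>
   <head>
      <title>Test page</title>
   </head>
   <body>
      <h1>Welcome</h1>
      <p>This is a test page</p>
   </body>
</html>
"""

# Parse the document with Python's built-in HTML parser
soup = BeautifulSoup(html_doc, 'html.parser')
# select() returns a list of every element matching the selector
tag = soup.select('h1')
print(tag[0].text)  # Welcome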
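
The difference between getting all matches and getting only the first one can be sketched as follows. This is a minimal illustration rather than a definitive recipe; the HTML snippet and variable names are made up for this example.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document for illustration
html_doc = "<html><body><h1>Welcome</h1><p>First</p><p>Second</p></body></html>"
soup = BeautifulSoup(html_doc, "html.parser")

# select() always returns a list of matching elements (empty if none match)
paragraphs = soup.select("p")
paragraph_texts = [p.get_text() for p in paragraphs]

# select_one() returns only the first match, or None if nothing matches
first_heading = soup.select_one("h1")
missing = soup.select_one("h2")

print(paragraph_texts)
print(first_heading.get_text())
print(missing)
```

Checking the result of select_one() for None before using it avoids the IndexError that tag[0] would raise on an empty list.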

2. Class Selector

The class selector matches elements by their class attribute. It is written as a dot followed by the class name.

html_doc = """
<html>
   <head>
      <title>Test page</title>
   </head>
   <body>
      <h1 class="header">Welcome</h1>
      <p class="content">This is a test page</p>
   </body>
</html>
"""

soup = BeautifulSoup(html_doc, 'html.parser')
# '.header' matches every element whose class list contains "header"
tag = soup.select('.header')
print(tag[0].text)  # Welcome
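
An element may carry several classes at once. The sketch below shows how class selectors behave in that case; the markup and the class names "content" and "note" are invented for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the first paragraph has two classes
html_doc = """
<div>
  <p class="content note">A</p>
  <p class="content">B</p>
  <p class="note">C</p>
</div>
"""
soup = BeautifulSoup(html_doc, "html.parser")

# '.content' matches any element whose class list includes "content"
with_content = [p.get_text() for p in soup.select(".content")]

# Chaining classes requires all of them to be present on the same element
with_both = [p.get_text() for p in soup.select(".content.note")]

# A tag name can be combined with a class to narrow the match
p_notes = [p.get_text() for p in soup.select("p.note")]

print(with_content, with_both, p_notes)
```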

3. ID Selector

The ID selector matches an element by its id attribute. It is written as a hash sign followed by the id.

html_doc = """
<html>
   <head>
      <title>Test page</title>
   </head>
   <body>
      <h1 id="title">Welcome</h1>
      <p id="content">This is a test page</p>
   </body>
</html>
"""

soup = BeautifulSoup(html_doc, 'html.parser')
# '#title' matches the element whose id is "title"
tag = soup.select('#title')
print(tag[0].text)  # Welcome
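
Because an id is normally unique within a page, select_one() is often a natural fit for ID selectors. The following is a small sketch under that assumption; the markup is hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration
html_doc = '<body><h1 id="title">Welcome</h1><p id="content">Body text</p></body>'
soup = BeautifulSoup(html_doc, "html.parser")

# An id is expected to be unique, so a single-element lookup is enough
content = soup.select_one("#content")

# Prefixing a tag name additionally requires the element type to match
as_paragraph = soup.select_one("p#content")
as_heading = soup.select_one("h1#content")  # wrong tag, so no match

# find(id=...) is an equivalent non-CSS lookup for this simple case
same_element = soup.find(id="content")

print(content.get_text(), as_heading)
```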

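II. Attribute Selectors

1. An attribute selector matches elements by an attribute and its value. With the = operator, the attribute value must equal the given string exactly.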

html_doc = """
<html>
   <head>
      <title>Test page</title>
   </head>
   <body>
      <h1 class="header" id="title1">Welcome</h1>
      <h1 class="header" id="title2">Welcome</h1>
      <p class="content" id="content1">This is a test page</p>
      <p class="content" id="content2">This is a test page</p>
   </body>
</html>
"""

soup = BeautifulSoup(html_doc, 'html.parser')
# Matches h1 elements whose class attribute is exactly "header" (both h1 tags here)
tag = soup.select('h1[class="header"]')
print(tag[0].text)  # Welcome

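2. Attribute selectors can also match part of an attribute value. The ^= operator used here selects elements whose attribute value starts with the given string.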

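# Reuses the soup object from the previous example
# Matches p elements whose id starts with "content" (content1 and content2)
tag = soup.select('p[id^="content"]')
print(tag[0].text)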
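
Besides exact and prefix matching, CSS defines a few other attribute operators. The sketch below compares them on a made-up list of links; the URLs and ids are placeholders, not taken from the article.

```python
from bs4 import BeautifulSoup

# Hypothetical links for illustration; link3 has no href at all
html_doc = """
<body>
  <a id="link1" href="https://example.com/a.pdf">PDF</a>
  <a id="link2" href="http://example.com/b.html">Page</a>
  <a id="link3">No target</a>
</body>
"""
soup = BeautifulSoup(html_doc, "html.parser")

def ids(selector):
    """Return the id of every element matched by the selector."""
    return [tag["id"] for tag in soup.select(selector)]

has_href = ids("a[href]")                 # attribute is present
starts_https = ids('a[href^="https"]')    # value starts with "https"
ends_pdf = ids('a[href$=".pdf"]')         # value ends with ".pdf"
contains_ex = ids('a[href*="example"]')   # value contains "example"

print(has_href, starts_https, ends_pdf, contains_ex)
```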

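III. Combinators

Combinators join several selectors to match elements by how they relate to each other. The example uses the descendant combinator (a space): '.container h1' matches every h1 nested at any depth inside an element with the class container.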

html_doc = """
<html>
   <head>
      <title>Test page</title>
   </head>
   <body>
      <div class="container">
         <h1 class="header" id="title1">Welcome</h1>
         <p class="content" id="content1">This is a test page</p>
      </div>
      <div class="container">
         <h1 class="header" id="title2">Welcome</h1>
         <p class="content" id="content2">This is a test page</p>
      </div>
   </body>
</html>
"""

soup = BeautifulSoup(html_doc, 'html.parser')
# Matches both h1 elements, since each sits inside a .container div
tag = soup.select('.container h1')
print(tag[0].text)  # Welcome

Combining selectors in this way lets you target exactly the elements you need.
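
The descendant combinator is only one of several. As a rough sketch, the following compares it with the child (>), adjacent sibling (+), and general sibling (~) combinators, plus grouping with a comma. The markup and ids are invented for this example.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: p2 is nested one level deeper than p1 and p3
html_doc = """
<div class="container">
  <h1 id="h">Title</h1>
  <p id="p1">Direct child</p>
  <section id="s"><p id="p2">Nested</p></section>
  <p id="p3">Another direct child</p>
</div>
"""
soup = BeautifulSoup(html_doc, "html.parser")

def ids(selector):
    """Return the id of every element matched by the selector."""
    return [tag["id"] for tag in soup.select(selector)]

descendants = ids(".container p")    # any depth below .container
children = ids(".container > p")     # direct children only
adjacent = ids("h1 + p")             # the p immediately following h1
siblings = ids("h1 ~ p")             # every later sibling p of h1
grouped = ids("h1, section")         # elements matching either selector

print(descendants, children, adjacent, siblings, grouped)
```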

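IV. Pseudo-class Selectors

Pseudo-class selectors match elements by their state or position. In the example, li:nth-of-type(odd) selects the 1st, 3rd, and 5th li elements among their siblings.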

html_doc = """
<html>
   <head>
      <title>Test page</title>
   </head>
   <body>
      <ul>
         <li>Item 1</li>
         <li>Item 2</li>
         <li>Item 3</li>
         <li>Item 4</li>
         <li>Item 5</li>
      </ul>
   </body>
</html>
"""

soup = BeautifulSoup(html_doc, 'html.parser')
# Select the odd-numbered list items (positions are counted from 1)
tag = soup.select('li:nth-of-type(odd)')
for item in tag:
    print(item.text)  # Item 1, Item 3, Item 5 (one per line)

Pseudo-class selectors make it possible to pick out elements in a particular state or position.
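
A few other positional pseudo-classes, together with :not(), can be sketched in the same way. This assumes a Beautiful Soup version (4.7 or later) whose select() is backed by the soupsieve package; the list and the class name "done" are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical list; the third item carries an invented "done" class
html_doc = """
<ul>
  <li>Item 1</li>
  <li>Item 2</li>
  <li class="done">Item 3</li>
  <li>Item 4</li>
  <li>Item 5</li>
</ul>
"""
soup = BeautifulSoup(html_doc, "html.parser")

def texts(selector):
    """Return the text of every element matched by the selector."""
    return [tag.get_text() for tag in soup.select(selector)]

first = texts("li:first-child")         # first element child of its parent
last = texts("li:last-child")           # last element child of its parent
even = texts("li:nth-of-type(even)")    # 2nd, 4th, ... li among siblings
not_done = texts("li:not(.done)")       # every li without the "done" class

print(first, last, even, not_done)
```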