Python web scraping tutorial, Python and Java

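Learning goals (Python study notes, part 22): daily data-collection practice. Contents:

1. Scrape the novel summaries from the xbiquge home page (http://www.xbiquge.la)

2. Use starts-with to fetch posts from Qiushibaike (qiushibaike.com)

3. Fetch images from 优图网 (uppsd.com): a leading // matches the img tag at any depth, whatever precedes it, and the image address is read from the data-original attribute

4. Scrape the non-image content from Anjuke (安居客)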
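Exercise 1 relies on relative XPaths evaluated once per list item (span[2]/a/text(), span[3]/a/text()). The following is a minimal sketch of that pattern, not the author's script: the HTML snippet, the id "books", the span[4] author column, and the helper name parse_rows are all hypothetical stand-ins, since the real page markup is not shown in this tutorial. In the exercises the HTML would come from requests.get(url, headers=headers).text instead of a string literal.

```python
from lxml import etree

# Hypothetical stand-in for a downloaded page (not the real site markup).
SAMPLE_HTML = """
<ul id="books">
  <li>
    <span>[Fantasy]</span>
    <span><a href="/1/">Book One</a></span>
    <span><a href="/1/10.html">Chapter 10</a></span>
    <span>Author A</span>
  </li>
  <li>
    <span>[Other]</span>
    <span><a href="/2/">Book Two</a></span>
    <span><a href="/2/43.html">Chapter 43</a></span>
    <span>Author B</span>
  </li>
</ul>
"""


def parse_rows(html):
    """Return (book, chapter, author) tuples, one per <li>."""
    base = etree.HTML(html)
    rows = []
    # Absolute XPath: select every list item once.
    for li in base.xpath('//ul[@id="books"]/li'):
        # Relative XPaths: no leading slash, so they start at this <li>.
        book = li.xpath('span[2]/a/text()')
        chapter = li.xpath('span[3]/a/text()')
        author = li.xpath('span[4]/text()')  # assumed column, see lead-in
        rows.append((book[0], chapter[0], author[0]))
    return rows


rows = parse_rows(SAMPLE_HTML)
for row in rows:
    print(row)
```

Each xpath() call returns a list, so the sketch indexes [0]; on a real page it is safer to check that the list is non-empty first.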

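1. Scraping the novel summaries from the xbiquge home page

    import requests
    from lxml import etree

    # headers is assumed to be a dict of request headers defined earlier.
    source = requests.get('http://www.xbiquge.la', headers=headers).text
    base = etree.HTML(source)
    # The XPath that selects the list items is lost in the source.
    li = base.xpath('...')
    for i in li:
        books = i.xpath('span[2]/a/text()')
        chapter = i.xpath('span[3]/a/text()')
        author = i.xpath('...')  # expression lost in the source

Sample output, as far as it survives in the source: "第160章完全无用之地", "[小老叔]", "[其他小说]", "荒野黑客", "第四十三章楞勃打我", "云外鸡叫一声", "[其他小说]", "第二百八十九章最弱的你丈夫和一个女人在一起呢", "高大的电话".

2. Using starts-with to fetch posts from Qiushibaike

Each post sits in an element whose id begins with qiushi_tag_ followed by a number, as in these XPaths:

//*[@id="qiushi_tag_12399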

//*[@id="qiushi_tag_123884600"]/a[1]/div/span

//*[@id="qiushi_tag_124000602"]

//*[@id="qiushi_tag_124000602"]/div[1]/a[2]/h2

//*[@id="qiushi_tag_124002094"]/a
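Because the numeric suffix differs for every post, matching a single id does not scale; the XPath function starts-with selects every element whose id shares the prefix. The following is a minimal sketch under stated assumptions: the HTML snippet, its ids, and the helper name parse_posts are hypothetical, and the inner paths simply reuse the shapes listed above (a[1]/div/span for the text, div[1]/a[2]/h2 for the author).

```python
from lxml import etree

# Hypothetical stand-in for the downloaded page (not the real site markup).
SAMPLE_HTML = """
<div id="content">
  <div id="qiushi_tag_1001">
    <div><a href="#">avatar</a><a href="#"><h2>
alice
</h2></a></div>
    <a href="#"><div><span>
first joke
</span></div></a>
  </div>
  <div id="qiushi_tag_1002">
    <div><a href="#">avatar</a><a href="#"><h2>
bob
</h2></a></div>
    <a href="#"><div><span>
second joke
</span></div></a>
  </div>
  <div id="sidebar"><a href="#"><div><span>not a post</span></div></a></div>
</div>
"""


def parse_posts(html):
    """Return (author, text) for every element whose id starts with qiushi_tag_."""
    base = etree.HTML(html)
    # starts-with(@id, prefix) is true for qiushi_tag_1001, qiushi_tag_1002, ...
    posts = base.xpath('//*[starts-with(@id, "qiushi_tag_")]')
    results = []
    for post in posts:
        # Relative paths, evaluated from the matched post element.
        author = post.xpath('div[1]/a[2]/h2/text()')
        text = post.xpath('a[1]/div/span/text()')
        # strip() drops the newlines that surround the text in the markup.
        results.append((author[0].strip(), text[0].strip()))
    return results


posts = parse_posts(SAMPLE_HTML)
for author, text in posts:
    print(author, text)
```

The element with id "sidebar" is skipped because its id does not begin with the prefix, even though its inner structure resembles a post.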

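    import requests
    from lxml import etree

    source = requests.get('https://www.qiushibaike.com/text/').text
    # starts-with matches every element whose id begins with qiushi_tag_
    base = etree.HTML(source).xpath('//*[starts-with(@id, "qiushi_tag_")]')
    for i in base:
        text = ''.join(i.xpath('a/div/span[1]/text()')).replace('\n', '')
        # Author path reconstructed from the XPath listed above.
        author = ''.join(i.xpath('div[1]/a[2]/h2/text()')).strip()
        print(author, text)

3. Fetching images from 优图网: a leading // matches the img tag at any depth, and /@data-original returns the image address

    # URL as it appears in the source; headers is assumed to be defined earlier.
    source = requests.get('http://www.uppsd.com/searce', headers=headers).text
    base = etree.HTML(source).xpath('//img[@class="lazy"]/@data-original')

4. Scraping the non-image content from Anjuke

The parts marked ... are lost in the source.

    source = requests.get('https://tianjin.anjuke.com/sale/?from=navigation', headers=headers).text
    base = etree.HTML(source).xpath("//*[@id='_layout']/div/section/section ...")
    for i in base:
        title = i.xpath('... div[1]/h3/text()')
        print(title)
        txt = i.xpath('... text()')
        print(txt)
        neirong = i.xpath('... /div ...')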
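Exercises 3 and 4 differ mainly in the last XPath step: /@data-original returns attribute values, while /text() returns text nodes. The following is a minimal sketch of both on one hypothetical snippet; the markup, the id "listing", the example.com addresses, and the helper names are illustrative assumptions, not the real structure of either site.

```python
from lxml import etree

# Hypothetical stand-in for a downloaded listing page.
SAMPLE_HTML = """
<div id="listing">
  <div class="item">
    <img class="lazy" src="placeholder.gif"
         data-original="http://img.example.com/a.jpg">
    <h3> Two-bedroom flat near the park </h3>
  </div>
  <div class="item">
    <p><img class="lazy" src="placeholder.gif"
            data-original="http://img.example.com/b.jpg"></p>
    <h3> Studio close to the station </h3>
  </div>
  <img class="logo" src="logo.png">
</div>
"""


def lazy_image_urls(base):
    """Attribute step: the real address of every lazily loaded image."""
    # // finds img at any depth; the predicate keeps only class="lazy".
    return [str(u) for u in base.xpath('//img[@class="lazy"]/@data-original')]


def item_titles(base):
    """Text step: the heading text of every item, whitespace trimmed."""
    return [t.strip() for t in base.xpath('//div[@id="listing"]//h3/text()')]


base = etree.HTML(SAMPLE_HTML)
urls = lazy_image_urls(base)
titles = item_titles(base)
print(urls)
print(titles)
```

Lazy-loading pages often keep a placeholder in src and the real address in data-original, which is why the sketch reads the latter; the logo image is ignored because its class is not "lazy".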