Python tutorial: scraping images from web pages, and downloading images by date with a Python crawler

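Today I was idly browsing a website and found that it has a lot of photo sets of girls. Annoyingly, each page shows only one or two pictures, and my home WiFi is unreliable. So I put my "code monkey" skills to work and wrote a small script to download the pictures and view them locally. Plenty of experts have already implemented this kind of thing, but in the spirit of learning and practice I worked through it myself over several attempts and learned a lot along the way.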

First, here is the result:

Python code:

# -*- coding: utf8 -*-

import urllib2
import re        # imported by the original script, not used below
import requests  # imported by the original script, not used below
from lxml import etree
import os


def check_save_path(save_path):
    # Create the target directory; ignore the error if it already exists.
    try:
        os.mkdir(save_path)
    except OSError:
        pass


def get_image_name(image_link):
    # Use the last path component of the URL as the local file name.
    file_name = os.path.basename(image_link)
    return file_name


def save_image(image_link, save_path):
    file_name = get_image_name(image_link)
    file_path = os.path.join(save_path, file_name)
    print('Preparing to download %s' % image_link)
    try:
        image_data = urllib2.urlopen(url=image_link, timeout=5).read()
        with open(file_path, 'wb') as file_handler:
            file_handler.write(image_data)
    except Exception as ex:
        print(ex)


def get_image_link_from_web_page(web_page_link):
    # Collect the image links found on a single page.
    image_link_list = []
    print(web_page_link)
    try:
        html_content = urllib2.urlopen(url=web_page_link, timeout=5).read()
        html_tree = etree.HTML(html_content)
        print(str(html_tree))
        link_list = html_tree.xpath('//p/img/@src')
        for link in link_list:
            # print(link)
            # Keep only links that contain "uploadfile".
            if 'uploadfile' in str(link):
                image_link_list.append('http://www.xgyw.cc/' + link)
    except Exception as ex:
        pass
    return image_link_list


def get_page_link_list_from_index_page(base_page_link):
    # Read the pager on the first page to find every page of the set.
    page_link_list = []
    try:
        html_content = urllib2.urlopen(url=base_page_link, timeout=5).read()
        html_tree = etree.HTML(html_content)
        print(str(html_tree))
        link_tmp_list = html_tree.xpath('//div[@class="page"]/a/@href')
        for link_tmp in link_tmp_list:
            page_link_list.append('http://www.xgyw.cc/' + link_tmp)
    except Exception as ex:
        pass
    return page_link_list


def get_page_title_from_index_page(base_page_link):
    # Return the title of the set, or an empty string if it cannot be read.
    try:
        html_content = urllib2.urlopen(url=base_page_link, timeout=5).read()
        html_tree = etree.HTML(html_content)
        print(str(html_tree))
        page_title_list = html_tree.xpath('//td/div[@class="title"]')
        page_title_tmp = page_title_list[0].text
        print(page_title_tmp)
        return page_title_tmp
    except Exception as ex:
        return ''


def get_image_from_web(base_page_link, save_path):
    check_save_path(save_path)
    page_link_list = get_page_link_list_from_index_page(base_page_link)
    for page_link in page_link_list:
        image_link_list = get_image_link_from_web_page(page_link)
        for image_link in image_link_list:
            save_image(image_link, save_path)


base_page_link = 'http://www.xgyw.cc/tuigirl/tuigirl1346.html'
page_title = get_page_title_from_index_page(base_page_link)
if page_title:
    save_path = 'N:\\pic\\' + page_title
else:
    save_path = 'N:\\pic\\other\\'
get_image_from_web(base_page_link, save_path)
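One detail worth guarding against: the script uses the page title directly as a folder name, and a title can contain characters that Windows does not accept in names. The sketch below shows one possible way to clean a title first. The helper name safe_dir_name and the "other" fallback are assumptions made for illustration, not part of the original script, and the snippet is written for Python 3.

```python
import re

# Characters Windows rejects in file and directory names,
# plus the ASCII control characters.
_INVALID_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def safe_dir_name(title, fallback="other"):
    """Turn a page title into a string usable as one Windows folder name."""
    cleaned = _INVALID_CHARS.sub("_", title or "")
    # Names must not end with a dot or a space on Windows.
    cleaned = cleaned.strip().rstrip(". ")
    return cleaned or fallback


print(safe_dir_name('Issue 68: "cover"?'))
print(safe_dir_name(""))
```

The cleaned name could then be combined with the base folder, for example with os.path.join("N:\\pic", safe_dir_name(page_title)).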


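Idea behind the code:

Use urllib2.urlopen(url).read() to fetch the page data, then etree.HTML() to parse the page into an XML tree. The xpath() method can then retrieve the values of specific nodes. Finally, the script walks through all the pages to collect the images to download and saves them locally.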
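To try the extraction step without touching the network, here is a minimal sketch of the same idea for Python 3 using only the standard library. Note the substitution: it uses html.parser instead of lxml's XPath, so it is not what the script above does. It collects the src of any img that appears inside a p element (not only direct children), and it assumes every p is explicitly closed. The class name, the function name, and the sample HTML are all made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ParagraphImageCollector(HTMLParser):
    """Collect the src of every <img> that appears inside a <p> element."""

    def __init__(self):
        super().__init__()
        self.p_depth = 0   # number of <p> elements currently open
        self.sources = []  # raw src values, in document order

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.p_depth += 1
        elif tag == "img" and self.p_depth > 0:
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)

    def handle_endtag(self, tag):
        if tag == "p" and self.p_depth > 0:
            self.p_depth -= 1


def extract_image_links(html_text, base_url, marker="uploadfile"):
    """Return absolute URLs of in-paragraph images whose src contains marker."""
    collector = ParagraphImageCollector()
    collector.feed(html_text)
    return [urljoin(base_url, src) for src in collector.sources if marker in src]


# Made-up sample page, used only for illustration.
sample_html = """
<html><body>
  <p><img src="/uploadfile/201601/1.jpg"></p>
  <p><img src="/images/logo.png"></p>
  <div><img src="/uploadfile/201601/banner.jpg"></div>
</body></html>
"""
links = extract_image_links(sample_html, "http://www.xgyw.cc/")
print(links)
```

Using urljoin rather than plain string concatenation avoids doubled or missing slashes when the src value is already absolute or starts with "/".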

=====================================

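Installing Python packages:

Many Python packages have no Windows installer, or no x64 installer, which makes them hard for beginners to get working right away. You can install the packages you need with pip or easy_install. Installation instructions: https://pypi.python.org/pypi/setuptools

I use easy_install. My machine has Python 2.7 installed, with the interpreter at C:\Python27\python.exe. Download the ez_setup.py file (saved here as C:\ez_setup.py) and run:

C:\Python27\python.exe "C:\ez_setup.py"

This installs easy_install. Once it finishes, easy_install-2.7.exe appears under C:\Python27\Scripts. To install the requests package locally, for example, run the following command:

"C:\Python27\Scripts\easy_install-2.7.exe" requests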
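After installing, it can help to confirm that the interpreter actually finds the packages. The following is a small sketch for Python 3 (importlib.util.find_spec is not available in Python 2.7); the function name missing_packages is made up for illustration.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# "lxml" and "requests" are the third-party packages the script imports.
print(missing_packages(["lxml", "requests"]))
```

An empty list means both packages are importable; any name that is printed still needs to be installed.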

=====================================

As usual, a photo to close out the post: TuiGirl issue 68. If you want the pictures, search for them yourself on Baidu.