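This article walks through a Python crawler that downloads funny animated GIF images. It is shared here for reference; the details follow.
Saving favorite animated images one at a time is tedious, and some sites do not even allow right-click saving, so fetching them with Python is both practical and fun.
The site crawled here is 居然搞笑网 (zbjuran.com): http://www.zbjuran.com/dongtai/list_4_1.html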
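Approach:
Fetch the content of the current page
Find the URLs of the animated images in that page
Save the content behind each URL to the local disk
To crawl several pages, wrap these steps in a loop over the page number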
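The outline above can be sketched without touching the network. The following is a minimal illustration, not the article's own code: it is written for Python 3, it swaps in the standard library's `html.parser` in place of BeautifulSoup, and the names `GifCollector`, `page_urls`, `extract_gif_urls` and the `sample` markup are hypothetical. Only the list-page URL pattern comes from the article.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# List-page URL pattern from the article; the page number fills the slot.
LIST_URL = "http://www.zbjuran.com/dongtai/list_4_%d.html"


class GifCollector(HTMLParser):
    """Collect the src of every <img> tag whose URL ends in .gif."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.gif_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if src and src.lower().endswith(".gif"):
            # Resolve relative paths against the page URL.
            self.gif_urls.append(urljoin(self.base_url, src))


def page_urls(first, last):
    """Build the list-page URLs for pages first..last (inclusive)."""
    return [LIST_URL % n for n in range(first, last + 1)]


def extract_gif_urls(html, base_url):
    """Find the GIF addresses in one page of HTML."""
    parser = GifCollector(base_url)
    parser.feed(html)
    return parser.gif_urls


# Hypothetical page fragment, used here instead of a live request.
sample = ('<div class="item"><h3>demo</h3><div class="text">'
          '<img src="/uploads/a.gif"></div></div><img src="/ad.png">')
print(page_urls(1, 2))
print(extract_gif_urls(sample, LIST_URL % 1))
```

Keeping URL construction and parsing separate from downloading means both can be checked against a saved page before any request is sent.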
Code (written for Python 2):
#!/usr/bin/python
#coding:utf-8
import urllib2, time, uuid, urllib, os, sys
from bs4 import BeautifulSoup

# Python 2 only: make utf-8 the default encoding for implicit str/unicode conversion
reload(sys)
sys.setdefaultencoding('utf-8')

# Fetch the page content; returns None if the request fails
def getHtml(url):
    try:
        print url
        html = urllib2.urlopen(url).read()  # add .decode('utf-8') to decode as UTF-8
    except Exception:
        return None
    return html

# Collect the URL and title of every animated image on the page
def getImagUrl(html):
    if not html:
        print 'nothing can be found'
        return []  # an empty list keeps the loop in start() from failing
    ImagUrlList = []
    soup = BeautifulSoup(html, 'lxml')
    # Get the list of items
    items = soup.find("div", {"class": "main"}).find_all('div', {'class': 'item'})
    for item in items:
        target = {}
        # Advertisement items have no "text" div, so this check filters them out
        if item.find('div', {"class": "text"}):
            # Get the image URL
            imgurl = item.find('div', {"class": "text"}).find('img').get('src')
            target['url'] = imgurl
            # Get the image title
            target['name'] = item.find('h3').text
            ImagUrlList.append(target)
    return ImagUrlList

# Download one image to the local disk
def download(author, imgurl, typename, pageNo):
    # Name the folder after today's date
    x = time.localtime(time.time())
    foldername = str(x.tm_year) + "-" + str(x.tm_mon) + "-" + str(x.tm_mday)
    download_img = None
    picpath = 'Jimy/%s/%s/%s' % (foldername, typename, str(pageNo))
    filename = author + str(uuid.uuid1())
    pic_type = imgurl[-3:]
    if not os.path.exists(picpath):
        os.makedirs(picpath)
    target = picpath + "/%s.%s" % (filename, pic_type)
    print "动图存贮位置:" + target
    download_img = urllib.urlretrieve(imgurl, target)  # save the image to the target path
    print "图片出处为:" + imgurl
    return download_img

# Exit helper
def myquit():
    print "Bye Bye!"
    sys.exit(0)

# Crawl a single list page
def start(pageNo):
    targeturl = "http://www.zbjuran.com/dongtai/list_4_%s.html" % str(pageNo)
    html = getHtml(targeturl)
    urllist = getImagUrl(html)
    for imgurl in urllist:
        download(imgurl['name'], imgurl['url'], '搞笑动图', pageNo)

if __name__ == '__main__':
    print '''*****************************************
** Welcome to Spider of GIF **
** Created on 2017-3-16 **
** @author: Jimy **
*****************************************'''
    pageNo = raw_input("Input the page number you want to scratch (1-50),please input 'quit' if you want to quit\n"
                       "请输入要爬取的页面,范围为(1-100),如果退出,请输入Q>\n>")
    while not pageNo.isdigit() or int(pageNo) > 50 or int(pageNo) < 1:
        # Accept both the 'Q' and the 'quit' mentioned in the prompt
        if pageNo in ('Q', 'quit'):
            myquit()
        print "Param is invalid , please try again."
        pageNo = raw_input("Input the page number you want to scratch >")
    print pageNo
    start(pageNo)
    # The first (single-page) crawl ends here
    pageNo = raw_input("Input the page number you want to scratch (1-50),please input 'quit' if you want to quit\n"
                       "请输入总共需要爬取的页面,范围为(1-5000),如果退出,请输入Q>\n>")
    while not pageNo.isdigit() or int(pageNo) > 5000 or int(pageNo) < 1:
        if pageNo in ('Q', 'quit'):
            myquit()
        print "Param is invalid , please try again."
        pageNo = raw_input("Input the page number you want to scratch >")
    # Loop over the pages to crawl several of them
    for num in xrange(int(pageNo)):
        start(str(num + 1))
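The `download` function takes the file extension from the last three characters of the URL (`imgurl[-3:]`), which would misfire on a URL with a query string or a four-letter extension such as `.jpeg`. The sketch below shows one possible alternative for building the target path. It is written for Python 3, and `build_target` with its `root` and `when` parameters is a hypothetical helper rather than part of the original script. Falling back to `gif` when no extension is found is also an assumption.

```python
import os
import time
import uuid
from urllib.parse import urlparse


def build_target(name, imgurl, typename, page_no, root="Jimy", when=None):
    """Return (folder, file path) for one image without touching the disk."""
    t = time.localtime() if when is None else when
    # Same unpadded year-month-day folder name as the script (e.g. 2017-3-16).
    folder_date = "%d-%d-%d" % (t.tm_year, t.tm_mon, t.tm_mday)
    folder = os.path.join(root, folder_date, typename, str(page_no))
    # Read the extension from the URL path only, ignoring any query string.
    # Assumption: default to "gif" when the path has no extension.
    ext = os.path.splitext(urlparse(imgurl).path)[1].lstrip(".") or "gif"
    filename = "%s%s.%s" % (name, uuid.uuid1(), ext)
    return folder, os.path.join(folder, filename)


# Fixed date so the example does not depend on the current day.
fixed = time.strptime("2017-03-16", "%Y-%m-%d")
folder, path = build_target(
    "demo", "http://www.zbjuran.com/uploads/a.gif?x=1", "gif", 1, when=fixed)
print(folder)
print(path)
```

Returning the path instead of creating it keeps the helper easy to test. The caller can still run `os.makedirs(folder)` before saving, as the script does.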
The output looks like this:
*****************************************
** Welcome to Spider of GIF **
** Created on 2017-3-16 **
** @author: Jimy **
*****************************************
Input the page number you want to scratch (1-50),please input 'quit' if you want to quit
请输入要爬取的页面,范围为(1-100),如果退出,请输入Q>
>1
1
http://www.zbjuran.com/dongtai/list_4_1.html
动图存贮位置:Jimy/2017-3-16/搞笑动图/1/真是艰难的选择。3f0fe8f6-09f8-11e7-9161-f8bc12753d1e.gif
图片出处为:http://www.zbjuran.com/uploads/allimg/170206/10-1F206135ZHJ.gif
动图存贮位置:Jimy/2017-3-16/搞笑动图/1/这么贱会被打死吧……3fa9da88-09f8-11e7-9161-f8bc12753d1e.gif
图片出处为:http://www.zbjuran.com/uploads/allimg/170206/10-1F206135H35U.gif
动图存贮位置:Jimy/2017-3-16/搞笑动图/1/一看就是印度……4064e60c-09f8-11e7-9161-f8bc12753d1e.gif
图片出处为:http://www.zbjuran.com/uploads/allimg/170206/10-1F20613543c50.gif
动图存贮位置:Jimy/2017-3-16/搞笑动图/1/新垣结衣的正经工作脸414b4f52-09f8-11e7-9161-f8bc12753d1e.gif
图片出处为:http://www.zbjuran.com/uploads/allimg/170206/10-1F206135250553.gif
动图存贮位置:Jimy/2017-3-16/搞笑动图/1/妹子这是在摇什么的421afa86-09f8-11e7-9161-f8bc12753d1e.gif
图片出处为:http://www.zbjuran.com/uploads/allimg/170206/10-1F20613493N03.gif
Input the page number you want to scratch (1-50),please input 'quit' if you want to quit
请输入总共需要爬取的页面,范围为(1-5000),如果退出,请输入Q>
>Q
Bye Bye!
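The session above goes through two prompts that differ only in their upper bound (50 and 5000). As a rough sketch, the check behind both prompts could live in one function. This is written for Python 3, and `parse_page_input` is a hypothetical name, not something the script defines.

```python
def parse_page_input(text, upper):
    """Classify one answer as ('quit', None), ('ok', n) or ('invalid', None)."""
    text = text.strip()
    if text == "Q":
        return "quit", None
    # isdecimal() only accepts characters that int() can convert.
    if text.isdecimal() and 1 <= int(text) <= upper:
        return "ok", int(text)
    return "invalid", None


print(parse_page_input("1", 50))
print(parse_page_input("Q", 5000))
print(parse_page_input("51", 50))
```

The prompting loop then only has to branch on the first element of the returned pair.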
In the end, the animated images are saved to the local disk.
Hopefully this article is helpful for your Python programming.
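A downloaded file is not guaranteed to be a real GIF, since a server may answer with an error page instead. One quick sanity check is the file signature: GIF files begin with the ASCII bytes `GIF87a` or `GIF89a`. The sketch below is for Python 3, and `looks_like_gif` and `file_is_gif` are hypothetical helper names.

```python
def looks_like_gif(data):
    """True if the bytes start with a GIF signature (GIF87a or GIF89a)."""
    return data[:6] in (b"GIF87a", b"GIF89a")


def file_is_gif(path):
    """Check the first six bytes of a file on disk."""
    with open(path, "rb") as f:
        return looks_like_gif(f.read(6))


print(looks_like_gif(b"GIF89a\x01\x00\x01\x00"))
print(looks_like_gif(b"<html><body>404</body></html>"))
```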
Original article: https://blog.csdn.net/qiqiyingse/article/details/62418857