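First, let's scrape some data from http://html-color-codes.info/color-names/.
Press F12 or Ctrl+U to inspect the page's markup.
The structure is clear and simple: we want the style attribute of each tr tag and the text of the td tags that sit side by side under each tr. Here is the scraping code: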
    #!/usr/bin/env python
    # coding=utf-8

    import requests
    from bs4 import BeautifulSoup
    import MySQLdb

    print('Connecting to the MySQL server...')
    db = MySQLdb.connect("localhost", "hp", "hp12345.", "testdb")
    print('Connected!')
    cursor = db.cursor()

    # Recreate the table on every run
    cursor.execute("DROP TABLE IF EXISTS color")
    sql = """CREATE TABLE color (
            color CHAR(20) NOT NULL,
            value CHAR(10),
            style CHAR(50) )"""
    cursor.execute(sql)

    hdrs = {'User-Agent': 'Mozilla/5.0 (X11; Fedora; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko)'}
    url = "http://html-color-codes.info/color-names/"

    r = requests.get(url, headers=hdrs)
    soup = BeautifulSoup(r.content.decode('gbk', 'ignore'), 'lxml')

    trs = soup.find_all('tr')  # all tr tags, as a list
    for tr in trs:  # walk through each tr tag
        style = tr.get('style')  # the style attribute of this tr
        tds = tr.find_all('td')  # the td tags under this tr, as a list
        name = tds[1].text.strip()  # second cell: color name
        hex_value = tds[2].text.strip()  # third cell: hex value
        # print('color: ' + name + '\tvalue: ' + hex_value + '\tstyle: ' + style)
        insert_color = ("INSERT INTO color(color, value, style) "
                        "VALUES (%s, %s, %s)")
        data_color = (name, hex_value, style)
        cursor.execute(insert_color, data_color)
        db.commit()  # commit after each row

    print('Finished scraping and inserting the data into MySQL.')
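As a side note, the row-extraction step can be tried without the network or a database. The following is only a minimal sketch: it swaps BeautifulSoup for the standard library's `html.parser` so that it runs with nothing installed, and `SAMPLE_HTML`, `ColorRowParser`, and `parse_colors` are hypothetical names made up for this illustration. The sample markup is an assumption about the page layout, not a copy of the real page.

```python
from html.parser import HTMLParser

# Hypothetical sample that mimics the tr/td layout described above;
# the real page's markup may differ.
SAMPLE_HTML = """
<table>
  <tr><th>Preview</th><th>Name</th><th>Hex</th></tr>
  <tr style="background-color:indianred;"><td></td><td>IndianRed</td><td>CD5C5C</td></tr>
  <tr style="background-color:salmon;"><td></td><td> Salmon </td><td>FA8072</td></tr>
</table>
"""


class ColorRowParser(HTMLParser):
    """Collect (name, hex, style) from each <tr> with at least three <td> cells."""

    def __init__(self):
        super().__init__()
        self.rows = []       # finished (name, hex, style) tuples
        self._style = None   # style attribute of the current <tr>
        self._cells = None   # text of the <td> cells in the current <tr>
        self._in_td = False  # True while inside a <td>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._style = dict(attrs).get("style")
            self._cells = []
        elif tag == "td" and self._cells is not None:
            self._cells.append("")
            self._in_td = True

    def handle_data(self, data):
        if self._in_td:
            self._cells[-1] += data

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False
        elif tag == "tr" and self._cells is not None:
            # Rows without enough <td> cells (such as header rows) are skipped.
            if len(self._cells) >= 3:
                self.rows.append(
                    (self._cells[1].strip(), self._cells[2].strip(), self._style)
                )
            self._cells = None


def parse_colors(html):
    """Return a list of (name, hex, style) tuples found in the given HTML."""
    parser = ColorRowParser()
    parser.feed(html)
    return parser.rows


for row in parse_colors(SAMPLE_HTML):
    print(row)
```

The length check on the cells is a defensive choice of this sketch: a row made only of th cells would otherwise raise an IndexError when the second and third cells are read.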
Result of running the script, checked from the mysql client:
    $ mysql -u hp -p
    Enter password:
    Welcome to the MySQL monitor.  Commands end with ; or \g.
    Your MySQL connection id is 28
    Server version: 5.7.17 MySQL Community Server (GPL)

    Copyright (c) 2000, 2011, Oracle and/or its affiliates. All rights reserved.

    Oracle is a registered trademark of Oracle Corporation and/or its
    affiliates. Other names may be trademarks of their respective
    owners.

    Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

    mysql> use testdb
    Reading table information for completion of table and column names
    You can turn off this feature to get a quicker startup with -A

    Database changed
    mysql> select * from color;
    +----------------------+--------+----------------------------------------+
    | color                | value  | style                                  |
    +----------------------+--------+----------------------------------------+
    | indianred            | cd5c5c | background-color:indianred;            |
    | lightcoral           | f08080 | background-color:lightcoral;           |
    | salmon               | fa8072 | background-color:salmon;               |
    | darksalmon           | e9967a | background-color:darksalmon;           |
    | lightsalmon          | ffa07a | background-color:lightsalmon;          |
    | crimson              | dc143c | background-color:crimson;              |
    | red                  | ff0000 | background-color:red;                  |
    | firebrick            | b22222 | background-color:firebrick;            |
    | darkred              | 8b0000 | background-color:darkred;              |
    | pink                 | ffc0cb | background-color:pink;                 |
    | lightpink            | ffb6c1 | background-color:lightpink;            |
    | hotpink              | ff69b4 | background-color:hotpink;              |
    | deeppink             | ff1493 | background-color:deeppink;             |
    ...
    | antiquewhite         | faebd7 | background-color:antiquewhite;         |
    | linen                | faf0e6 | background-color:linen;                |
    | lavenderblush        | fff0f5 | background-color:lavenderblush;        |
    | mistyrose            | ffe4e1 | background-color:mistyrose;            |
    | gainsboro            | dcdcdc | background-color:gainsboro;            |
    | lightgrey            | d3d3d3 | background-color:lightgrey;            |
    | silver               | c0c0c0 | background-color:silver;               |
    | darkgray             | a9a9a9 | background-color:darkgray;             |
    | gray                 | 808080 | background-color:gray;                 |
    | dimgray              | 696969 | background-color:dimgray;              |
    | lightslategray       | 778899 | background-color:lightslategray;       |
    | slategray            | 708090 | background-color:slategray;            |
    | darkslategray        | 2f4f4f | background-color:darkslategray;        |
    | black                | 000000 | background-color:black;                |
    +----------------------+--------+----------------------------------------+
    143 rows in set (0.00 sec)
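The script commits once per row, which is fine for 143 rows. For larger tables, a common alternative is to insert all rows in one batch and commit once. The sketch below shows that idea under stated assumptions: it uses the standard library's `sqlite3` with an in-memory database as a stand-in for MySQL so that it runs anywhere, and `ROWS`, `store_colors`, and `count_colors` are hypothetical names for this illustration. Note that `sqlite3` uses `?` placeholders where MySQLdb uses `%s`.

```python
import sqlite3

# Hypothetical rows in the same (name, hex, style) shape the scraper produces.
ROWS = [
    ("indianred", "cd5c5c", "background-color:indianred;"),
    ("salmon", "fa8072", "background-color:salmon;"),
    ("black", "000000", "background-color:black;"),
]


def store_colors(conn, rows):
    """Recreate the color table and insert all rows with a single commit."""
    cur = conn.cursor()
    cur.execute("DROP TABLE IF EXISTS color")
    cur.execute(
        "CREATE TABLE color ("
        " color CHAR(20) NOT NULL,"
        " value CHAR(10),"
        " style CHAR(50))"
    )
    # sqlite3 uses ? placeholders; MySQLdb uses %s for the same purpose.
    cur.executemany(
        "INSERT INTO color(color, value, style) VALUES (?, ?, ?)", rows
    )
    conn.commit()  # one commit for the whole batch


def count_colors(conn):
    """Return the number of rows currently stored in the color table."""
    return conn.execute("SELECT COUNT(*) FROM color").fetchone()[0]


conn = sqlite3.connect(":memory:")  # in-memory stand-in for the MySQL database
store_colors(conn, ROWS)
print(count_colors(conn))
print(conn.execute(
    "SELECT color, value FROM color WHERE color = ?", ("salmon",)
).fetchone())
```

With MySQLdb the same shape should carry over by passing the list of tuples to `cursor.executemany` with `%s` placeholders and calling `db.commit()` once after the loop; that variant has not been run here.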
Original article: https://blog.csdn.net/Oscer2016/article/details/70257956