Fetching a single table; the code is as follows:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import csv
from urllib.request import urlopen
from urllib.error import HTTPError
from bs4 import BeautifulSoup

try:
    html = urlopen("http://en.wikipedia.org/wiki/Comparison_of_text_editors")
except HTTPError:
    print("not found")
    exit(1)
bsObj = BeautifulSoup(html, "html.parser")
# findAll returns a list (possibly empty), never None,
# so check for emptiness before indexing into it.
tables = bsObj.findAll("table", {"class": "wikitable"})
if not tables:
    print("no table")
    exit(1)
table = tables[0]
rows = table.findAll("tr")
# newline='' is required so csv.writer controls line endings itself.
csvFile = open("editors.csv", 'wt', newline='', encoding='utf-8')
writer = csv.writer(csvFile)
try:
    for row in rows:
        csvRow = []
        for cell in row.findAll(['td', 'th']):
            csvRow.append(cell.get_text())
        writer.writerow(csvRow)
finally:
    csvFile.close()
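Note the corrected `open()` call: `csv.writer` expects the file opened with `newline=''` so the csv module can manage line endings itself (otherwise Windows inserts blank lines between rows). A stdlib-only sketch of the write/read round trip (the file name `demo.csv` and the sample rows are made up for illustration):

```python
import csv

# Sample rows standing in for the scraped table cells.
rows = [["Editor", "License"], ["Vim", "Vim license"], ["Emacs", "GPL"]]

# newline='' lets the csv module emit its own \r\n terminators,
# avoiding doubled line endings on Windows.
with open("demo.csv", "wt", newline="", encoding="utf-8") as f:
    csv.writer(f).writerows(rows)

# Reading back with newline='' round-trips the data exactly.
with open("demo.csv", "rt", newline="", encoding="utf-8") as f:
    back = list(csv.reader(f))
```

Using `with` here also closes the file automatically, which is a more idiomatic replacement for the `try`/`finally` pattern in the listing above.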
Fetching all tables; the code is as follows:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import csv
from urllib.request import urlopen
from urllib.error import HTTPError
from bs4 import BeautifulSoup

try:
    html = urlopen("http://en.wikipedia.org/wiki/Comparison_of_text_editors")
except HTTPError:
    print("not found")
    exit(1)
bsObj = BeautifulSoup(html, "html.parser")
tables = bsObj.findAll("table", {"class": "wikitable"})
# findAll returns an empty list (not None) when nothing matches.
if not tables:
    print("no table")
    exit(1)
i = 1
for table in tables:
    # Write each table to its own numbered CSV file.
    fileName = "table%s.csv" % i
    rows = table.findAll("tr")
    csvFile = open(fileName, 'wt', newline='', encoding='utf-8')
    writer = csv.writer(csvFile)
    try:
        for row in rows:
            csvRow = []
            for cell in row.findAll(['td', 'th']):
                csvRow.append(cell.get_text())
            writer.writerow(csvRow)
    finally:
        csvFile.close()
    i += 1
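The manual `i = 1` / `i += 1` counter in the loop above can be replaced with `enumerate`. A stdlib-only sketch of the one-file-per-table pattern (the `tables` data here is a hypothetical stand-in for the rows extracted by `findAll("tr")`):

```python
import csv

# Hypothetical pre-extracted cell text for two tables, standing in
# for the results of the BeautifulSoup scraping above.
tables = [
    [["Editor", "License"], ["Vim", "Vim license"]],
    [["Editor", "First release"], ["Emacs", "1976"]],
]

# enumerate(start=1) replaces the manual i = 1 / i += 1 counter,
# and "with" replaces the try/finally close.
for i, rows in enumerate(tables, start=1):
    fileName = "table%s.csv" % i
    with open(fileName, "wt", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)
```

This keeps the numbering and the file contents in sync without any mutable counter variable.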
That concludes this post on saving page table data to CSV with Python. I hope it serves as a useful reference, and I hope you will continue to support 服务器之家.
原文链接:https://blog.csdn.net/u011085172/article/details/73810708