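1. What is XML, and what are its features?
XML (Extensible Markup Language) can be used to mark up data and define data types. It is a metalanguage that lets users define their own markup languages.
Example: del.xml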
    <?xml version="1.0" encoding="utf-8"?>
    <catalog>
        <maxid>4</maxid>
        <login username="pytest" passwd='123456'>
            <caption>Python</caption>
            <item id="4">
                <caption>test</caption>
            </item>
        </login>
        <item id="2">
            <caption>Zope</caption>
        </item>
    </catalog>
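Structurally, XML looks a lot like HTML, but the two were designed for different purposes. HTML was designed to display data, so its focus is how the data looks. XML was designed to transport and store data, so its focus is what the data contains.
XML has the following features:
•It is made up of tag pairs: <aa></aa>
•Tags can have attributes: <aa id='123'></aa>
•Tag pairs can enclose data: <aa>abc</aa>
•Tags can contain child tags (forming a hierarchy)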
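The four features above can be seen on a tiny document. The sketch below is only illustrative: the `<aa>`/`<bb>` document is made up for this purpose, and it is parsed from a string with `xml.dom.minidom.parseString` so that the snippet runs without any file on disk.

```python
import xml.dom.minidom

# Hypothetical document: a tag pair with an attribute, text data and a child tag
SAMPLE = "<aa id='123'>abc<bb>child</bb></aa>"

dom = xml.dom.minidom.parseString(SAMPLE)
aa = dom.documentElement

tag_name = aa.tagName                      # the tag pair <aa>...</aa>
attr_id = aa.getAttribute("id")            # the attribute id='123'
text = aa.firstChild.data                  # the data enclosed by the tag pair
child = aa.getElementsByTagName("bb")[0]   # the nested child tag

print(tag_name, attr_id, text, child.tagName)
```

Each variable corresponds to one bullet in the list: the tag itself, its attribute, its text data, and its child tag.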
2. Getting a tag's node properties
    #coding: utf-8
    import xml.dom.minidom

    dom = xml.dom.minidom.parse("del.xml")  # open and parse the XML document
    root = dom.documentElement  # get the document's root element
    print("nodeName:", root.nodeName)  # every node has nodeName, nodeValue and nodeType
    print("nodeValue:", root.nodeValue)  # the node's value; None for element nodes, the text for text nodes
    print("nodeType:", root.nodeType)
    print("ELEMENT_NODE:", root.ELEMENT_NODE)
nodeType is the node's type; catalog is an ELEMENT_NODE.
The following node types are defined:
    'ATTRIBUTE_NODE'
    'CDATA_SECTION_NODE'
    'COMMENT_NODE'
    'DOCUMENT_FRAGMENT_NODE'
    'DOCUMENT_NODE'
    'DOCUMENT_TYPE_NODE'
    'ELEMENT_NODE'
    'ENTITY_NODE'
    'ENTITY_REFERENCE_NODE'
    'NOTATION_NODE'
    'PROCESSING_INSTRUCTION_NODE'
    'TEXT_NODE'
Output
    nodeName: catalog
    nodeValue: None
    nodeType: 1
    ELEMENT_NODE: 1
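To see a few of the node types in practice, the sketch below walks the direct children of a root element and reports each child's type name. It assumes a made-up inline document rather than del.xml, and `TYPE_NAMES` is a lookup table introduced here for illustration (it is not part of minidom), built from the constants on `xml.dom.Node`.

```python
import xml.dom.minidom
from xml.dom import Node

# Helper (not part of minidom): map numeric nodeType values back to their names
TYPE_NAMES = {
    getattr(Node, name): name
    for name in dir(Node)
    if name.endswith("_NODE")
}

# Hypothetical document with a comment, an element and whitespace between them
SAMPLE = "<catalog>\n  <!-- note -->\n  <maxid>4</maxid>\n</catalog>"

root = xml.dom.minidom.parseString(SAMPLE).documentElement
child_types = [TYPE_NAMES[child.nodeType] for child in root.childNodes]
print(child_types)
```

The whitespace between tags shows up as TEXT_NODE children, which is worth remembering before indexing into `childNodes` by position.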
3. Getting child tags
    #coding: utf-8
    import xml.dom.minidom

    dom = xml.dom.minidom.parse("del.xml")
    root = dom.documentElement

    bb = root.getElementsByTagName('maxid')  # a NodeList of matching elements
    print(type(bb))
    print(bb)
    b = bb[0]  # the first (and only) <maxid> element
    print(b.nodeName)
    print(b.nodeValue)  # None, because b is an element node
Output
    <class 'xml.dom.minicompat.NodeList'>
    [<DOM Element: maxid at 0x2707a48>]
    maxid
    None
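One detail worth knowing is that `getElementsByTagName` searches all descendants, not only direct children. The sketch below contrasts the two on the contents of del.xml, inlined as a string (without the XML declaration) so that it runs standalone. `direct_children` is a hypothetical helper written for this illustration, not a minidom function.

```python
import xml.dom.minidom

# Contents of del.xml, inlined (an assumption so the snippet needs no file)
DEL_XML = """<catalog>
    <maxid>4</maxid>
    <login username="pytest" passwd='123456'>
        <caption>Python</caption>
        <item id="4">
            <caption>test</caption>
        </item>
    </login>
    <item id="2">
        <caption>Zope</caption>
    </item>
</catalog>"""


def direct_children(node, name):
    """Hypothetical helper: child elements of `node` whose tag is `name`."""
    return [
        child for child in node.childNodes
        if child.nodeType == child.ELEMENT_NODE and child.tagName == name
    ]


root = xml.dom.minidom.parseString(DEL_XML).documentElement
all_items = root.getElementsByTagName("item")   # every <item> below the root
top_items = direct_children(root, "item")       # only <item> directly under <catalog>

print(len(all_items), len(top_items))
```

Here the `<item id="4">` nested inside `<login>` is found by `getElementsByTagName` but not by the direct-children filter.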
4. Getting attribute values
    #coding: utf-8
    import xml.dom.minidom

    dom = xml.dom.minidom.parse("del.xml")
    root = dom.documentElement

    itemlist = root.getElementsByTagName('login')
    item = itemlist[0]
    print(item.getAttribute("username"))
    print(item.getAttribute("passwd"))

    itemlist = root.getElementsByTagName("item")
    item = itemlist[0]  # elements are told apart by their position in itemlist
    print(item.getAttribute("id"))

    item2 = itemlist[1]  # elements are told apart by their position in itemlist
    print(item2.getAttribute("id"))
Output
    pytest
    123456
    4
    2
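A related point is what happens when an attribute is missing. The sketch below, which inlines just the `<login>` element from del.xml so that it runs on its own, shows that `getAttribute` returns an empty string in that case, and that `hasAttribute` and `attributes.items()` can be used to check for or list attributes.

```python
import xml.dom.minidom

# The <login> element from del.xml, inlined so the snippet needs no file
SAMPLE = "<login username=\"pytest\" passwd='123456'></login>"

login = xml.dom.minidom.parseString(SAMPLE).documentElement

missing = login.getAttribute("email")       # not present: returns "" rather than raising
has_email = login.hasAttribute("email")     # explicit presence check
all_attrs = dict(login.attributes.items())  # every attribute as a name -> value dict

print(repr(missing), has_email, all_attrs)
```

Because a missing attribute and an empty attribute both yield `""`, `hasAttribute` is the safer way to tell them apart.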
5. Getting the data between a tag pair
    #coding: utf-8
    import xml.dom.minidom

    dom = xml.dom.minidom.parse("del.xml")
    root = dom.documentElement

    itemlist = root.getElementsByTagName('caption')
    item = itemlist[0]
    print(item.firstChild.data)  # the text node inside the first <caption>

    item2 = itemlist[1]
    print(item2.firstChild.data)
Output
    Python
    test
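`firstChild.data` works when the element holds exactly one text node, but `firstChild` is `None` for an empty element. A slightly more defensive variant is sketched below; `get_text` is a hypothetical helper (not part of minidom), and the sample document is made up to cover the empty and mixed cases.

```python
import xml.dom.minidom


def get_text(element):
    """Hypothetical helper: join the text of all direct text/CDATA children."""
    return "".join(
        child.data for child in element.childNodes
        if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE)
    )


# Made-up document: one normal caption, one empty, one mixing text and CDATA
SAMPLE = (
    "<c>"
    "<caption>Python</caption>"
    "<caption></caption>"
    "<caption>a<![CDATA[<b>]]></caption>"
    "</c>"
)

root = xml.dom.minidom.parseString(SAMPLE).documentElement
captions = root.getElementsByTagName("caption")
texts = [get_text(c) for c in captions]
print(texts)
```

The helper only looks at direct children, so text inside nested child tags is deliberately not included.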
6. Example
    <?xml version="1.0" encoding="UTF-8"?>
    <users>
        <user id="1000001">
            <username>Admin</username>
            <email>admin@live.cn</email>
            <age>23</age>
            <sex>boy</sex>
        </user>
        <user id="1000002">
            <username>Admin2</username>
            <email>admin2@live.cn</email>
            <age>22</age>
            <sex>boy</sex>
        </user>
        <user id="1000003">
            <username>Admin3</username>
            <email>admin3@live.cn</email>
            <age>27</age>
            <sex>boy</sex>
        </user>
        <user id="1000004">
            <username>Admin4</username>
            <email>admin4@live.cn</email>
            <age>25</age>
            <sex>girl</sex>
        </user>
        <user id="1000005">
            <username>Admin5</username>
            <email>admin5@live.cn</email>
            <age>20</age>
            <sex>boy</sex>
        </user>
        <user id="1000006">
            <username>Admin6</username>
            <email>admin6@live.cn</email>
            <age>23</age>
            <sex>girl</sex>
        </user>
    </users>
The task: print each user's name, email, age and sex.
Reference code
    # -*- coding:utf-8 -*-
    from xml.dom import minidom


    def get_attrvalue(node, attrname):
        # Return the attribute value, or '' when no node is given
        return node.getAttribute(attrname) if node else ''


    def get_nodevalue(node, index=0):
        # Return the value of the index-th child (by default the text node)
        return node.childNodes[index].nodeValue if node else ''


    def get_xmlnode(node, name):
        # Return all descendant elements with the given tag name
        return node.getElementsByTagName(name) if node else []


    def get_xml_data(filename='user.xml'):
        doc = minidom.parse(filename)
        root = doc.documentElement

        user_nodes = get_xmlnode(root, 'user')
        print("user_nodes:", user_nodes)
        user_list = []
        for node in user_nodes:
            user_id = get_attrvalue(node, 'id')
            node_name = get_xmlnode(node, 'username')
            node_email = get_xmlnode(node, 'email')
            node_age = get_xmlnode(node, 'age')
            node_sex = get_xmlnode(node, 'sex')

            user_name = get_nodevalue(node_name[0])
            user_email = get_nodevalue(node_email[0])
            user_age = int(get_nodevalue(node_age[0]))
            user_sex = get_nodevalue(node_sex[0])

            user = {}
            user['id'], user['username'], user['email'], user['age'], user['sex'] = (
                int(user_id), user_name, user_email, user_age, user_sex
            )
            user_list.append(user)
        return user_list


    def test_load_xml():
        user_list = get_xml_data()
        for user in user_list:
            print('-----------------------------------------------------')
            if user:
                user_str = 'No.:\t%d\nname:\t%s\nsex:\t%s\nage:\t%s\nEmail:\t%s' % (
                    int(user['id']), user['username'], user['sex'], user['age'], user['email'])
                print(user_str)


    if __name__ == "__main__":
        test_load_xml()
Output
    C:\Users\wzh94434\Desktop\xml>python user.py
    user_nodes: [<DOM Element: user at 0x2758c48>, <DOM Element: user at 0x2756288>, <DOM Element: user at 0x2756888>, <DOM Element: user at 0x2756e88>, <DOM Element: user at 0x275e4c8>, <DOM Element: user at 0x275eac8>]
    -----------------------------------------------------
    No.:    1000001
    name:   Admin
    sex:    boy
    age:    23
    Email:  admin@live.cn
    -----------------------------------------------------
    No.:    1000002
    name:   Admin2
    sex:    boy
    age:    22
    Email:  admin2@live.cn
    -----------------------------------------------------
    No.:    1000003
    name:   Admin3
    sex:    boy
    age:    27
    Email:  admin3@live.cn
    -----------------------------------------------------
    No.:    1000004
    name:   Admin4
    sex:    girl
    age:    25
    Email:  admin4@live.cn
    -----------------------------------------------------
    No.:    1000005
    name:   Admin5
    sex:    boy
    age:    20
    Email:  admin5@live.cn
    -----------------------------------------------------
    No.:    1000006
    name:   Admin6
    sex:    girl
    age:    23
    Email:  admin6@live.cn
7. Summary
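    minidom.parse(filename)
        Load and parse an XML file.

    doc.documentElement
        Get the root element of the XML document.

    node.getAttribute(AttributeName)
        Get the value of an attribute of an XML node.

    node.getElementsByTagName(TagName)
        Get the collection of descendant elements with the given tag name.

    node.childNodes
        Return the list of child nodes.

    node.childNodes[index].nodeValue
        Get the value of a child node.

    node.firstChild
        Access the first child node; equivalent to node.childNodes[0].

    doc = minidom.parse(filename)
    doc.toxml('UTF-8')
        Return the XML text of the node.

    a = node.attributes["id"]
    a.name     # the attribute name, "id" here
    a.value    # the attribute's value
        Access an element's attributes.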
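The last two entries of the summary (`toxml` and `attributes`) are not used in the earlier sections, so here is a minimal sketch of them. It assumes a made-up one-user document parsed from a string rather than a file; the variable names are chosen only for this illustration.

```python
from xml.dom import minidom

# Hypothetical one-element document, parsed from a string instead of a file
doc = minidom.parseString('<user id="1000001"><username>Admin</username></user>')
root = doc.documentElement

attr = root.attributes["id"]          # an attribute node with .name and .value
name_node = root.getElementsByTagName("username")[0]
same_node = name_node.firstChild is name_node.childNodes[0]

node_xml = name_node.toxml()          # XML text of a single node (str)
doc_xml = doc.toxml("UTF-8")          # whole document; bytes when an encoding is given

print(attr.name, attr.value, same_node)
print(node_xml)
print(doc_xml)
```

In Python 3, passing an encoding to `toxml` yields a bytes object with an XML declaration, while calling it without arguments yields a str.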
Original article: http://www.cnblogs.com/kaituorensheng/p/4493306.html