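While converting a dataset between formats, I needed to turn JSON into XML files, and the lxml package makes this very convenient.
1. Writing XML files
a) Using etree and objectify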
```python
from lxml import etree, objectify

# annotate=False keeps objectify's type annotations out of the output
E = objectify.ElementMaker(annotate=False)
anno_tree = E.annotation(
    E.folder('VOC2014_instance'),
    E.filename("test.jpg"),
    E.source(
        E.database('COCO'),
        E.annotation('COCO'),
        E.image('COCO'),
        E.url("http://test.jpg")
    ),
    E.size(
        E.width(800),
        E.height(600),
        E.depth(3)
    ),
    E.segmented(0),
)
etree.ElementTree(anno_tree).write("test.xml", pretty_print=True)
```
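The resulting test.xml file looks like this:
```xml
<annotation>
  <folder>VOC2014_instance</folder>
  <filename>test.jpg</filename>
  <source>
    <database>COCO</database>
    <annotation>COCO</annotation>
    <image>COCO</image>
    <url>http://test.jpg</url>
  </source>
  <size>
    <width>800</width>
    <height>600</height>
    <depth>3</depth>
  </size>
  <segmented>0</segmented>
</annotation>
```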
To add further tags to anno_tree, simply use append:
```python
E2 = objectify.ElementMaker(annotate=False)
anno_tree2 = E2.object(
    E.name("person"),
    E.bndbox(
        E.xmin(100),
        E.ymin(200),
        E.xmax(300),
        E.ymax(400)
    ),
    E.difficult(0)
)
anno_tree.append(anno_tree2)
```
After writing the tree out again, the output becomes:
```xml
<annotation>
  <folder>VOC2014_instance</folder>
  <filename>test.jpg</filename>
  <source>
    <database>COCO</database>
    <annotation>COCO</annotation>
    <image>COCO</image>
    <url>http://test.jpg</url>
  </source>
  <size>
    <width>800</width>
    <height>600</height>
    <depth>3</depth>
  </size>
  <segmented>0</segmented>
  <object>
    <name>person</name>
    <bndbox>
      <xmin>100</xmin>
      <ymin>200</ymin>
      <xmax>300</xmax>
      <ymax>400</ymax>
    </bndbox>
    <difficult>0</difficult>
  </object>
</annotation>
```
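When an image contains several objects, the same `append` call can be run in a loop. The following is a minimal sketch rather than part of the original workflow: the `boxes` list and the `make_object` helper are hypothetical names introduced for illustration, and the tree is serialized with `etree.tostring` so the result can be inspected without writing a file.

```python
from lxml import etree, objectify

E = objectify.ElementMaker(annotate=False)


def make_object(name, xmin, ymin, xmax, ymax):
    """Build one <object> element (hypothetical helper)."""
    return E.object(
        E.name(name),
        E.bndbox(E.xmin(xmin), E.ymin(ymin), E.xmax(xmax), E.ymax(ymax)),
        E.difficult(0),
    )


# Hypothetical detections: (class name, xmin, ymin, xmax, ymax)
boxes = [("person", 100, 200, 300, 400), ("dog", 50, 60, 150, 160)]

anno_tree = E.annotation(E.filename("test.jpg"))
for box in boxes:
    # Each call appends one more <object> child to <annotation>
    anno_tree.append(make_object(*box))

# tostring returns bytes, so decode before printing
xml_text = etree.tostring(anno_tree, pretty_print=True).decode("utf-8")
print(xml_text)
```

Keeping the element construction in one helper means the loop body stays a single `append`, which is the only call needed to grow the tree.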
b) Using etree and SubElement
```python
from lxml import etree

annotation = etree.Element("annotation")
etree.SubElement(annotation, "folder").text = "VOC2014_instance"
etree.SubElement(annotation, "filename").text = "test.jpg"
source = etree.SubElement(annotation, "source")
etree.SubElement(source, "database").text = "COCO"
etree.SubElement(source, "annotation").text = "COCO"
etree.SubElement(source, "image").text = "COCO"
etree.SubElement(source, "url").text = "http://test.jpg"
size = etree.SubElement(annotation, "size")
etree.SubElement(size, "width").text = '800'  # must be a string
etree.SubElement(size, "height").text = '600'
etree.SubElement(size, "depth").text = '3'
etree.SubElement(annotation, "segmented").text = '0'
key_object = etree.SubElement(annotation, "object")
etree.SubElement(key_object, "name").text = "person"
bndbox = etree.SubElement(key_object, "bndbox")
etree.SubElement(bndbox, "xmin").text = str(100)
etree.SubElement(bndbox, "ymin").text = str(200)
etree.SubElement(bndbox, "xmax").text = str(300)
etree.SubElement(bndbox, "ymax").text = str(400)
etree.SubElement(key_object, "difficult").text = '0'
doc = etree.ElementTree(annotation)
# Pass the filename: lxml writes bytes, so a text-mode file object fails on Python 3
doc.write("test.xml", pretty_print=True)
```
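Since the original motivation was converting JSON annotations to XML, the SubElement approach can be wrapped in a small function. This is a rough sketch under stated assumptions: the JSON layout (`file_name`, `width`, `height`, `objects`, `bbox`) and the function name `json_to_voc` are hypothetical and not taken from any particular dataset. Every numeric value goes through `str()` because etree raises a `TypeError` when `.text` is set to a non-string.

```python
import json

from lxml import etree


def json_to_voc(record):
    """Convert one annotation dict (hypothetical layout) to an <annotation> element."""
    annotation = etree.Element("annotation")
    etree.SubElement(annotation, "filename").text = record["file_name"]
    size = etree.SubElement(annotation, "size")
    etree.SubElement(size, "width").text = str(record["width"])
    etree.SubElement(size, "height").text = str(record["height"])
    for obj in record["objects"]:
        node = etree.SubElement(annotation, "object")
        etree.SubElement(node, "name").text = obj["name"]
        bndbox = etree.SubElement(node, "bndbox")
        # bbox is assumed to be [xmin, ymin, xmax, ymax]
        for tag, value in zip(("xmin", "ymin", "xmax", "ymax"), obj["bbox"]):
            etree.SubElement(bndbox, tag).text = str(value)
    return annotation


# Hypothetical input record, parsed from a JSON string
record = json.loads(
    '{"file_name": "test.jpg", "width": 800, "height": 600,'
    ' "objects": [{"name": "person", "bbox": [100, 200, 300, 400]}]}'
)
root = json_to_voc(record)
print(etree.tostring(root, pretty_print=True).decode("utf-8"))
```

To save the result, the returned element can be wrapped in `etree.ElementTree(root)` and written with `write`, as in the code above.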
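2. Reading XML
Here XPath can be used to pull out the values of the elements you need directly. For example, to get the x and y coordinates from the test.xml file above: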
```python
from lxml import etree

tree = etree.parse("test.xml")
# get bbox
for bbox in tree.xpath('//bndbox'):  # select every bndbox element
    for corner in bbox:  # iterate over the children of bndbox
        print(corner.text)  # the value is a string
```
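Because `.text` always returns a string, coordinates usually need converting before they are used numerically. The following is a minimal sketch assuming the annotation layout of test.xml; the helper name `read_boxes` is hypothetical, and the XML is parsed from an in-memory string only to keep the example self-contained.

```python
from lxml import etree

# Trimmed-down copy of the annotation layout, kept in memory for the example
SAMPLE = b"""<annotation>
  <object>
    <name>person</name>
    <bndbox>
      <xmin>100</xmin><ymin>200</ymin><xmax>300</xmax><ymax>400</ymax>
    </bndbox>
  </object>
</annotation>"""


def read_boxes(root):
    """Return a list of (name, (xmin, ymin, xmax, ymax)) with integer coordinates."""
    boxes = []
    for obj in root.xpath("//object"):
        name = obj.findtext("name")
        # Look each corner up by tag name so element order does not matter
        coords = tuple(
            int(obj.findtext("bndbox/" + tag))
            for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        boxes.append((name, coords))
    return boxes


root = etree.fromstring(SAMPLE)
boxes = read_boxes(root)
print(boxes)
```

Looking up each corner by name, rather than relying on child order, is a design choice that keeps the result stable if a file lists the corners differently.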
Reference
https://stackoverflow.com/questions/12657043/parse-xml-with-lxml-extract-element-value
Original article: http://www.cnblogs.com/arkenstone/p/7338978.html