Examples of Parsing HTML with BeautifulSoup in Python

Introduction

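Beautiful Soup provides a few simple, Pythonic functions for navigating, searching, and modifying a parse tree. It is a toolkit that parses a document and hands you the data you want to extract. Because it is simple, a complete application takes very little code.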
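As a rough illustration of those three operations (navigating, searching, modifying), here is a minimal sketch. The markup string and variable names are made up for this example, and it assumes Python's built-in "html.parser" so that no extra parser is needed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, used only for this illustration.
markup = (
    "<html><head><title>Demo</title></head>"
    "<body><p class='note'>Hello</p><p>World</p></body></html>"
)
soup = BeautifulSoup(markup, "html.parser")

# Navigate: walk down the tree by tag name.
title_text = soup.title.string

# Search: collect the text of every <p> tag in the document.
paragraph_texts = [p.get_text() for p in soup.find_all("p")]

# Modify: replace the text of the first <p class="note"> tag.
soup.find("p", class_="note").string = "Hi"
modified = str(soup.body)

print(title_text)        # Demo
print(paragraph_texts)   # ['Hello', 'World']
print(modified)
```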

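Beautiful Soup automatically converts input documents to Unicode and output documents to UTF-8. You do not need to think about encodings unless the document does not declare one and Beautiful Soup cannot detect it automatically. In that case, you only need to state the original encoding.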
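A minimal sketch of stating the original encoding yourself, assuming the input arrives as bytes with no declared charset. The sample bytes are invented for illustration. `from_encoding` is the constructor argument Beautiful Soup accepts for this purpose.

```python
from bs4 import BeautifulSoup

# Hypothetical input: ISO-8859-1 bytes with no <meta charset> declaration.
raw = "<p>café</p>".encode("iso-8859-1")

# from_encoding tells Beautiful Soup how to decode the incoming bytes.
soup = BeautifulSoup(raw, "html.parser", from_encoding="iso-8859-1")

text = soup.p.get_text()        # a Unicode str
encoded = soup.encode("utf-8")  # the document serialized as UTF-8 bytes

print(text)                     # café
print(soup.original_encoding)   # the encoding Beautiful Soup actually used
```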

Beautiful Soup works on top of popular Python parsers such as lxml and html5lib, so you can flexibly choose between different parsing strategies or opt for raw speed.

This article walks through how to parse HTML with BeautifulSoup in Python. Without further ado, let's go through it step by step:

1. Install beautifulsoup4

									pip install beautifulsoup4

									pip install lxml

									pip install html5lib

lxml and html5lib are parsers that Beautiful Soup can use.
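To see why the choice of parser matters, the sketch below feeds the same deliberately invalid snippet to each parser and prints the tree it builds. The snippet and variable names are illustrative. Any parser that is not installed is skipped rather than assumed to be present.

```python
from bs4 import BeautifulSoup, FeatureNotFound

broken = "<a></p>"  # deliberately invalid markup, for illustration only

results = {}
for parser in ("html.parser", "lxml", "html5lib"):
    try:
        results[parser] = str(BeautifulSoup(broken, parser))
    except FeatureNotFound:
        # The parser library is not installed in this environment.
        results[parser] = None

for parser, output in results.items():
    print(parser, "->", output)
```

Python's built-in "html.parser" simply drops the stray closing tag, while lxml and html5lib each repair the markup in their own way, so the resulting trees can differ.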

2. The HTML file

									<!-- This is the example.html file. -->

									<html><head><title>The Website Title</title></head>

									<body>

									<p>Download my <strong>Python</strong> book from <a href="http://inventwithpython.com" rel="external nofollow" >my website</a>.</p>

									<p class="slogan">Learn Python the easy way!</p>

									<p>By <span id="author">Al Sweigart</span></p>

									</body></html>

Save the HTML above as example.html.

3. Start parsing

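									import bs4

									# Open the file saved in step 2; the with-block closes it automatically.
									with open('example.html', encoding='utf-8') as exampleFile:
									    exampleSoup = bs4.BeautifulSoup(exampleFile.read(), 'html5lib')

									# select() takes a CSS selector and returns a list-like collection of matching tags.
									elems = exampleSoup.select('#author')

									print(type(elems))

									# getText() returns the text inside the first match: Al Sweigart
									print(elems[0].getText())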
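The same select() call accepts other CSS selectors as well. The following sketch embeds the example HTML as a string so that it runs without the file, and it assumes the built-in "html.parser" instead of html5lib. The variable names are illustrative.

```python
import bs4

# The example page from step 2, embedded as a string for a self-contained run.
html = """<!-- This is the example.html file. -->
<html><head><title>The Website Title</title></head>
<body>
<p>Download my <strong>Python</strong> book from <a href="http://inventwithpython.com" rel="external nofollow" >my website</a>.</p>
<p class="slogan">Learn Python the easy way!</p>
<p>By <span id="author">Al Sweigart</span></p>
</body></html>"""

soup = bs4.BeautifulSoup(html, "html.parser")

# Select by id, by class, and by tag name.
author = soup.select("#author")[0].getText()
slogan = soup.select("p.slogan")[0].getText()
paragraphs = soup.select("p")

# Attributes are read with get(), which returns None if the attribute is absent.
href = soup.select("a")[0].get("href")

print(author)           # Al Sweigart
print(slogan)           # Learn Python the easy way!
print(len(paragraphs))  # 3
print(href)             # http://inventwithpython.com
```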