How to Get HTML Content and Attribute Values with a Python 3 Crawler

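Today I used BeautifulSoup to parse data from a scraped web page.

First, import the package: from bs4 import BeautifulSoup

Then you can request the data with urllib.

Remember to import it: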

	import urllib.request

Then call urlopen and read the data:

	f = urllib.request.urlopen('http://jingyan.baidu.com/article/455a9950bc94b8a166277898.html')
	response = f.read()
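urlopen returns bytes, so the response usually needs decoding before it is parsed. Here is a minimal sketch, assuming the charset is declared in the Content-Type header and falling back to UTF-8 otherwise. The helper name `fetch_html` and the 10-second timeout are illustrative choices, not part of the original post. The demo uses a `data:` URL so it runs without network access; swap in a real `http://` address to fetch a live page.

```python
import urllib.request


def fetch_html(url, timeout=10):
    """Fetch a URL and return its body decoded to str (hypothetical helper)."""
    # The with-block closes the response even if read() raises
    with urllib.request.urlopen(url, timeout=timeout) as f:
        raw = f.read()  # bytes
        # Charset from the Content-Type header, if one was declared
        charset = f.headers.get_content_charset() or "utf-8"
    # Replace undecodable bytes instead of raising UnicodeDecodeError
    return raw.decode(charset, errors="replace")


# Offline demo: a data: URL carrying percent-encoded HTML
page = fetch_html("data:text/html;charset=utf-8,%3Cp%3Ehello%3C%2Fp%3E")
print(page)
```

The returned string can be passed straight to BeautifulSoup in place of a local HTML literal.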

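Here we will skip the request and parse a local HTML string directly, as follows.

Note: '''xxx''' is a triple-quoted string that can span multiple lines (it is also commonly used as a multi-line comment).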

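	# python3
	from bs4 import BeautifulSoup

	html = '''<html>
	<head>
	 <title class='ceshi'>super 哈哈 star</title>
	</head>
	<body>
	 天下第一帅
	 <p class='sister'>
	  是不是
	 </p>
	</body>
	</html>'''

	# Parse the data with BeautifulSoup. Passing 'html.parser' as the second
	# argument names the parser explicitly (omitting it makes bs4 guess and warn).
	# The result is an object whose properties we read next.
	html = BeautifulSoup(html, 'html.parser')

	# Read the title tag (prints the whole tag, not just its text)
	print(html.title)

	# Read the attributes of title (a dict)
	attrs = html.title.attrs
	print(attrs)

	# attrs['class'] is ['ceshi'], a list, so index it to get the value
	print(attrs['class'][0])

	# Read the body
	print(html.body)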
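Indexing a missing attribute raises KeyError, so `.get()` is the safer read when an attribute may be absent. Below is a short sketch using a trimmed copy of the sample document. The names `doc` and `soup` are conventional choices for this illustration only.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the sample document used in this post
doc = ("<html><head><title class='ceshi'>super 哈哈 star</title></head>"
       "<body><p class='sister'> 是不是 </p></body></html>")
soup = BeautifulSoup(doc, "html.parser")

title = soup.title
print(title.name)        # tag name
print(title.string)      # text inside the tag
print(title.attrs)       # all attributes as a dict
print(title["class"])    # class is multi-valued, so bs4 returns a list
print(title.get("id"))   # missing attribute: None instead of KeyError

# get_text(strip=True) drops the whitespace around the paragraph text
print(soup.p.get_text(strip=True))
```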

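You can also read data with BeautifulSoup's select method, which takes a CSS selector string and returns a list of matching tags.

	# Find by tag name
	html.select('title')
	html.select('body')
	# Find by class name
	html.select('.sister')
	# Find by id: a tag with id "link" inside a p tag
	# (the sample HTML has no such tag, so this returns an empty list)
	html.select('p #link')
	# Get the text inside a tag
	html.p.string
	# Get an attribute value by name, e.g. tag['href'] on a tag that has an href.
	# The sample HTML has no href, so read the class of p instead.
	html.p['class']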
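The selectors above can be tried against a slightly larger document. This is a sketch, assuming a hypothetical page with a link inside a paragraph: the `id="link"` anchor and the example.com URL are invented for illustration. `select` returns a list, while `select_one` returns the first match or `None`.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, invented for illustration
doc = """
<html>
<head><title class="ceshi">demo</title></head>
<body>
 <p class="sister">first <a id="link" href="http://example.com/a">a link</a></p>
 <p class="sister">second</p>
</body>
</html>
"""
soup = BeautifulSoup(doc, "html.parser")

# select() always returns a list of tags, possibly empty
paragraphs = soup.select("p.sister")
print(len(paragraphs))

# "p #link" (with a space) matches an element with id="link" inside a <p>
link = soup.select_one("p #link")
print(link.string)
print(link["href"])

# "p#link" (no space) would require the <p> itself to carry the id
print(soup.select("p#link"))

# Collect the href of every <a> tag that has one
hrefs = [a["href"] for a in soup.select("a[href]")]
print(hrefs)
```

Because `select_one` can return `None`, check the result before indexing it when the element may be missing from the page.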

Original link: https://blog.csdn.net/lzq520210/article/details/76855606
