1.2 Parsing Web Pages

2016-06-02 doubleyou1001

from bs4 import BeautifulSoup
Soup = BeautifulSoup(html, 'lxml')

data = Soup.select('???')  # '???' is a placeholder for a CSS selector

title.get_text()
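As a minimal sketch of how these three calls fit together, assume a small hand-written HTML string. The markup, the `h2.title` selector, and the variable names are made up for illustration. `html.parser` replaces `lxml` here only so the snippet runs without an extra install.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page; a real one would come from a request or a file.
html = """
<html><body>
  <h2 class="title">First post</h2>
  <h2 class="title">Second post</h2>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# select() takes a CSS selector and returns a list of matching tags.
titles = soup.select("h2.title")

# get_text() returns the text inside a single tag.
title_texts = [title.get_text() for title in titles]
print(title_texts)  # ['First post', 'Second post']
```

Because select() always returns a list, get_text() has to be called on each element, not on the list itself.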

To read an attribute of a tag, use the get method:

image.get('src')
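A small sketch of get on an `img` tag, assuming a made-up HTML fragment (the file name `a.jpg` and the variable names are illustrative only). It also shows that get returns None when the attribute is absent, much like dict.get.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: the second image has no src attribute.
html = '<div><img src="a.jpg" alt="cover"><img alt="no source"></div>'
soup = BeautifulSoup(html, "html.parser")

images = soup.select("img")

# get() returns the attribute value, or None if the tag lacks it.
srcs = [image.get("src") for image in images]
print(srcs)  # ['a.jpg', None]
```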

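The stripped_strings attribute handles the many-to-one case well: it yields the text of every child tag under one parent tag, with surrounding whitespace removed.
Because the content comes as a group, convert it to a list: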

list(cate.stripped_strings)
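To make the many-to-one case concrete, here is a hedged sketch with an invented `div.cate` block holding several `span` children. The markup and names are assumptions, not taken from a real page.

```python
from bs4 import BeautifulSoup

# Hypothetical parent tag with several child tags and uneven whitespace.
html = """
<div class="cate">
  <span> Python </span>
  <span>Crawler</span>
  <span>  Notes</span>
</div>
"""
soup = BeautifulSoup(html, "html.parser")
cate = soup.select("div.cate")[0]

# stripped_strings is a generator; list() collects its items.
# Whitespace-only strings are skipped and the rest are stripped.
tags = list(cate.stripped_strings)
print(tags)  # ['Python', 'Crawler', 'Notes']
```

Calling get_text() on the same parent would instead return one string with the whitespace and line breaks still in it, which is why the list form is handier for grouped content.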

There are two ways to open a file.

fs = open("file path", "r")
print(fs.read())
fs.close()

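Both relative and absolute paths are supported. With this first form you must call close() yourself; otherwise the file handle stays open and its resources leak. The second way, a with statement, closes the file automatically when the block ends: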

with open("file path", "r") as fs:
    print(fs.read())
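The two ways can be compared side by side in a self-contained sketch. The temporary file, its name `demo.txt`, and the variable names are assumptions made only so the snippet runs anywhere. The try/finally in the first form is an addition that keeps close() from being skipped if read() raises.

```python
import os
import tempfile

# Create a throwaway file so the example does not depend on an existing path.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w", encoding="utf-8") as out:
    out.write("hello")

# Way 1: open and close by hand; finally guarantees close() is called.
fs = open(path, "r", encoding="utf-8")
try:
    first = fs.read()
finally:
    fs.close()

# Way 2: the with statement closes the file when the block ends.
with open(path, "r", encoding="utf-8") as fs2:
    second = fs2.read()

print(first, second)         # hello hello
print(fs.closed, fs2.closed)  # True True
```

In practice the with form is usually the safer default, since there is no close() call to forget.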