XPath Basics

2018-12-17 CNSTT

1. Getting Started

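When writing a crawler, you need to extract specific elements from a page. This article covers two ways to work with XPath, one in the browser's developer tools and one in the Scrapy shell: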

$x('/html/head/title')

In the developer tools, right-click a node in the Elements panel and choose Copy > Copy XPath to get its XPath.

Start the Scrapy shell

scrapy shell https://www.gumtree.com/

XPath matching template

response.xpath('***').extract()

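.extract() takes no XPath argument; it returns everything the preceding .xpath() call matched, as a list of Unicode strings
.re('[.0-9]+') returns only the parts of the matches that fit the regular expression, also as a list of Unicode strings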
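
To make the difference between the two methods concrete, here is a minimal standard-library sketch of what each one returns. It does not use Scrapy: the sample HTML and the helper names extract_texts and re_matches are made up for illustration, and xml.etree.ElementTree only supports a small subset of XPath, so treat it as a rough analogue rather than Scrapy's actual implementation.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page (ElementTree needs valid XML).
PAGE = """<html><body>
<p class="price">Price: 334.39 GBP</p>
<p class="price">Price: 12.50 GBP</p>
</body></html>"""

root = ET.fromstring(PAGE)


def extract_texts(path):
    """Rough analogue of .xpath(path + '/text()').extract(): a list of strings."""
    return [el.text for el in root.findall(path)]


def re_matches(path, pattern):
    """Rough analogue of .re(pattern): regex matches found in each extracted string."""
    return [m for text in extract_texts(path) for m in re.findall(pattern, text)]


texts = extract_texts(".//p[@class='price']")
numbers = re_matches(".//p[@class='price']", r"[.0-9]+")

print(texts)    # ['Price: 334.39 GBP', 'Price: 12.50 GBP']
print(numbers)  # ['334.39', '12.50']
```

In the Scrapy shell the same two results would come from chaining .extract() or .re('[.0-9]+') onto response.xpath(...), as in the template above.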

Get the title element

response.xpath('/html/head/title').extract()

Get the text content of the title

response.xpath('/html/head/title/text()').extract()

Get the URLs (href attributes) of a tags

response.xpath('/html//div/p/a/@href').extract()
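
The three queries above differ only in what they return: the whole element, its text, or an attribute value. The sketch below mirrors that with the Python standard library, assuming a made-up sample page with placeholder example.com URLs. ElementTree has no text() or @href steps, so .text and .get("href") stand in for them here.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page; the example.com URLs are placeholders.
PAGE = """<html>
<head><title>Sample page</title></head>
<body><div>
<p><a href="https://example.com/a">A</a></p>
<p><a href="https://example.com/b">B</a></p>
</div></body>
</html>"""

root = ET.fromstring(PAGE)  # root is the <html> element

# /html/head/title -> the whole element, tags included
title_el = root.find("./head/title")
title_markup = ET.tostring(title_el, encoding="unicode").strip()

# /html/head/title/text() -> only the text inside the element
title_text = title_el.text

# /html//div/p/a/@href -> only the href attribute values
hrefs = [a.get("href") for a in root.findall(".//div/p/a")]

print(title_markup)  # <title>Sample page</title>
print(title_text)    # Sample page
print(hrefs)         # ['https://example.com/a', 'https://example.com/b']
```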

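Get the image paths under the third div whose class is grid-list-item (the position is counted among the matching divs that share the same parent)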

response.xpath('/html//div[@class="grid-list-item"][3]//img/@src').extract()

XPath positions start at 1, unlike the 0-based indexing used by most programming languages!
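
A quick way to see the off-by-one difference is to select the same node once by XPath position and once by Python list index. This is a small standard-library sketch over a made-up three-item list, not Scrapy code.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample list.
root = ET.fromstring("<ul><li>first</li><li>second</li><li>third</li></ul>")

items = root.findall("./li")

first_by_xpath = root.find("./li[1]").text   # XPath position 1 is the first item
first_by_python = items[0].text              # Python index 0 is the first item
third_by_xpath = root.find("./li[3]").text   # XPath position 3 is the third item

print(first_by_xpath, first_by_python, third_by_xpath)  # first first third
```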