Locating Elements by Text: Why I Choose XPath

2019-04-17 halfempty

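Preface

Whether you are writing a crawler or doing test automation, you cannot get away from locating elements.
CSS selectors are faster, but they have no built-in way to locate an element by its text.
On top of that, with inconsistent front-end code and newer front-end frameworks (such as Vue), many of the attributes that CSS selectors rely on are fading away; an entire page may expose little more than a single id='app'.
So I lean toward XPath, for two reasons:

Text is more intuitive, and it makes the target of a locator easier to understand.
Once the front end is built from components, XPath expressions can be more generic, for example for input controls and dropdowns in forms.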

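Locating by text with XPath

1. text() - matching text

Attribute-based locators such as div[@id='app'] should already be familiar, so I won't go over them here.
Below is an excerpt of the page source from Douban Movie.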

[Image: excerpt of the Douban Movie page source]
The director (导演) and screenwriter (编剧) entries share the same structure, so they can be located with XPath as follows:

//span[text()='导演']
//span[text()='编剧']

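2. Sibling elements

To get the director's actual details, combine the text match with following-sibling. To move to earlier siblings at the same level instead, use preceding-sibling.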

following-sibling
Indicates all the nodes that have the same parent as the context node and appear after the context node in the source document.

preceding-sibling
Indicates all the nodes that have the same parent as the context node and appear before the context node in the source document.

//span[text()='导演']/following-sibling::span/a
//span[text()='编剧']/following-sibling::span/a

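Now imagine that all similar fields share this structure: you only need to pass the label text in as a parameter to locate the data you want.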
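As a minimal sketch of that idea, the Python helpers below build the sibling expression from a label. The names xpath_literal and field_links_xpath are hypothetical and not from the original post. The quoting helper is an added precaution, since XPath 1.0 has no escape syntax inside string literals. The resulting string can be passed to whichever XPath 1.0 engine you already use.

```python
def xpath_literal(value: str) -> str:
    """Quote a Python string as an XPath 1.0 string literal (hypothetical helper)."""
    if "'" not in value:
        return f"'{value}'"
    if '"' not in value:
        return f'"{value}"'
    # Both quote characters are present: glue the pieces together with concat().
    parts = value.split("'")
    return "concat(" + ", \"'\", ".join(f"'{p}'" for p in parts) + ")"


def field_links_xpath(label: str) -> str:
    """XPath for the <a> elements inside the <span> that follows the label <span>."""
    return f"//span[text()={xpath_literal(label)}]/following-sibling::span/a"


# The same structure serves every label; only the text changes.
for label in ("导演", "编剧"):
    print(field_links_xpath(label))
```

For the two labels above, this reproduces the two expressions written out by hand earlier.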

3. contains() - partial text match

contains() relaxes the text-matching rule: a partial match is enough, which makes the locator more flexible.
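The original gives no expression for this case, so here is a small sketch of how an exact match and a partial match differ. The function name span_by_text and the sample fragment '导' are assumptions chosen for illustration, and the helper assumes the text contains no single quote.

```python
def span_by_text(text: str, partial: bool = False) -> str:
    """Build a <span> locator; assumes `text` contains no single quote."""
    if partial:
        # contains(haystack, needle) is true when needle occurs anywhere in haystack.
        return f"//span[contains(text(), '{text}')]"
    # Exact match: the text node must equal `text` character for character.
    return f"//span[text()='{text}']"


print(span_by_text("导演"))               # exact match on the full label
print(span_by_text("导", partial=True))   # matches any span whose text contains the fragment
```

One caveat worth knowing: in XPath 1.0, contains(text(), ...) only inspects the first text node of the element. Writing contains(., ...) tests the element's whole string value, including text inside child elements.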

4. normalize-space() - stripping whitespace

Sometimes the text has spaces or line breaks before or after it. The normalize-space() function removes them.

The normalize-space function strips leading and trailing white-space
from a string, replaces sequences of whitespace characters by a single
space, and returns the resulting string.
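To make the definition concrete, the sketch below emulates the function in plain Python. This is an illustration of the behavior, not how any XPath engine implements it, and the Python function name normalize_space is an assumption. The last line shows one way the XPath function can be used in a predicate.

```python
import re


def normalize_space(value: str) -> str:
    """Emulate XPath 1.0 normalize-space() for illustration."""
    # XPath 1.0 whitespace is space, tab, carriage return and line feed.
    collapsed = re.sub(r"[ \t\r\n]+", " ", value)
    # After collapsing, any leading or trailing run is a single space.
    return collapsed.strip(" ")


raw = "\n    导演  \n"
print(repr(normalize_space(raw)))

# In an XPath predicate, normalize-space() with no argument uses the
# string value of the context node:
expr = "//span[normalize-space()='导演']"
print(expr)
```

With this predicate, a span whose text is padded with spaces or line breaks still matches, where a plain text()='导演' comparison would not.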