Parsing hyperlinks with regular expressions in JavaScript

2020-01-15 yahzon

Goal: find the <a> elements in a piece of text, parse each hyperlink, and extract the link address and the link text.

Sample input:

<a href="/CKFinderJava/userfiles/files/xxxx.pdf">手机验证说明</a>
some text
<br />
<br />
<a href="/somesite/something.xxx">这是一个链接</a><br />

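Regular expression (group 1 captures the link address, group 2 the link text):

<a\s+href\s*=\s*[\\"|"](.+)[\\"|"]>(.+)<\/a>

Note: [\\"|"] is a character class, so it matches a single backslash, double quote, or | character; the | is literal inside the brackets. Both (.+) groups are greedy, so the pattern is only reliable when each link sits on its own line, as in the sample.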
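To see what the two capture groups hold, the pattern can be run against a single anchor with RegExp.prototype.exec. This is only a minimal sketch; the names sample, linkPattern, firstHref and firstText are made up for illustration.

```javascript
// Hypothetical one-line sample taken from the material above.
const sample = '<a href="/somesite/something.xxx">这是一个链接</a><br />';

// Same pattern as above, without the g flag so exec() always starts at index 0.
const linkPattern = /<a\s+href\s*=\s*[\\"|"](.+)[\\"|"]>(.+)<\/a>/;

const m = linkPattern.exec(sample);
const firstHref = m ? m[1] : null; // capture group 1: link address
const firstText = m ? m[2] : null; // capture group 2: link text
console.log(firstHref, firstText);
```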

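Code v0.01:
Bug: replace() only swaps each matched <a> element for its link text or link address; everything else in the input is left in place, so the result still contains the surrounding text and tags.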

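// str is assumed to hold the sample input shown above.
const regexp = /<a\s+href\s*=\s*[\\"|"](.+)[\\"|"]>(.+)<\/a>/g;
// Each call returns the whole input with every matched <a> element replaced by one group.
let href_text = str.replace(regexp, "$2"); // link text in place of each link
let href_uri = str.replace(regexp, "$1");  // link address in place of each link
console.log(href_text);
console.log(href_uri);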
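The leftover text is easy to reproduce on a shortened copy of the sample. The sketch below assumes a three-line input; the names html, pattern and replaced are illustrative only.

```javascript
// Hypothetical shortened input: one link followed by unrelated text and a tag.
const html = [
  '<a href="/CKFinderJava/userfiles/files/xxxx.pdf">手机验证说明</a>',
  'some text',
  '<br />',
].join('\n');

const pattern = /<a\s+href\s*=\s*[\\"|"](.+)[\\"|"]>(.+)<\/a>/g;

// Only the <a> element is replaced by its link text; the other lines survive.
const replaced = html.replace(pattern, '$2');
console.log(replaced);
```

The first line becomes the bare link text, while "some text" and the <br /> tag are still there, which is the behavior described as the bug.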

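Improvement needed:
First collect every match of the regular expression into an array, then run the replacement on each match separately, so that only the link text and the link address are kept.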

const attachList = [];
const regexp = /<a\s+href\s*=\s*[\\"|"](.+)[\\"|"]>(.+)<\/a>/gi;
// match() returns null when nothing matches, so fall back to an empty array.
(str.match(regexp) || []).forEach((item) => {
  // item is the full text of one matched <a> element; keep only its two groups.
  attachList.push({
    text: item.replace(regexp, '$2'),
    uri: item.replace(regexp, '$1'),
  });
});
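As a possible alternative, String.prototype.matchAll can read both capture groups directly, without a second replace() pass. The sketch below is not the original author's code: the helper name extractLinks is hypothetical, and the tightened pattern assumes the href value is wrapped in plain double quotes and is the first attribute of the tag.

```javascript
// Hypothetical helper; the name extractLinks is not from the original code.
function extractLinks(html) {
  // "([^"]*)" stops at the closing quote; (.*?) is lazy, so it stops at the first </a>.
  const pattern = /<a\s+href\s*=\s*"([^"]*)"[^>]*>(.*?)<\/a>/gi;
  // matchAll (requires the g flag) yields one match per link, groups at [1] and [2].
  return Array.from(html.matchAll(pattern), (m) => ({ uri: m[1], text: m[2] }));
}

// Two links on the same line, a case the greedy pattern would merge into one match.
const twoLinks = '<a href="/a.pdf">first</a> and <a href="/b.pdf">second</a>';
const links = extractLinks(twoLinks);
console.log(links);
```

Because the groups are no longer greedy, two links on one line come back as two separate entries.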
