elasticsearch

Using Logstash to strip HTML from the large MySQL content field

2019-08-07  alfred88

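Many beginners have legacy data made up of HTML fragments generated by a rich-text editor. If that data is imported into elasticsearch as-is, the HTML markup shows up in highlighted search results, which makes for a poor experience. When syncing with logstash, run the content through a filter that strips the HTML first: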

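filter {
    # Rules run in order: drop script/iframe/style blocks together with their
    # contents, then any remaining tags, then &nbsp; entities.
    # (?m) lets "." span line breaks in Ruby regex, so multi-line blocks match.
    mutate {
        gsub => [
            "content", "(?m)<script(.*?)</script>", "",
            "content", "(?m)<iframe(.*?)</iframe>", "",
            "content", "(?m)<style(.*?)</style>", "",
            "content", "(?m)<(.*?)>", "",
            "content", "&nbsp;", ""
        ]
    }
}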
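
Before deploying a substitution chain like the one above, it can help to try it on a few sample strings. The following is a minimal sketch in Python, not part of the Logstash pipeline: `strip_html` and `RULES` are hypothetical names, Python's regex flavour differs from the Ruby regex that Logstash uses, and the `re.S` and `re.I` flags are assumptions about the desired behaviour (match across line breaks, ignore tag case).

```python
import re

# Same substitutions as the gsub rules, applied in the same order.
# re.S lets "." match newlines and re.I ignores tag case; both flags are
# assumptions made for this sketch.
RULES = [
    (re.compile(r"<script(.*?)</script>", re.S | re.I), ""),
    (re.compile(r"<iframe(.*?)</iframe>", re.S | re.I), ""),
    (re.compile(r"<style(.*?)</style>", re.S | re.I), ""),
    (re.compile(r"<(.*?)>", re.S), ""),
    (re.compile(r"&nbsp;"), ""),
]


def strip_html(content: str) -> str:
    """Hypothetical helper: mimic the mutate/gsub chain on one string."""
    for pattern, replacement in RULES:
        content = pattern.sub(replacement, content)
    return content


sample = (
    "<p>Hello <b>world</b>&nbsp;</p>"
    "<script>\nalert(1)\n</script>"
    "<style>p{color:red}</style>"
)
print(strip_html(sample))  # prints "Hello world"
```

The order matters: if the generic tag rule ran first, the script and style tags would be removed but their contents would remain in the text.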

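Many fields are better normalized on the MySQL side first, especially date/time columns, and the index mapping must also declare the accepted date formats: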

"format"=>"yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis||strict_date_optional_time"

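As a rough illustration of where that format string goes, here is a sketch that builds an index body in Python and prints it as JSON. It assumes the typeless mapping layout of Elasticsearch 7 and later, and it assumes that `addtime` and `autotime` (names taken from the query in this post) are the fields mapped as dates; `index_body` and `DATE_FORMAT` are hypothetical names.

```python
import json

# Format string from the post: several accepted patterns separated by "||".
DATE_FORMAT = (
    "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis||strict_date_optional_time"
)

# Assumed index body (typeless mapping, Elasticsearch 7+ style).
# Which fields are dates is an assumption made for this sketch.
index_body = {
    "mappings": {
        "properties": {
            "addtime": {"type": "date", "format": DATE_FORMAT},
            "autotime": {"type": "date", "format": DATE_FORMAT},
        }
    }
}

print(json.dumps(index_body, indent=2))
```

The printed JSON could then be sent as the request body when creating the index; older versions that use mapping types would need an extra type level under `mappings`.
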
SELECT a.id, a.title, b.content, b.content AS content_old,
       CONCAT(a.addtime) AS addtime, CONCAT(a.autotime) AS autotime,
       a.views, a.zans, a.type_a, a.type_b,
       CONCAT(a.isshow) AS isshow, CONCAT(a.isdelete) AS isdelete,
       IF(ISNULL(a.deletetime), 0, a.deletetime) AS deletetime
FROM web_information a