Implementing Search in a Java Web App with Solr

2017-07-30  明明找灵气

The requirement: an existing application uses a MySQL database with basic CRUD functionality, and we want to add search. The database is named wenda and contains a question table; we search each question's title and content (returning questions whose title or content contains the keyword). The plan: 1. download the required tools; 2. package the tokenizer; 3. set up Solr.

1. Preparation

IDE: IntelliJ
Stack: Spring MVC + Maven + MyBatis
Tools to install:
Solr (search) + IK Analyzer (Chinese word segmentation) + Maven (packaging) + the MySQL JDBC jar
The tokenizer is not perfect, but it is far better than nothing; without it, Chinese search results are nearly unusable.
You don't have to use Maven for packaging; I did.
On a Mac, install Homebrew first; then `brew install <name>` works for both solr and maven.
Tokenizer source download: https://code.google.com/archive/p/ik-analyzer/downloads
(You may need a proxy to reach it.) Download the source, not a binary: you'll patch and repackage it yourself later.
Solr and Maven themselves are easy to find with a quick search.

2. Packaging the tokenizer

Create a Maven project from the IK Analyzer source and use the following pom.xml:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>org.wltea</groupId>
    <artifactId>ik-analyzer</artifactId>
    <version>6.6.0</version>

    <dependencies>
        <dependency>
            <groupId>org.apache.lucene</groupId>
            <artifactId>lucene-core</artifactId>
            <version>6.6.0</version>
        </dependency>

        <dependency>
            <groupId>org.apache.lucene</groupId>
            <artifactId>lucene-analyzers-common</artifactId>
            <version>6.6.0</version>
        </dependency>

        <dependency>
            <groupId>org.apache.lucene</groupId>
            <artifactId>lucene-queryparser</artifactId>
            <version>6.6.0</version>
        </dependency>

    </dependencies>

    <build>
        <resources>
            <resource>
                <directory>src/main/java</directory>
                <includes>
                    <include>**/*.dic</include>
                </includes>
            </resource>
            <resource>
                <directory>src/main/resources</directory>
                <includes>
                    <include>**/*.dic</include>
                    <include>**/*.xml</include>
                </includes>
            </resource>
        </resources>
    </build>
</project>
// IKTokenizer.java
public IKTokenizer(boolean useSmart){
        // super(in); // remove this call and the Reader in constructor parameter
        offsetAtt = addAttribute(OffsetAttribute.class);
        termAtt = addAttribute(CharTermAttribute.class);
        typeAtt = addAttribute(TypeAttribute.class);
        _IKImplement = new IKSegmenter(input, useSmart);
    }
// IKAnalyzer.java
@Override
    protected TokenStreamComponents createComponents(String fieldName) {
        // remove the Reader in parameter from the signature,
        // and the in argument from new IKTokenizer() below
        Tokenizer _IKTokenizer = new IKTokenizer(this.useSmart());
        return new TokenStreamComponents(_IKTokenizer);
    }

Then implement the factory class in the util package:
public class IKToKenizerFactory extends TokenizerFactory {

    private boolean useSmart;

    public IKToKenizerFactory(Map<String, String> args) {
        super(args);
        useSmart = getBoolean(args, "useSmart", false);
    }

    @Override
    public Tokenizer create(AttributeFactory attributeFactory) {
        return new IKTokenizer(useSmart);
    }
}
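As an aside on how this factory receives its configuration: Solr hands the XML attributes from managed-schema (e.g. `useSmart="true"`) to the factory constructor as a `Map<String,String>`, and `getBoolean` (inherited in real Lucene from `AbstractAnalysisFactory`) reads and consumes one of them. The sketch below is a hypothetical standalone re-implementation of that behavior, not Solr's actual class; the names `FactoryArgsDemo` and its `getBoolean` are illustrative only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of how TokenizerFactory-style factories read their
// configuration: Solr passes the XML attributes from managed-schema
// (e.g. useSmart="true") to the constructor as a Map<String,String>.
public class FactoryArgsDemo {

    // Mimics the getBoolean helper: read and consume one attribute,
    // falling back to a default when it is absent.
    static boolean getBoolean(Map<String, String> args, String key, boolean defaultValue) {
        String value = args.remove(key); // factories consume the args they understand
        return value == null ? defaultValue : Boolean.parseBoolean(value);
    }

    public static void main(String[] argv) {
        Map<String, String> args = new HashMap<>();
        args.put("useSmart", "true"); // corresponds to <tokenizer ... useSmart="true"/>
        System.out.println(getBoolean(args, "useSmart", false)); // true
        System.out.println(getBoolean(args, "useSmart", false)); // false: already consumed
    }
}
```

Once the patched sources compile, `mvn clean package` produces the jar that we drop into Solr in the next section.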

3. Setting up Solr

i. Data import configuration

Describe how Solr pulls rows from MySQL in the DataImportHandler config (conventionally named data-config.xml, placed in the core's conf directory):

<dataConfig>
  <dataSource driver="com.mysql.jdbc.Driver" password="你的数据库密码" type="JdbcDataSource" url="jdbc:mysql://localhost/wenda" user="你的数据库账户"/>
  <document>
    <entity name="question" query="select id,title,content from question">
      <field column="title" name="question_title"/>
      <field column="content" name="question_content"/>
    </entity>
  </document>
</dataConfig>

Note: wenda in the url matches my database; entity name is the table, query is the SQL, and each field's column is the column name in the table while name is what it will be called in Solr (so the table's title becomes question_title in Solr; you could also just keep it as title). Once everything below is wired up, a full import can be triggered from the Dataimport tab of the Solr admin UI (or via /dataimport?command=full-import).

ii. Connecting Solr to the database
Under solr/libexec/, create an ext folder; inside it, add a mysql folder containing the MySQL JDBC jar, and an ikanalyzer folder containing the tokenizer jar built above.



In /usr/local/Cellar/solr/6.6.0/server/solr/wenda/conf/solrconfig.xml, find a suitable spot and add the lines below (not inside any other tag).

<!-- The lib lines fit best next to the existing lib entries, and the request
     handler next to the existing requestHandler entries; search the file for
     "lib" to find where jars are added. Entries of the same kind are usually adjacent. -->
<!-- import the jars -->
<lib dir="${solr.install.dir}/libexec/dist/" regex="solr-dataimporthandler-\d.*\.jar" />
<lib dir="${solr.install.dir}/libexec/ext/ikanalyzer" regex=".*\.jar" />
<lib dir="${solr.install.dir}/libexec/ext/mysql" regex=".*\.jar" />

  <!-- A request handler that returns indented JSON by default -->
  <requestHandler name="/query" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <str name="wt">json</str>
      <str name="indent">true</str>
    </lst>
  </requestHandler>
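The dataimporthandler jars imported above are only useful if a /dataimport handler is actually registered; the post doesn't show that step, but a standard registration (assuming the config file from step i is named data-config.xml) looks like this:

```xml
<!-- register the DataImportHandler so /dataimport can run the MySQL import -->
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
  </lst>
</requestHandler>
```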

iii. Adding the tokenizer
Register the tokenizer and declare which fields use it.
In /usr/local/Cellar/solr/6.6.0/server/solr/wenda/conf/managed-schema, find a suitable spot and add the lines below (not inside any other tag).

<!-- see the notes below -->

<!-- Chinese word segmentation -->
  <fieldType class="solr.TextField" name="text_ik">
    <!-- tokenizer used at index time -->
    <analyzer type="index">
      <tokenizer class="org.wltea.analyzer.util.IKToKenizerFactory" useSmart="false"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <!-- in this example, we will only use synonyms at query time
        <filter class="solr.SynonymGraphFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/><filter class="solr.FlattenGraphFilterFactory"/>-->
    </analyzer>
    <!-- tokenizer used at query time -->
    <analyzer type="query">
      <tokenizer class="org.wltea.analyzer.util.IKToKenizerFactory" useSmart="true"/>
    </analyzer>
  </fieldType>

 <field indexed="true" multiValued="true" name="_text_" stored="false" type="text_ik"/>
  <field indexed="true" multiValued="true" name="question_title" stored="true" type="text_ik"/>
  <field indexed="true" multiValued="true" name="question_content" stored="true" type="text_ik"/>

Notes:
1. The attribute order within each tag may differ from yours; I auto-formatted, so attributes ended up in alphabetical order.
2. The <fieldType> tag registers the Chinese tokenizer under the name text_ik. Note that indexing and querying differ in whether smart mode is on. As the names suggest, indexing is non-smart: anything that looks like a word gets indexed (so many different keywords will match a document, including ones we don't want), while querying is smart and does some semantic recognition.
3. For the <field> tags, the important attributes are name and type; fields to be searched get the text_ik type.
4. By default fields use the text_general analyzer, which handles Chinese poorly, so the default _text_ field's type is switched to text_ik as well.

(screenshot comparing smart and non-smart analysis results omitted)

4. Adding search code to the project

If everything has worked so far, congratulations: victory is in sight.


/**
 * 14:06 on 2017/7/30
 * If u like , it is created by TXM,
 * Else u should have like it.
 */
@Service
public class SearchService {
    private static final Logger logger = LoggerFactory.getLogger(SearchService.class);

    private static final String SOLR_URL = "http://localhost:8983/solr/wenda";
    private HttpSolrClient client = new HttpSolrClient.Builder(SOLR_URL).build();
    private static final String QUESTION_TITLE_FIELD = "question_title";
    private static final String QUESTION_CONTENT_FIELD = "question_content";

    public List<Question> searchQuestion(String keyword, int offset, int count,
                                         String hlPre, String hlPos) throws Exception{


        List<Question> questionList = new ArrayList<>();
        SolrQuery query = new SolrQuery(keyword);
        query.setRows(count);
        query.setStart(offset);
        query.setHighlight(true);
        query.setHighlightSimplePre(hlPre);
        query.setHighlightSimplePost(hlPos);
        query.set("hl.fl", QUESTION_CONTENT_FIELD + "," + QUESTION_TITLE_FIELD);
        QueryResponse response = client.query(query);
        for (Map.Entry<String, Map<String, List<String>>> entry : response.getHighlighting().entrySet()) {
            Question q = new Question();
            q.setId(Integer.parseInt(entry.getKey()));
            if (entry.getValue().containsKey(QUESTION_CONTENT_FIELD)) {
                List<String> contentList = entry.getValue().get(QUESTION_CONTENT_FIELD);
                if (contentList.size() > 0) {
                    q.setContent(contentList.get(0));
                }
            }
            if (entry.getValue().containsKey(QUESTION_TITLE_FIELD)) {
                List<String> titleList = entry.getValue().get(QUESTION_TITLE_FIELD);
                if (titleList.size() > 0) {
                    q.setTitle(titleList.get(0));
                }
            }
            questionList.add(q);
        }

        return questionList;
    }


    public boolean indexQuestion(int qid, String title, String content) throws Exception{
        SolrInputDocument doc = new SolrInputDocument();
        doc.setField("id", qid);
        doc.setField(QUESTION_TITLE_FIELD, title);
        doc.setField(QUESTION_CONTENT_FIELD, content);
        UpdateResponse response = client.add(doc, 1000);
        return response != null && response.getStatus() == 0;
    }
}
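What searchQuestion reads from getHighlighting() is, per matching document, the snippets with each keyword occurrence wrapped in the hlPre/hlPos markers. The toy sketch below illustrates only that visible wrapping effect with a plain string replacement; it is not Solr's actual highlighter, and the class name HighlightDemo is made up for illustration:

```java
public class HighlightDemo {
    // Mimics the visible effect of Solr's simple highlighter:
    // each occurrence of the keyword is wrapped in pre/post markers.
    static String highlight(String text, String keyword, String pre, String post) {
        return text.replace(keyword, pre + keyword + post);
    }

    public static void main(String[] args) {
        // hlPre="<em>" and hlPos="</em>" are what the controller passes in
        System.out.println(highlight("浅谈分布式系统", "分布式", "<em>", "</em>"));
        // 浅谈<em>分布式</em>系统
    }
}
```

The controller then renders these marked-up snippets, so the keyword can be styled via the em tag in the result page.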
@Controller
public class SearchController {
    private static final Logger logger = LoggerFactory.getLogger(SearchController.class);

    @Autowired
    SearchService searchService;


    @Autowired
    QuestionService questionService;

    @RequestMapping(path = {"/search"}, method = {RequestMethod.GET})
    public String search (Model model, @RequestParam("q") String keyword,
                          @RequestParam(value = "offset", defaultValue = "0") int offset,
                          @RequestParam(value = "count", defaultValue = "10") int count) {
        try {
            List<Question> questionList = searchService.searchQuestion(keyword, offset, count, "<em>", "</em>");
            List<ViewObject> vos = new ArrayList<>();
            for (Question question : questionList) {
                Question q = questionService.selectById(question.getId());
                ViewObject vo = new ViewObject();
                if (question.getContent() != null) {
                    q.setContent(question.getContent());
                }
                if (question.getTitle() != null) {
                    q.setTitle(question.getTitle());
                }
                vo.set("question", q);
                vos.add(vo);
            }
            model.addAttribute("vos", vos);
            model.addAttribute("keyword", keyword);
        } catch (Exception e) {
            logger.error("search failed " + e.getMessage());
        }
        return "result";
    }
}

You can test the feature with Postman:
enter the path localhost:8080/search and the parameter q,
and it automatically becomes localhost:8080/search?q="分布式"
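Note that a Chinese keyword has to be URL-encoded on the wire; Postman (or a browser) does this for you. A quick check of what is actually sent (QueryUrlDemo is just an illustrative name):

```java
import java.net.URLEncoder;

public class QueryUrlDemo {
    public static void main(String[] args) throws Exception {
        // URL-encode the query parameter the way a browser or Postman would
        String keyword = "分布式";
        System.out.println("localhost:8080/search?q=" + URLEncoder.encode(keyword, "UTF-8"));
        // localhost:8080/search?q=%E5%88%86%E5%B8%83%E5%BC%8F
    }
}
```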



Finally done; I only learned this today. Many details are still unclear to me, so the plan is to get it running first and dig in later.
