Complex Big-Data Search with Elasticsearch
What / Who
Elasticsearch is more than Lucene plus full-text search. It is also:
• a distributed, real-time document store in which every field is indexed and searchable
• a distributed, real-time analytics and search engine
• able to scale out to hundreds of servers and handle petabytes of structured or unstructured data
-
It also has a few defining characteristics:
First: documents are stored as JSON, i.e. it is a document store
Second: search is driven by an inverted index (see the toy sketch after this list)
Third: there are no transactions
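Since the inverted index is the data structure everything else rests on, here is a toy sketch of the idea in plain Java (my own illustration, nothing to do with Lucene's actual on-disk format): each term maps to the set of documents containing it, so a term lookup replaces a scan over every document.
import java.util.*;

// Toy inverted index: term -> IDs of the documents that contain it.
public class InvertedIndexSketch {
    public static void main(String[] args) {
        Map<String, List<String>> docs = new LinkedHashMap<>();
        docs.put("doc1", Arrays.asList("distributed", "search", "engine"));
        docs.put("doc2", Arrays.asList("real", "time", "search"));

        Map<String, Set<String>> invertedIndex = new HashMap<>();
        docs.forEach((docId, terms) ->
                terms.forEach(term ->
                        invertedIndex.computeIfAbsent(term, t -> new TreeSet<>()).add(docId)));

        // Querying a term is now a map lookup instead of scanning every document.
        System.out.println(invertedIndex.get("search")); // [doc1, doc2]
    }
}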
It also has some drawbacks:
First: writes are not visible immediately; with the default refresh behaviour there is roughly a 1–2 second delay before newly written data can be searched
Second: an existing mapping cannot be changed freely; changing even a single field's type effectively means rebuilding the whole index
There is a workaround, though: use an index alias. All reads and writes go through the alias; when the mapping has to change, create a new index with the new mapping, reindex all the data into it, then point the alias at the new index and drop the old one, giving a smooth, zero-downtime switch (a minimal sketch follows).
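Here is a minimal sketch of that alias-swap flow (not the original project's code) using the same 5.x Transport Client admin API that the DDL code later in this post uses; the index names book_index_v1 / book_index_v2 are illustrative.
// Sketch of the alias-based "reindex then swap" flow. All readers and writers
// use only the alias, never a concrete index name.
Client client = EsConfig.client(); // EsConfig is defined later in this post

// 1. The current index sits behind the alias.
client.admin().indices().prepareAliases()
        .addAlias("book_index_v1", "book_index_alias")
        .get();

// 2. Create book_index_v2 with the new mapping and reindex everything into it
//    (bulk-load from the source of truth, or use the Reindex API).

// 3. Switch the alias to the new index in one request, then drop the old index.
client.admin().indices().prepareAliases()
        .removeAlias("book_index_v1", "book_index_alias")
        .addAlias("book_index_v2", "book_index_alias")
        .get();
client.admin().indices().prepareDelete("book_index_v1").get();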
A few basic concepts are worth introducing:
Suppose the first thing we want to do is store employee data, with each document representing one employee. In Elasticsearch the act of storing data is called indexing, but before we index anything we need to decide where the data should live.
In Elasticsearch a document belongs to a type, and types live inside an index. A rough analogy with a traditional relational database:
Relational DB -> Databases -> Tables -> Rows -> Columns
Elasticsearch -> Indices -> Types -> Documents -> Fields
An Elasticsearch cluster can contain multiple indices (databases),
each index can contain multiple types (tables),
each type contains multiple documents (rows), each of which is a JSON object,
and each document contains multiple fields (columns), i.e. the properties of that JSON object.
The word "index" carries more than one meaning in Elasticsearch. An index (noun) is like a database in a relational system: it is where related documents are stored; its plural is indices or indexes. (A minimal indexing sketch follows.)
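To make the analogy concrete, here is a minimal indexing sketch (not part of the original project) that stores one employee document. It assumes the 5.x Transport Client dependencies introduced later in this post; the "megacorp" index, "employee" type and the document content are the illustrative names from the Definitive Guide example.
// A sketch only: index one employee document into index "megacorp",
// type "employee", with document id "1".
Settings settings = Settings.builder().put("cluster.name", "elasticsearch").build();
TransportClient client = new PreBuiltTransportClient(settings)
        .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("127.0.0.1"), 9300));

String employeeJson = "{\"first_name\":\"John\",\"last_name\":\"Smith\",\"age\":25}";

IndexResponse response = client.prepareIndex("megacorp", "employee", "1") // index / type / document id
        .setSource(employeeJson, XContentType.JSON)                       // the fields live inside this JSON
        .get();
System.out.println(response.status()); // e.g. CREATED

client.close();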
Where / When
- When should you reach for ES?
Search, log analysis (the ELK stack), and so on.
Our own business scenario: the order data set is huge, so it is sharded across databases and tables with openId as the sharding key. That covers every customer-facing query, because every request carries an openId and looks up orders at the granularity of a single user. The back-office operations team, however, needs to query all orders, not one user's, and with filter conditions across many dimensions. Since the order data is scattered across different databases and tables, scanning every shard through the DB and then paginating is clearly not realistic; this is exactly the kind of scenario Elasticsearch is made for.
-
How does it compare with Solr from the Apache ecosystem?
[Image: solr.png — Elasticsearch vs. Solr comparison]
Comparing Elasticsearch with Solr, in summary:
1. For pure search over an existing, static data set, Solr is faster.
2. When indexes are built in real time, Solr suffers from I/O blocking and its query performance degrades, while Elasticsearch has a clear advantage.
3. As the data volume grows, Solr's search efficiency drops noticeably, whereas Elasticsearch shows no obvious change.
4. Solr's architecture is not well suited to real-time search applications.
5. Solr supports more data formats out of the box, while Elasticsearch only accepts JSON.
6. Solr performs better than Elasticsearch in traditional search applications, but is clearly less efficient for real-time search workloads.
7. Solr is a solid solution for traditional search applications, but Elasticsearch is better suited to the newer real-time search use cases.
How
- ES iterates very quickly, so first get a feel for the ES API landscape.
First: study the [Elasticsearch: The Definitive Guide].
Second: which version should we use?
The initialization API changed from 1.7 to 2.x, changed again from 2.x to 5.x, 6.x has been out for a while and the latest line is already 7.x, but this post recommends 5.x!
Note: data from 2.x can be migrated directly to 5.x, and data from 5.x directly to 6.x, but 2.x data cannot be migrated directly to 6.x.
ES 2.x
Pros:
- On the Java stack, spring-boot-starter-data-elasticsearch supports in-memory startup, so unit tests work out of the box
- Still the mainstream version running in production, and quite stable
Cons:
- An old release: you miss the new features, and performance is worse than 5.x
- Upgrading and migrating the data later is painful
- The surrounding tool versions are messy; you have to look up which Kibana (and other tool) versions match
ES 5.x
Pros:
- A relatively recent release with better performance: the official claim is a 25%–80% improvement in indexing throughput, new data structures for numeric and geo fields bring a large speedup, and search and aggregations were reworked in 5.x with a big capability boost
- The surrounding tooling is complete and the version numbers are friendly: with 5.x the ELK stack moved to unified version numbers
- Upgrading to 6.x later is straightforward
Cons:
- The in-memory mode is officially no longer supported and the Node Client is gone; if you want in-memory unit tests you have to pin the ES and spring-data-elasticsearch versions yourself, enable HTTP access, and go through the REST API instead
Third: which client should we use?
On the Java stack there are three options: Node Client, Transport Client, and the REST API.
Note that the Node Client is already officially deprecated, and the Transport Client is deprecated as of 7.x and slated for removal, with everything eventually converging on the REST API. For now the Transport Client is the most widely used; the REST API has the best compatibility; the Node Client is not recommended unless you are running unit tests in in-memory mode.
This post sticks with the Transport Client for its API examples.
Elasticsearch 2.x client construction:
public static Client getClient() throws UnknownHostException {
    String clusterName = "elasticsearch";
    List<String> clusterNodes = Arrays.asList("http://172.16.0.29:9300");
    // 2.x builds Settings and the TransportClient through static builders
    Settings settings = Settings.settingsBuilder().put("cluster.name", clusterName).build();
    TransportClient client = TransportClient.builder().settings(settings).build();
    for (String node : clusterNodes) {
        URI host = URI.create(node);
        client.addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName(host.getHost()), host.getPort()));
    }
    return client;
}
Elasticsearch 5.x client construction:
public static Client getClient() throws UnknownHostException {
    String clusterName = "shopmall-es";
    List<String> clusterNodes = Arrays.asList("http://172.16.32.69:9300", "http://172.16.32.48:9300");
    // 5.x switches to Settings.builder() and PreBuiltTransportClient
    Settings settings = Settings.builder().put("cluster.name", clusterName).build();
    TransportClient client = new PreBuiltTransportClient(settings);
    for (String node : clusterNodes) {
        URI host = URI.create(node);
        client.addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName(host.getHost()), host.getPort()));
    }
    return client;
}
- Time to write some code. First, pull in the required dependencies:
<dependency>
<groupId>org.elasticsearch.client</groupId>
<artifactId>transport</artifactId>
<version>5.3.2</version>
</dependency>
<dependency>
<groupId>org.elasticsearch</groupId>
<artifactId>elasticsearch</artifactId>
<version>5.3.2</version>
</dependency>
<dependency>
<groupId>com.google.code.gson</groupId>
<artifactId>gson</artifactId>
<version>2.8.2</version>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.12</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-api</artifactId>
<version>2.11.1</version>
</dependency>
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-core</artifactId>
<version>2.11.1</version>
</dependency>
/**
* @Title:
* @Author: hangyu
* @Date: 2019/4/23
* @Description
* @Version:1.0
*/
public class Book {
public static SimpleDateFormat simpleDateFormat = new SimpleDateFormat("yyyy-MM-dd");
private String id;
private String title;
private List<String> authors;
private String summary;
private String publish_date;
private Integer num_reviews;
private String publisher;
public Book(String id, String title, List<String> authors, String summary, String publish_date, Integer num_reviews, String publisher) {
this.id = id;
this.title = title;
this.authors = authors;
this.summary = summary;
this.publish_date = publish_date;
this.num_reviews = num_reviews;
this.publisher = publisher;
}
public static SimpleDateFormat getSimpleDateFormat() {
return simpleDateFormat;
}
public static void setSimpleDateFormat(SimpleDateFormat simpleDateFormat) {
Book.simpleDateFormat = simpleDateFormat;
}
public String getId() {
return id;
}
public void setId(String id) {
this.id = id;
}
public String getTitle() {
return title;
}
public void setTitle(String title) {
this.title = title;
}
public List<String> getAuthors() {
return authors;
}
public void setAuthors(List<String> authors) {
this.authors = authors;
}
public String getSummary() {
return summary;
}
public void setSummary(String summary) {
this.summary = summary;
}
public String getPublish_date() {
return publish_date;
}
public void setPublish_date(String publish_date) {
this.publish_date = publish_date;
}
public Integer getNum_reviews() {
return num_reviews;
}
public void setNum_reviews(Integer num_reviews) {
this.num_reviews = num_reviews;
}
public String getPublisher() {
return publisher;
}
public void setPublisher(String publisher) {
this.publisher = publisher;
}
}
/**
* @Title:
* @Author: hangyu
* @Date: 2019/4/23
* @Description
* @Version:1.0
*/
public class DataUtil {
public static SimpleDateFormat dateFormater = new SimpleDateFormat("yyyy-MM-dd");
/**
* Mock data source: build a few sample Book documents
*/
public static List<Book> batchData() {
List<Book> list = new LinkedList<>();
Book book1 = new Book("1", "Elasticsearch: The Definitive Guide", Arrays.asList("clinton gormley", "zachary tong"),
"A distibuted real-time search and analytics engine", "2015-02-07", 20, "oreilly");
Book book2 = new Book("2", "Taming Text: How to Find, Organize, and Manipulate It", Arrays.asList("grant ingersoll", "thomas morton", "drew farris"),
"organize text using approaches such as full-text search, proper name recognition, clustering, tagging, information extraction, and summarization",
"2013-01-24", 12, "manning");
Book book3 = new Book("3", "Elasticsearch in Action", Arrays.asList("radu gheorge", "matthew lee hinman", "roy russo"),
"build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms",
"2015-12-03", 18, "manning");
Book book4 = new Book("4", "Solr in Action", Arrays.asList("trey grainger", "timothy potter"), "Comprehensive guide to implementing a scalable search engine using Apache Solr",
"2014-04-05", 23, "manning");
list.add(book1);
list.add(book2);
list.add(book3);
list.add(book4);
return list;
}
public static Date parseDate(String dateStr) {
try {
return dateFormater.parse(dateStr);
        } catch (ParseException e) {
            // ignore malformed dates; fall through and return null
        }
return null;
}
}
/**
* @Title:
* @Author: hangyu
* @Date: 2019/4/23
* @Description
* @Version:1.0
*/
public class Constants {
// Field names
public static String ID = "id";
public static String TITLE = "title";
public static String AUTHORS = "authors";
public static String SUMMARY = "summary";
public static String PUBLISHDATE = "publish_date";
public static String PUBLISHER = "publisher";
public static String NUM_REVIEWS = "num_reviews";
// _source fields to return for different queries
public static String[] fetchFieldsTSPD = {ID, TITLE, SUMMARY, PUBLISHDATE};
public static String[] fetchFieldsTA = {ID, TITLE, AUTHORS};
// Highlighting
public static HighlightBuilder highlightS = new HighlightBuilder().field(SUMMARY);
}
/**
* @Title:
* @Author: hangyu
* @Date: 2019/4/24
* @Description
* @Version:1.0
*/
public class Response<T> {
private ResponseCode responseCode;
private T data;
public Response(ResponseCode responseCode, T data) {
this.responseCode = responseCode;
this.data = data;
}
public Response(ResponseCode responseCode) {
this.responseCode = responseCode;
}
}
/**
* @Title:
* @Author: hangyu
* @Date: 2019/4/24
* @Description
* @Version:1.0
*/
public enum ResponseCode {
    ESTIMEOUT(1, "request timed out"),
    FAILEDSHARDS(2, "some shards failed"),
    OK(0, "success");
private Integer code;
private String desc;
ResponseCode(Integer code, String desc) {
this.code = code;
this.desc = desc;
}
public Integer getCode() {
return code;
}
public void setCode(Integer code) {
this.code = code;
}
public String getDesc() {
return desc;
}
public void setDesc(String desc) {
this.desc = desc;
}
}
/**
* @Title:
* @Author: hangyu
* @Date: 2019/4/23
* @Description
* @Version:1.0
*/
public class CommonQueryUtils {
    public static Gson gson = new GsonBuilder().setDateFormat("yyyy-MM-dd").create(); // lowercase yyyy: uppercase YYYY is the week-based year in SimpleDateFormat
    /**
     * Parse the ES response and map each hit's _source into a Book
     */
    public static List<Book> parseResponse(SearchResponse searchResponse) {
        List<Book> list = new LinkedList<>();
        // total hit count, for reference
        System.out.println("parseResponse count is " + searchResponse.getHits().getTotalHits());
        for (SearchHit hit : searchResponse.getHits().getHits()) {
            // deserialize the _source JSON directly with Gson
            Book book = gson.fromJson(hit.getSourceAsString(), Book.class);
            list.add(book);
        }
        return list;
    }
    /**
     * Build the Response wrapper after parsing the hits
     */
    public static Response<List<Book>> buildResponse(SearchResponse searchResponse) {
        // the request timed out
        if (searchResponse.isTimedOut()) {
            return new Response<>(ResponseCode.ESTIMEOUT);
        }
        // parse the hits returned by ES
        List<Book> list = parseResponse(searchResponse);
        // some shards failed to execute
        if (searchResponse.getFailedShards() > 0) {
            return new Response<>(ResponseCode.FAILEDSHARDS, list);
        }
        return new Response<>(ResponseCode.OK, list);
    }
}
Take a quick break~
- Now the key logic begins
/**
* @Title:
* @Author: hangyu
* @Date: 2019/4/23
* @Description
* @Version:1.0
*/
public class EsConfig {
    // HTTP uses port 9200; the transport (Java API) port is 9300
    private static String clusterNodes = "127.0.0.1:9300";
    // the cluster name must match cluster.name in elasticsearch.yml
    private static String clusterName = "es-book-test";
public static Client client() {
Settings settings = Settings.builder().put("cluster.name", clusterName)
.put("client.transport.sniff", true).build();
TransportClient client = null;
try {
client = new PreBuiltTransportClient(settings);
if (clusterNodes != null && !"".equals(clusterNodes)) {
for (String node : clusterNodes.split(",")) {
String[] nodeInfo = node.split(":");
client.addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName(nodeInfo[0]), Integer.parseInt(nodeInfo[1])));
}
}
} catch (Exception e) {
System.out.println("e"+e);
}
return client;
}
}
/**
* @Title:
* @Author: hangyu
* @Date: 2019/4/23
* @Description
* @Version:1.0
*/
public class DDLAndBulk {
private static String bookIndex = "book_index";
private static String bookIndexAlias = "book_index_alias";
private static String bookType = "book_type";
    public static Gson gson = new GsonBuilder().setDateFormat("yyyy-MM-dd").create(); // lowercase yyyy: uppercase YYYY is the week-based year in SimpleDateFormat
/**
* Create the index and configure its settings (the mapping is applied later, before the bulk load)
*/
public static void createIndex() {
int settingShards = 1;
int settingReplicas = 0;
Client client = EsConfig.client();
        // check whether the index already exists and delete it if so
        IndicesExistsResponse indicesExistsResponse = client.admin().indices().prepareExists(bookIndex).get();
        if (indicesExistsResponse.isExists()) {
            System.out.println("Index " + bookIndex + " already exists!");
            // delete it first, otherwise creation fails with ResourceAlreadyExistsException[index [bookdb_index/yL05ZfXFQ4GjgOEM5x8tFQ] already exists
            DeleteIndexResponse deleteResponse = client.admin().indices().prepareDelete(bookIndex).get();
            if (deleteResponse.isAcknowledged()) {
                System.out.println("Index " + bookIndex + " deleted");
            } else {
                System.out.println("Failed to delete index " + bookIndex);
            }
        } else {
            System.out.println("Index " + bookIndex + " does not exist!");
        }
        // Step 1: create the index with its settings
CreateIndexResponse response = client.admin().indices().prepareCreate(bookIndex)
.setSettings(Settings.builder()
.put("index.number_of_shards", settingShards)
.put("index.number_of_replicas", settingReplicas))
.get();
        // verify the settings that were applied
        GetSettingsResponse getSettingsResponse = client.admin().indices()
                .prepareGetSettings(bookIndex).get();
        System.out.println("Index settings result:");
for (ObjectObjectCursor<String, Settings> cursor : getSettingsResponse.getIndexToSettings()) {
String index = cursor.key;
Settings settings = cursor.value;
Integer shards = settings.getAsInt("index.number_of_shards", null);
Integer replicas = settings.getAsInt("index.number_of_replicas", null);
System.out.println("index:" + index + ", shards:" + shards + ", replicas:" + replicas);
}
}
/**
* Bulk-insert the sample data
*/
public static void bulk() {
List<Book> list = DataUtil.batchData();
Client client = EsConfig.client();
BulkRequestBuilder bulkRequestBuilder = client.prepareBulk();
        // Step 2: create the type and its mapping. This step is optional: without an explicit mapping, ES infers the field types from the _source data
if (!client.admin().indices().prepareTypesExists(bookIndex).setTypes(bookType).get().isExists()){
client.admin().indices().preparePutMapping(bookIndex).setType(bookType).setSource(readFileTOString("es-book-mapping.json")).get()
.isAcknowledged();
            // Between steps 2 and 3 we add one small extra step that keeps the mapping evolvable later: create an index alias
createAlias(bookIndex, bookIndexAlias);
}
        // add one index operation per book to the bulk request
list.forEach(book -> {
            // Step 3: index the data. Note that this step alone would also create the type from step 2 implicitly,
            // letting ES detect the field types automatically if we skipped the explicit mapping.
            // In the newer API, setSource with varargs expects an even number of key/value arguments;
            // to pass a raw JSON string, use setSource(json, XContentType.JSON) instead.
bulkRequestBuilder.add(client.prepareIndex(bookIndexAlias, bookType, book.getId()).setSource(gson.toJson(book), XContentType.JSON));
});
BulkResponse responses = bulkRequestBuilder.get();
if (responses.hasFailures()) {
            // some bulk items failed
for (BulkItemResponse res : responses) {
System.out.println(res.getFailure());
}
}
}
/**
* Create/switch the index alias
*/
private static boolean createAlias(String indexName, String indexAlias) {
Client client = EsConfig.client();
        // collect the indices the alias currently points at (the old mappings)
List<String> oldIndexName = new ArrayList<String>();
GetAliasesResponse getAliases = client.admin().indices().prepareGetAliases(indexAlias).get();
for (ObjectCursor<String> objectCursor : getAliases.getAliases().keys()) {
if (!indexName.equals(objectCursor.value)) {
oldIndexName.add(objectCursor.value);
}
}
        // point the alias at the new index
IndicesAliasesResponse r = client.admin().indices().prepareAliases().addAlias(indexName, indexAlias)
.execute().actionGet();
if (!r.isAcknowledged()) {
throw new RuntimeException("[ES Check] indexName:" + indexName + ", 创建别名失败:" + indexAlias);
}
        if (oldIndexName.size() > 0) {
            System.out.println("[ES Check] indexAlias:" + indexAlias + " found old alias mappings, oldIndexName: " + oldIndexName);
            // remove the old alias -> index mappings
            IndicesAliasesResponse r2 = client.admin().indices().prepareAliases()
                    .removeAlias(oldIndexName.toArray(new String[] {}), indexAlias).get();
            if (!r2.isAcknowledged()) {
                throw new RuntimeException("[ES Check] indexAlias:" + indexAlias + ", failed to remove old alias mappings: " + oldIndexName);
            } else {
                System.out.println("[ES Check] indexAlias:" + indexAlias + ", removed old alias mappings, oldIndexName: " + oldIndexName);
            }
}
return true;
}
public static String readFileTOString(String name) {
InputStream inputStream = getResourceAsStream(name);
if (null == inputStream){
return null;
}
StringBuilder sb = new StringBuilder("");
BufferedReader reader = null;
try {
reader = new BufferedReader(new InputStreamReader(inputStream));
String tempString = null;
            // read one line at a time until readLine() returns null (end of file)
while ((tempString = reader.readLine()) != null) {
sb.append(tempString);
}
reader.close();
} catch (IOException e) {
e.printStackTrace();
} finally {
if (reader != null) {
try {
reader.close();
} catch (IOException e1) {
}
}
}
return sb.toString();
}
public static InputStream getResourceAsStream(String name) {
InputStream resourceStream = null;
// Try the current Thread context classloader
ClassLoader classLoader = Thread.currentThread().getContextClassLoader();
resourceStream = classLoader.getResourceAsStream(name);
if (resourceStream == null) {
// Finally, try the classloader for this class
classLoader = DDLAndBulk.class.getClassLoader();
resourceStream = classLoader.getResourceAsStream(name);
}
return resourceStream;
}
public static void main(String[] args) {
createIndex();
bulk();
}
}
{
"book_type": {
"properties": {
"id": {
"type": "long"
},
"title": {
"type": "string",
"index": "analyzed"
},
"authors": {
"type": "string",
"index": "not_analyzed"
},
"summary": {
"type": "string",
"index": "analyzed"
},
"publish_date": {
"type": "date",
"index": "not_analyzed"
},
"num_reviews": {
"type": "integer",
"index": "not_analyzed"
},
"publisher": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
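For reference, the mapping above uses the pre-5.x string / analyzed / not_analyzed syntax; 5.x still tolerates it (string fields get upgraded to text or keyword, with deprecation warnings), but 6.x removes the string type entirely. A roughly equivalent 5.x-native version of the same mapping would look like this sketch (not the file used in the original project):
{
  "book_type": {
    "properties": {
      "id":           { "type": "long" },
      "title":        { "type": "text" },
      "authors":      { "type": "keyword" },
      "summary":      { "type": "text" },
      "publish_date": { "type": "date" },
      "num_reviews":  { "type": "integer" },
      "publisher":    { "type": "keyword" }
    }
  }
}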
/**
* @Title:
* @Author: hangyu
* @Date: 2019/4/24
* @Description
* @Version:1.0
*/
public class BasicMatchQueryService {
private static Client client = EsConfig.client();
private static String bookIndexAlias = "book_index_alias";
private static String bookType = "book_type";
public static void main(String[] args) {
//multiBatch();
//match();
boolPage();
//boolPageMatch();
//fuzzy();
//wildcard();
//phrase();
//phrasePrefix();
}
/**
* Execute an ES query, printing the generated query DSL before the request and the raw result after it
*/
private static SearchResponse requestGet(String queryName, SearchRequestBuilder requestBuilder) {
System.out.println(queryName + " 构建的查询:" + requestBuilder.toString());
SearchResponse searchResponse = requestBuilder.get();
System.out.println(queryName + " 搜索结果:" + searchResponse.toString());
return searchResponse;
}
/**
     * 1.1 Full-text search for "guide" across all fields
     * Test: http://localhost:8080/basicmatch/multimatch?query=guide
*/
public static Response<List<Book>> multiBatch() {
MultiMatchQueryBuilder queryBuilder = new MultiMatchQueryBuilder("guide");
SearchRequestBuilder requestBuilder = client.prepareSearch(bookIndexAlias)
.setTypes(bookType).setQuery(queryBuilder);
SearchResponse searchResponse = requestGet("multiBatch", requestBuilder);
return CommonQueryUtils.buildResponse(searchResponse);
}
/**
     * 1.2 Search against specific fields
     * Test: http://localhost:8080/basicmatch/match?title=in action&from=0&size=4
*/
public static void match() {
MatchQueryBuilder matchQueryBuilder = new MatchQueryBuilder(Constants.TITLE, "in Action");
        // highlighting
HighlightBuilder highlightBuilder = new HighlightBuilder().field(Constants.TITLE).fragmentSize(200);
SearchRequestBuilder requestBuilder = client.prepareSearch(bookIndexAlias)
.setTypes(bookType).setQuery(matchQueryBuilder)
.setFrom(0).setSize(4)
.highlighter(highlightBuilder)
                // restrict which _source fields are returned
.setFetchSource(Constants.fetchFieldsTSPD, null);
SearchResponse searchResponse = requestGet("multiBatch", requestBuilder);
}
/**
     * Exact match (term/range) with paging
* @return
*/
public static Response<List<Book>> boolPage() {
BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
RangeQueryBuilder rangeQueryBuilder = new RangeQueryBuilder(Constants.NUM_REVIEWS)
.gte(15).lte(50);
boolQueryBuilder.should().add(QueryBuilders.termQuery(Constants.PUBLISHER, "manning"));
boolQueryBuilder.should().add(QueryBuilders.termQuery(Constants.PUBLISHER, "oreilly"));
        // term = exact match, range = range match
        // should = OR, must = AND, mustNot = AND NOT
boolQueryBuilder.mustNot(QueryBuilders.termQuery(Constants.AUTHORS, "radu gheorge")).filter().add(rangeQueryBuilder);
//boolQueryBuilder.must(rangeQueryBuilder).mustNot(QueryBuilders.termQuery(Constants.AUTHORS, "radu gheorge"));
SearchRequestBuilder requestBuilder = client.prepareSearch(bookIndexAlias).setTypes(bookType).setQuery(boolQueryBuilder)
.setFrom(0).setSize(10).addSort("id", SortOrder.DESC);
SearchResponse searchResponse = requestGet("bool", requestBuilder);
return CommonQueryUtils.buildResponse(searchResponse);
}
/**
     * Full-text match (runs analysis against text fields)
* @return
*/
public static Response<List<Book>> boolPageMatch() {
BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
        // matchQuery = analyzed (per-term) match, matchPhraseQuery = phrase match
boolQueryBuilder.must(QueryBuilders.matchQuery(Constants.SUMMARY,"engine using"))
.mustNot(QueryBuilders.matchPhraseQuery(Constants.SUMMARY, "analytics engine"));
SearchRequestBuilder requestBuilder = client.prepareSearch(bookIndexAlias).setTypes(bookType).setQuery(boolQueryBuilder)
.setFrom(0).setSize(10).addSort(SortBuilders.scoreSort());
SearchResponse searchResponse = requestGet("bool", requestBuilder);
return CommonQueryUtils.buildResponse(searchResponse);
}
/**
     * Fuzzy search (tolerates misspelled query terms)
* @return
*/
public static Response<List<Book>> fuzzy() {
MultiMatchQueryBuilder queryBuilder = new MultiMatchQueryBuilder("elasticseares")
.field("title").field("summary")
.fuzziness(Fuzziness.AUTO);
SearchRequestBuilder requestBuilder = client.prepareSearch(bookIndexAlias)
.setTypes(bookType).setQuery(queryBuilder)
.setFetchSource(Constants.fetchFieldsTSPD, null)
.setSize(2);
SearchResponse searchResponse = requestGet("fuzzy", requestBuilder);
return CommonQueryUtils.buildResponse(searchResponse);
}
/**
     * Wildcard search: find all records whose author starts with the letter "t"
*/
public static Response<List<Book>> wildcard() {
WildcardQueryBuilder wildcardQueryBuilder = new WildcardQueryBuilder(Constants.AUTHORS, "t*");
HighlightBuilder highlightBuilder = new HighlightBuilder().field(Constants.AUTHORS, 200);
SearchRequestBuilder requestBuilder = client.prepareSearch(bookIndexAlias)
.setTypes(bookType).setQuery(wildcardQueryBuilder)
.setFetchSource(Constants.fetchFieldsTA, null)
.highlighter(highlightBuilder);
SearchResponse searchResponse = requestGet("wildcard", requestBuilder);
return CommonQueryUtils.buildResponse(searchResponse);
}
/**
     * Regular-expression search
* @return
*/
public static Response<List<Book>> regexp() {
String regexp = "t[a-z]*n";
RegexpQueryBuilder queryBuilder = new RegexpQueryBuilder(Constants.AUTHORS, regexp);
HighlightBuilder highlightBuilder = new HighlightBuilder().field(Constants.AUTHORS);
SearchRequestBuilder requestBuilder = client.prepareSearch(bookIndexAlias)
.setQuery(queryBuilder).setTypes(bookType).highlighter(highlightBuilder)
.setFetchSource(Constants.fetchFieldsTA, null);
SearchResponse searchResponse = requestGet("regexp", requestBuilder);
return CommonQueryUtils.buildResponse(searchResponse);
}
/**
     * Phrase match: the document must contain both words close together (within the slop), not merely either of the analyzed terms
*
* "summary":"Comprehensive guide to implementing a scalable search engine using Apache Solr",
* "summary":"A distibuted real-time search and analytics engine",
* @return
*/
public static Response<List<Book>> phrase() {
MultiMatchQueryBuilder queryBuilder = new MultiMatchQueryBuilder("search engine")
.field(Constants.SUMMARY)
.type(MultiMatchQueryBuilder.Type.PHRASE).slop(3);
SearchRequestBuilder requestBuilder = client.prepareSearch(bookIndexAlias).setTypes(bookType)
.setQuery(queryBuilder)
.setFetchSource(Constants.fetchFieldsTSPD, null);
SearchResponse searchResponse = requestGet("phrase", requestBuilder);
return CommonQueryUtils.buildResponse(searchResponse);
}
/**
     * Match-phrase-prefix search (phrase match where the last term is treated as a prefix)
* @return
*/
public static Response<List<Book>> phrasePrefix() {
MatchPhrasePrefixQueryBuilder queryBuilder = new MatchPhrasePrefixQueryBuilder(Constants.SUMMARY, "search en")
.slop(3).maxExpansions(10);
SearchRequestBuilder requestBuilder = client.prepareSearch(bookIndexAlias).setTypes(bookType)
.setQuery(queryBuilder).setFetchSource(Constants.fetchFieldsTSPD, null);
SearchResponse searchResponse = requestGet("phrasePrefix", requestBuilder);
return CommonQueryUtils.buildResponse(searchResponse);
}
}
Next steps: look into the new features in 6.x and even 7.x, and how to do all of this through the REST API
- Next, explore how to use it together with Kibana
- How to use it together with Logstash
- More advanced Elasticsearch topics
- One last small point: fuzzy matching
Fuzzy search automatically "corrects" misspelled query text by matching terms in the index that are within a maximum edit distance (fuzziness).
"surprize" is a misspelling of "surprise": the z needs to become an s.
surprize -> surprise: z -> s is a single edit, so it matches within a fuzziness of 2.
surprize -> surprised: z -> s plus a trailing d is two edits, so it still matches within a fuzziness of 2.
surprize -> surprising: fixing the z and turning the ending into "ing" takes more than two edits, so it can never match.
In testing, fuzzy matching corrects at most two edits~ (a sketch of pinning the edit distance follows)
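To tie this back to the code above, here is a sketch (not in the original post) of pinning the edit distance explicitly instead of relying on Fuzziness.AUTO, written as an extra method for the BasicMatchQueryService class above; the method name fuzzyFixedDistance and the choice of the summary field are my own.
    /**
     * Sketch: fuzzy match with an explicit edit distance of 2.
     * With fuzziness 2, searching "distributed" still matches the sample summary
     * "A distibuted real-time search and analytics engine" (one edit away),
     * while a term more than two edits away would not match at all.
     */
    public static Response<List<Book>> fuzzyFixedDistance() {
        MatchQueryBuilder queryBuilder = new MatchQueryBuilder(Constants.SUMMARY, "distributed")
                .fuzziness(Fuzziness.TWO)   // allow up to 2 single-character edits
                .prefixLength(0)            // number of leading characters that must match exactly
                .maxExpansions(50);         // cap the number of fuzzy term expansions
        SearchRequestBuilder requestBuilder = client.prepareSearch(bookIndexAlias)
                .setTypes(bookType)
                .setQuery(queryBuilder)
                .setFetchSource(Constants.fetchFieldsTSPD, null);
        SearchResponse searchResponse = requestGet("fuzzyFixedDistance", requestBuilder);
        return CommonQueryUtils.buildResponse(searchResponse);
    }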