Complex Big-Data Search with Elasticsearch
What / Who
Elasticsearch is more than Lucene plus full-text search. It is also:
• a distributed, real-time document store in which every field is indexed and searchable
• a distributed, real-time analytics and search engine
• able to scale out to hundreds of servers and handle petabytes of structured or unstructured data
-
It also has a few defining characteristics:
First: documents are stored as JSON, i.e. it is a document store
Second: search is driven by an inverted index (see the toy sketch after this list)
Third: there are no transactions
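Since the inverted index is the data structure everything else rests on, here is a toy sketch of the idea in plain Java (my own illustration, nothing to do with Lucene's actual on-disk format): each term maps to the set of documents containing it, so a term lookup replaces a scan over every document.
import java.util.*;

// Toy inverted index: term -> IDs of the documents that contain it.
public class InvertedIndexSketch {
    public static void main(String[] args) {
        Map<String, List<String>> docs = new LinkedHashMap<>();
        docs.put("doc1", Arrays.asList("distributed", "search", "engine"));
        docs.put("doc2", Arrays.asList("real", "time", "search"));

        Map<String, Set<String>> invertedIndex = new HashMap<>();
        docs.forEach((docId, terms) ->
                terms.forEach(term ->
                        invertedIndex.computeIfAbsent(term, t -> new TreeSet<>()).add(docId)));

        // Querying a term is now a map lookup instead of scanning every document.
        System.out.println(invertedIndex.get("search")); // [doc1, doc2]
    }
}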
It also has some drawbacks:
First: writes are not visible immediately; with the default refresh behaviour there is roughly a 1–2 second delay before newly written data can be searched
Second: an existing mapping cannot be changed freely; changing even a single field's type effectively means rebuilding the whole index
There is a workaround, though: use an index alias. All reads and writes go through the alias; when the mapping has to change, create a new index with the new mapping, reindex all the data into it, then point the alias at the new index and drop the old one, giving a smooth, zero-downtime switch (a minimal sketch follows).
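Here is a minimal sketch of that alias-swap flow (not the original project's code) using the same 5.x Transport Client admin API that the DDL code later in this post uses; the index names book_index_v1 / book_index_v2 are illustrative.
// Sketch of the alias-based "reindex then swap" flow. All readers and writers
// use only the alias, never a concrete index name.
Client client = EsConfig.client(); // EsConfig is defined later in this post

// 1. The current index sits behind the alias.
client.admin().indices().prepareAliases()
        .addAlias("book_index_v1", "book_index_alias")
        .get();

// 2. Create book_index_v2 with the new mapping and reindex everything into it
//    (bulk-load from the source of truth, or use the Reindex API).

// 3. Switch the alias to the new index in one request, then drop the old index.
client.admin().indices().prepareAliases()
        .removeAlias("book_index_v1", "book_index_alias")
        .addAlias("book_index_v2", "book_index_alias")
        .get();
client.admin().indices().prepareDelete("book_index_v1").get();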
A few basic concepts are worth introducing:
Suppose the first thing we want to do is store employee data, with each document representing one employee. In Elasticsearch the act of storing data is called indexing, but before we index anything we need to decide where the data should live.
In Elasticsearch a document belongs to a type, and types live inside an index. A rough analogy with a traditional relational database:
Relational DB -> Databases -> Tables -> Rows -> Columns
Elasticsearch -> Indices -> Types -> Documents -> Fields
An Elasticsearch cluster can contain multiple indices (databases),
each index can contain multiple types (tables),
each type contains multiple documents (rows), each of which is a JSON object,
and each document contains multiple fields (columns), i.e. the properties of that JSON object.
The word "index" carries more than one meaning in Elasticsearch. An index (noun) is like a database in a relational system: it is where related documents are stored; its plural is indices or indexes. (A minimal indexing sketch follows.)
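To make the analogy concrete, here is a minimal indexing sketch (not part of the original project) that stores one employee document. It assumes the 5.x Transport Client dependencies introduced later in this post; the "megacorp" index, "employee" type and the document content are the illustrative names from the Definitive Guide example.
// A sketch only: index one employee document into index "megacorp",
// type "employee", with document id "1".
Settings settings = Settings.builder().put("cluster.name", "elasticsearch").build();
TransportClient client = new PreBuiltTransportClient(settings)
        .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("127.0.0.1"), 9300));

String employeeJson = "{\"first_name\":\"John\",\"last_name\":\"Smith\",\"age\":25}";

IndexResponse response = client.prepareIndex("megacorp", "employee", "1") // index / type / document id
        .setSource(employeeJson, XContentType.JSON)                       // the fields live inside this JSON
        .get();
System.out.println(response.status()); // e.g. CREATED

client.close();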
Where / When
- When should you reach for ES?
Search, log analysis (the ELK stack), and so on.
Our own business scenario: the order data set is huge, so it is sharded across databases and tables with openId as the sharding key. That covers every customer-facing query, because every request carries an openId and looks up orders at the granularity of a single user. The back-office operations team, however, needs to query all orders, not one user's, and with filter conditions across many dimensions. Since the order data is scattered across different databases and tables, scanning every shard through the DB and then paginating is clearly not realistic; this is exactly the kind of scenario Elasticsearch is made for.
-
How does it compare with Solr from the Apache ecosystem?
[Image: solr.png — Elasticsearch vs. Solr comparison]
Comparing Elasticsearch with Solr, in summary:
1. For pure search over an existing, static data set, Solr is faster.
2. When indexes are built in real time, Solr suffers from I/O blocking and its query performance degrades, while Elasticsearch has a clear advantage.
3. As the data volume grows, Solr's search efficiency drops noticeably, whereas Elasticsearch shows no obvious change.
4. Solr's architecture is not well suited to real-time search applications.
5. Solr supports more data formats out of the box, while Elasticsearch only accepts JSON.
6. Solr performs better than Elasticsearch in traditional search applications, but is clearly less efficient for real-time search workloads.
7. Solr is a solid solution for traditional search applications, but Elasticsearch is better suited to the newer real-time search use cases.
How
- ES iterates very quickly, so first get a feel for the ES API landscape.
First: study the [Elasticsearch: The Definitive Guide].
Second: which version should we use?
The initialization API changed from 1.7 to 2.x, changed again from 2.x to 5.x, 6.x has been out for a while and the latest line is already 7.x, but this post recommends 5.x!
Note: data from 2.x can be migrated directly to 5.x, and data from 5.x directly to 6.x, but 2.x data cannot be migrated directly to 6.x.
ES 2.x
Pros:
- On the Java stack, spring-boot-starter-data-elasticsearch supports in-memory startup, so unit tests work out of the box
- Still the mainstream version running in production, and quite stable
Cons:
- An old release: you miss the new features, and performance is worse than 5.x
- Upgrading and migrating the data later is painful
- The surrounding tool versions are messy; you have to look up which Kibana (and other tool) versions match
ES 5.x
Pros:
- A relatively recent release with better performance: the official claim is a 25%–80% improvement in indexing throughput, new data structures for numeric and geo fields bring a large speedup, and search and aggregations were reworked in 5.x with a big capability boost
- The surrounding tooling is complete and the version numbers are friendly: with 5.x the ELK stack moved to unified version numbers
- Upgrading to 6.x later is straightforward
Cons:
- The in-memory mode is officially no longer supported and the Node Client is gone; if you want in-memory unit tests you have to pin the ES and spring-data-elasticsearch versions yourself, enable HTTP access, and go through the REST API instead
Third: which client should we use?
On the Java stack there are three options: Node Client, Transport Client, and the REST API.
Note that the Node Client is already officially deprecated, and the Transport Client is deprecated as of 7.x and slated for removal, with everything eventually converging on the REST API. For now the Transport Client is the most widely used; the REST API has the best compatibility; the Node Client is not recommended unless you are running unit tests in in-memory mode.
This post sticks with the Transport Client for its API examples.
Elasticsearch 2.x client construction:
public static Client getClient() throws UnknownHostException {
    String clusterName = "elasticsearch";
    List<String> clusterNodes = Arrays.asList("http://172.16.0.29:9300");
    // 2.x builds Settings and the TransportClient through static builders
    Settings settings = Settings.settingsBuilder().put("cluster.name", clusterName).build();
    TransportClient client = TransportClient.builder().settings(settings).build();
    for (String node : clusterNodes) {
        URI host = URI.create(node);
        client.addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName(host.getHost()), host.getPort()));
    }
    return client;
}
Elasticsearch 5.x client construction:
public static Client getClient() throws UnknownHostException {
    String clusterName = "shopmall-es";
    List<String> clusterNodes = Arrays.asList("http://172.16.32.69:9300", "http://172.16.32.48:9300");
    // 5.x switches to Settings.builder() and PreBuiltTransportClient
    Settings settings = Settings.builder().put("cluster.name", clusterName).build();
    TransportClient client = new PreBuiltTransportClient(settings);
    for (String node : clusterNodes) {
        URI host = URI.create(node);
        client.addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName(host.getHost()), host.getPort()));
    }
    return client;
}
- Time to write some code. First, pull in the required dependencies:
<dependency>
<groupId>org.elasticsearch.client</groupId>
<artifactId>transport</artifactId>
<version>5.3.2</version>
</dependency>
<dependency>
<groupId>org.elasticsearch</groupId>
<artifactId>elasticsearch</artifactId>
<version>5.3.2</version>
</dependency>
<dependency>
<groupId>com.google.code.gson</groupId>
<artifactId>gson</artifactId>
<version>2.8.2</version>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.12</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-api</artifactId>
<version>2.11.1</version>
</dependency>
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-core</artifactId>
<version>2.11.1</version>
</dependency>
/**
* @Title:
* @Author: hangyu
* @Date: 2019/4/23
* @Description
* @Version:1.0
*/
public class Book {
public static SimpleDateFormat simpleDateFormat = new SimpleDateFormat("yyyy-MM-dd");
private String id;
private String title;
private List<String> authors;
private String summary;
private String publish_date;
private Integer num_reviews;
private String publisher;
public Book(String id, String title, List<String> authors, String summary, String publish_date, Integer num_reviews, String publisher) {
this.id = id;
this.title = title;
this.authors = authors;
this.summary = summary;
this.publish_date = publish_date;
this.num_reviews = num_reviews;
this.publisher = publisher;
}
public static SimpleDateFormat getSimpleDateFormat() {
return simpleDateFormat;
}
public static void setSimpleDateFormat(SimpleDateFormat simpleDateFormat) {
Book.simpleDateFormat = simpleDateFormat;
}
public String getId() {
return id;
}
public void setId(String id) {
this.id = id;
}
public String getTitle() {
return title;
}
public void setTitle(String title) {
this.title = title;
}
public List<String> getAuthors() {
return authors;
}
public void setAuthors(List<String> authors) {
this.authors = authors;
}
public String getSummary() {
return summary;
}
public void setSummary(String summary) {
this.summary = summary;
}
public String getPublish_date() {
return publish_date;
}
public void setPublish_date(String publish_date) {
this.publish_date = publish_date;
}
public Integer getNum_reviews() {
return num_reviews;
}
public void setNum_reviews(Integer num_reviews) {
this.num_reviews = num_reviews;
}
public String getPublisher() {
return publisher;
}
public void setPublisher(String publisher) {
this.publisher = publisher;
}
}
/**
* @Title:
* @Author: hangyu
* @Date: 2019/4/23
* @Description
* @Version:1.0
*/
public class DataUtil {
public static SimpleDateFormat dateFormater = new SimpleDateFormat("yyyy-MM-dd");
/**
* Mock data source: build a few sample Book documents
*/
public static List<Book> batchData() {
List<Book> list = new LinkedList<>();
Book book1 = new Book("1", "Elasticsearch: The Definitive Guide", Arrays.asList("clinton gormley", "zachary tong"),
"A distibuted real-time search and analytics engine", "2015-02-07", 20, "oreilly");
Book book2 = new Book("2", "Taming Text: How to Find, Organize, and Manipulate It", Arrays.asList("grant ingersoll", "thomas morton", "drew farris"),
"organize text using approaches such as full-text search, proper name recognition, clustering, tagging, information extraction, and summarization",
"2013-01-24", 12, "manning");
Book book3 = new Book("3", "Elasticsearch in Action", Arrays.asList("radu gheorge", "matthew lee hinman", "roy russo"),
"build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms",
"2015-12-03", 18, "manning");
Book book4 = new Book("4", "Solr in Action", Arrays.asList("trey grainger", "timothy potter"), "Comprehensive guide to implementing a scalable search engine using Apache Solr",
"2014-04-05", 23, "manning");
list.add(book1);
list.add(book2);
list.add(book3);
list.add(book4);
return list;
}
public static Date parseDate(String dateStr) {
try {
return dateFormater.parse(dateStr);
        } catch (ParseException e) {
            // ignore malformed dates; fall through and return null
        }
return null;
}
}
/**
* @Title:
* @Author: hangyu
* @Date: 2019/4/23
* @Description
* @Version:1.0
*/
public class Constants {
// Field names
public static String ID = "id";
public static String TITLE = "title";
public static String AUTHORS = "authors";
public static String SUMMARY = "summary";
public static String PUBLISHDATE = "publish_date";
public static String PUBLISHER = "publisher";
public static String NUM_REVIEWS = "num_reviews";
// _source fields to return for different queries
public static String[] fetchFieldsTSPD = {ID, TITLE, SUMMARY, PUBLISHDATE};
public static String[] fetchFieldsTA = {ID, TITLE, AUTHORS};
// Highlighting
public static HighlightBuilder highlightS = new HighlightBuilder().field(SUMMARY);
}
/**
* @Title:
* @Author: hangyu
* @Date: 2019/4/24
* @Description
* @Version:1.0
*/
public class Response<T> {
private ResponseCode responseCode;
private T data;
public Response(ResponseCode responseCode, T data) {
this.responseCode = responseCode;
this.data = data;
}
public Response(ResponseCode responseCode) {
this.responseCode = responseCode;
}
}
/**
* @Title:
* @Author: hangyu
* @Date: 2019/4/24
* @Description
* @Version:1.0
*/
public enum ResponseCode {
    ESTIMEOUT(1, "request timed out"),
    FAILEDSHARDS(2, "some shards failed"),
    OK(0, "success");
private Integer code;
private String desc;
ResponseCode(Integer code, String desc) {
this.code = code;
this.desc = desc;
}
public Integer getCode() {
return code;
}
public void setCode(Integer code) {
this.code = code;
}
public String getDesc() {
return desc;
}
public void setDesc(String desc) {
this.desc = desc;
}
}
/**
* @Title:
* @Author: hangyu
* @Date: 2019/4/23
* @Description
* @Version:1.0
*/
public class CommonQueryUtils {
    public static Gson gson = new GsonBuilder().setDateFormat("yyyy-MM-dd").create(); // lowercase yyyy: uppercase YYYY is the week-based year in SimpleDateFormat
    /**
     * Parse the ES response and map each hit's _source into a Book
     */
    public static List<Book> parseResponse(SearchResponse searchResponse) {
        List<Book> list = new LinkedList<>();
        // total hit count, for reference
        System.out.println("parseResponse count is " + searchResponse.getHits().getTotalHits());
        for (SearchHit hit : searchResponse.getHits().getHits()) {
            // deserialize the _source JSON directly with Gson
            Book book = gson.fromJson(hit.getSourceAsString(), Book.class);
            list.add(book);
        }
        return list;
    }
    /**
     * Build the Response wrapper after parsing the hits
     */
    public static Response<List<Book>> buildResponse(SearchResponse searchResponse) {
        // the request timed out
        if (searchResponse.isTimedOut()) {
            return new Response<>(ResponseCode.ESTIMEOUT);
        }
        // parse the hits returned by ES
        List<Book> list = parseResponse(searchResponse);
        // some shards failed to execute
        if (searchResponse.getFailedShards() > 0) {
            return new Response<>(ResponseCode.FAILEDSHARDS, list);
        }
        return new Response<>(ResponseCode.OK, list);
    }
}
Take a quick break~
- Now the key logic begins
/**
* @Title:
* @Author: hangyu
* @Date: 2019/4/23
* @Description
* @Version:1.0
*/
public class EsConfig {
    // HTTP uses port 9200; the transport (Java API) port is 9300
    private static String clusterNodes = "127.0.0.1:9300";
    // the cluster name must match cluster.name in elasticsearch.yml
    private static String clusterName = "es-book-test";
public static Client client() {
Settings settings = Settings.builder().put("cluster.name", clusterName)
.put("client.transport.sniff", true).build();
TransportClient client = null;
try {
client = new PreBuiltTransportClient(settings);
if (clusterNodes != null && !"".equals(clusterNodes)) {
for (String node : clusterNodes.split(",")) {
String[] nodeInfo = node.split(":");
client.addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName(nodeInfo[0]), Integer.parseInt(nodeInfo[1])));
}
}
} catch (Exception e) {
System.out.println("e"+e);
}
return client;
}
}
/**
* @Title:
* @Author: hangyu
* @Date: 2019/4/23
* @Description
* @Version:1.0
*/
public class DDLAndBulk {
private static String bookIndex = "book_index";
private static String bookIndexAlias = "book_index_alias";
private static String bookType = "book_type";
    public static Gson gson = new GsonBuilder().setDateFormat("yyyy-MM-dd").create(); // lowercase yyyy: uppercase YYYY is the week-based year in SimpleDateFormat
/**
* Create the index and configure its settings (the mapping is applied later, before the bulk load)
*/
public static void createIndex() {
int settingShards = 1;
int settingReplicas = 0;
Client client = EsConfig.client();
        // check whether the index already exists and delete it if so
        IndicesExistsResponse indicesExistsResponse = client.admin().indices().prepareExists(bookIndex).get();
        if (indicesExistsResponse.isExists()) {
            System.out.println("Index " + bookIndex + " already exists!");
            // delete it first, otherwise creation fails with ResourceAlreadyExistsException[index [bookdb_index/yL05ZfXFQ4GjgOEM5x8tFQ] already exists
            DeleteIndexResponse deleteResponse = client.admin().indices().prepareDelete(bookIndex).get();
            if (deleteResponse.isAcknowledged()) {
                System.out.println("Index " + bookIndex + " deleted");
            } else {
                System.out.println("Failed to delete index " + bookIndex);
            }
        } else {
            System.out.println("Index " + bookIndex + " does not exist!");
        }
        // Step 1: create the index with its settings
CreateIndexResponse response = client.admin().indices().prepareCreate(bookIndex)
.setSettings(Settings.builder()
.put("index.number_of_shards", settingShards)
.put("index.number_of_replicas", settingReplicas))
.get();
        // verify the settings that were applied
        GetSettingsResponse getSettingsResponse = client.admin().indices()
                .prepareGetSettings(bookIndex).get();
        System.out.println("Index settings result:");
for (ObjectObjectCursor<String, Settings> cursor : getSettingsResponse.getIndexToSettings()) {
String index = cursor.key;
Settings settings = cursor.value;
Integer shards = settings.getAsInt("index.number_of_shards", null);
Integer replicas = settings.getAsInt("index.number_of_replicas", null);
System.out.println("index:" + index + ", shards:" + shards + ", replicas:" + replicas);
}
}
/**
* Bulk-insert the sample data
*/
public static void bulk() {
List<Book> list = DataUtil.batchData();
Client client = EsConfig.client();
BulkRequestBuilder bulkRequestBuilder = client.prepareBulk();
        // Step 2: create the type and its mapping. This step is optional: without an explicit mapping, ES infers the field types from the _source data
if (!client.admin().indices().prepareTypesExists(bookIndex).setTypes(bookType).get().isExists()){
client.admin().indices().preparePutMapping(bookIndex).setType(bookType).setSource(readFileTOString("es-book-mapping.json")).get()
.isAcknowledged();
            // Between steps 2 and 3 we add one small extra step that keeps the mapping evolvable later: create an index alias
createAlias(bookIndex, bookIndexAlias);
}
        // add one index operation per book to the bulk request
list.forEach(book -> {
            // Step 3: index the data. Note that this step alone would also create the type from step 2 implicitly,
            // letting ES detect the field types automatically if we skipped the explicit mapping.
            // In the newer API, setSource with varargs expects an even number of key/value arguments;
            // to pass a raw JSON string, use setSource(json, XContentType.JSON) instead.
bulkRequestBuilder.add(client.prepareIndex(bookIndexAlias, bookType, book.getId()).setSource(gson.toJson(book), XContentType.JSON));
});
BulkResponse responses = bulkRequestBuilder.get();
if (responses.hasFailures()) {
            // some bulk items failed
for (BulkItemResponse res : responses) {
System.out.println(res.getFailure());
}
}
}
/**
* Create/switch the index alias
*/
private static boolean createAlias(String indexName, String indexAlias) {
Client client = EsConfig.client();
        // collect the indices the alias currently points at (the old mappings)
List<String> oldIndexName = new ArrayList<String>();
GetAliasesResponse getAliases = client.admin().indices().prepareGetAliases(indexAlias).get();
for (ObjectCursor<String> objectCursor : getAliases.getAliases().keys()) {
if (!indexName.equals(objectCursor.value)) {
oldIndexName.add(objectCursor.value);
}
}
        // point the alias at the new index
IndicesAliasesResponse r = client.admin().indices().prepareAliases().addAlias(indexName, indexAlias)
.execute().actionGet();
if (!r.isAcknowledged()) {
throw new RuntimeException("[ES Check] indexName:" + indexName + ", 创建别名失败:" + indexAlias);
}
        if (oldIndexName.size() > 0) {
            System.out.println("[ES Check] indexAlias:" + indexAlias + " found old alias mappings, oldIndexName: " + oldIndexName);
            // remove the old alias -> index mappings
            IndicesAliasesResponse r2 = client.admin().indices().prepareAliases()
                    .removeAlias(oldIndexName.toArray(new String[] {}), indexAlias).get();
            if (!r2.isAcknowledged()) {
                throw new RuntimeException("[ES Check] indexAlias:" + indexAlias + ", failed to remove old alias mappings: " + oldIndexName);
            } else {
                System.out.println("[ES Check] indexAlias:" + indexAlias + ", removed old alias mappings, oldIndexName: " + oldIndexName);
            }
}
return true;
}
public static String readFileTOString(String name) {
InputStream inputStream = getResourceAsStream(name);
if (null == inputStream){
return null;
}
StringBuilder sb = new StringBuilder("");
BufferedReader reader = null;
try {
reader = new BufferedReader(new InputStreamReader(inputStream));
String tempString = null;
            // read one line at a time until readLine() returns null (end of file)
while ((tempString = reader.readLine()) != null) {
sb.append(tempString);
}
reader.close();
} catch (IOException e) {
e.printStackTrace();
} finally {
if (reader != null) {
try {
reader.close();
} catch (IOException e1) {
}
}
}
return sb.toString();
}
public static InputStream getResourceAsStream(String name) {
InputStream resourceStream = null;
// Try the current Thread context classloader
ClassLoader classLoader = Thread.currentThread().getContextClassLoader();
resourceStream = classLoader.getResourceAsStream(name);
if (resourceStream == null) {
// Finally, try the classloader for this class
classLoader = DDLAndBulk.class.getClassLoader();
resourceStream = classLoader.getResourceAsStream(name);
}
return resourceStream;
}
public static void main(String[] args) {
createIndex();
bulk();
}
}
{
"book_type": {
"properties": {
"id": {
"type": "long"
},
"title": {
"type": "string",
"index": "analyzed"
},
"authors": {
"type": "string",
"index": "not_analyzed"
},
"summary": {
"type": "string",
"index": "analyzed"
},
"publish_date": {
"type": "date",
"index": "not_analyzed"
},
"num_reviews": {
"type": "integer",
"index": "not_analyzed"
},
"publisher": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
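For reference, the mapping above uses the pre-5.x string / analyzed / not_analyzed syntax; 5.x still tolerates it (string fields get upgraded to text or keyword, with deprecation warnings), but 6.x removes the string type entirely. A roughly equivalent 5.x-native version of the same mapping would look like this sketch (not the file used in the original project):
{
  "book_type": {
    "properties": {
      "id":           { "type": "long" },
      "title":        { "type": "text" },
      "authors":      { "type": "keyword" },
      "summary":      { "type": "text" },
      "publish_date": { "type": "date" },
      "num_reviews":  { "type": "integer" },
      "publisher":    { "type": "keyword" }
    }
  }
}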
/**
* @Title:
* @Author: hangyu
* @Date: 2019/4/24
* @Description
* @Version:1.0
*/
public class BasicMatchQueryService {
private static Client client = EsConfig.client();
private static String bookIndexAlias = "book_index_alias";
private static String bookType = "book_type";
public static void main(String[] args) {
//multiBatch();
//match();
boolPage();
//boolPageMatch();
//fuzzy();
//wildcard();
//phrase();
//phrasePrefix();
}
/**
* Execute an ES query, printing the generated query DSL before the request and the raw result after it
*/
private static SearchResponse requestGet(String queryName, SearchRequestBuilder requestBuilder) {
System.out.println(queryName + " 构建的查询:" + requestBuilder.toString());
SearchResponse searchResponse = requestBuilder.get();
System.out.println(queryName + " 搜索结果:" + searchResponse.toString());
return searchResponse;
}
/**
     * 1.1 Full-text search for "guide" across all fields
     * Test: http://localhost:8080/basicmatch/multimatch?query=guide
*/
public static Response<List<Book>> multiBatch() {
MultiMatchQueryBuilder queryBuilder = new MultiMatchQueryBuilder("guide");
SearchRequestBuilder requestBuilder = client.prepareSearch(bookIndexAlias)
.setTypes(bookType).setQuery(queryBuilder);
SearchResponse searchResponse = requestGet("multiBatch", requestBuilder);
return CommonQueryUtils.buildResponse(searchResponse);
}
/**
     * 1.2 Search against specific fields
     * Test: http://localhost:8080/basicmatch/match?title=in action&from=0&size=4
*/
public static void match() {
MatchQueryBuilder matchQueryBuilder = new MatchQueryBuilder(Constants.TITLE, "in Action");
        // highlighting
HighlightBuilder highlightBuilder = new HighlightBuilder().field(Constants.TITLE).fragmentSize(200);
SearchRequestBuilder requestBuilder = client.prepareSearch(bookIndexAlias)
.setTypes(bookType).setQuery(matchQueryBuilder)
.setFrom(0).setSize(4)
.highlighter(highlightBuilder)
                // restrict which _source fields are returned
.setFetchSource(Constants.fetchFieldsTSPD, null);
SearchResponse searchResponse = requestGet("multiBatch", requestBuilder);
}
/**
     * Exact match (term/range) with paging
* @return
*/
public static Response<List<Book>> boolPage() {
BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
RangeQueryBuilder rangeQueryBuilder = new RangeQueryBuilder(Constants.NUM_REVIEWS)
.gte(15).lte(50);
boolQueryBuilder.should().add(QueryBuilders.termQuery(Constants.PUBLISHER, "manning"));
boolQueryBuilder.should().add(QueryBuilders.termQuery(Constants.PUBLISHER, "oreilly"));
        // term = exact match, range = range match
        // should = OR, must = AND, mustNot = AND NOT
boolQueryBuilder.mustNot(QueryBuilders.termQuery(Constants.AUTHORS, "radu gheorge")).filter().add(rangeQueryBuilder);
//boolQueryBuilder.must(rangeQueryBuilder).mustNot(QueryBuilders.termQuery(Constants.AUTHORS, "radu gheorge"));
SearchRequestBuilder requestBuilder = client.prepareSearch(bookIndexAlias).setTypes(bookType).setQuery(boolQueryBuilder)
.setFrom(0).setSize(10).addSort("id", SortOrder.DESC);
SearchResponse searchResponse = requestGet("bool", requestBuilder);
return CommonQueryUtils.buildResponse(searchResponse);
}
/**
     * Full-text match (runs analysis against text fields)
* @return
*/
public static Response<List<Book>> boolPageMatch() {
BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
        // matchQuery = analyzed (per-term) match, matchPhraseQuery = phrase match
boolQueryBuilder.must(QueryBuilders.matchQuery(Constants.SUMMARY,"engine using"))
.mustNot(QueryBuilders.matchPhraseQuery(Constants.SUMMARY, "analytics engine"));
SearchRequestBuilder requestBuilder = client.prepareSearch(bookIndexAlias).setTypes(bookType).setQuery(boolQueryBuilder)
.setFrom(0).setSize(10).addSort(SortBuilders.scoreSort());
SearchResponse searchResponse = requestGet("bool", requestBuilder);
return CommonQueryUtils.buildResponse(searchResponse);
}
/**
     * Fuzzy search (tolerates misspelled query terms)
* @return
*/
public static Response<List<Book>> fuzzy() {
MultiMatchQueryBuilder queryBuilder = new MultiMatchQueryBuilder("elasticseares")
.field("title").field("summary")
.fuzziness(Fuzziness.AUTO);
SearchRequestBuilder requestBuilder = client.prepareSearch(bookIndexAlias)
.setTypes(bookType).setQuery(queryBuilder)
.setFetchSource(Constants.fetchFieldsTSPD, null)
.setSize(2);
SearchResponse searchResponse = requestGet("fuzzy", requestBuilder);
return CommonQueryUtils.buildResponse(searchResponse);
}
/**
     * Wildcard search: find all records whose author starts with the letter "t"
*/
public static Response<List<Book>> wildcard() {
WildcardQueryBuilder wildcardQueryBuilder = new WildcardQueryBuilder(Constants.AUTHORS, "t*");
HighlightBuilder highlightBuilder = new HighlightBuilder().field(Constants.AUTHORS, 200);
SearchRequestBuilder requestBuilder = client.prepareSearch(bookIndexAlias)
.setTypes(bookType).setQuery(wildcardQueryBuilder)
.setFetchSource(Constants.fetchFieldsTA, null)
.highlighter(highlightBuilder);
SearchResponse searchResponse = requestGet("wildcard", requestBuilder);
return CommonQueryUtils.buildResponse(searchResponse);
}
/**
     * Regular-expression search
* @return
*/
public static Response<List<Book>> regexp() {
String regexp = "t[a-z]*n";
RegexpQueryBuilder queryBuilder = new RegexpQueryBuilder(Constants.AUTHORS, regexp);
HighlightBuilder highlightBuilder = new HighlightBuilder().field(Constants.AUTHORS);
SearchRequestBuilder requestBuilder = client.prepareSearch(bookIndexAlias)
.setQuery(queryBuilder).setTypes(bookType).highlighter(highlightBuilder)
.setFetchSource(Constants.fetchFieldsTA, null);
SearchResponse searchResponse = requestGet("regexp", requestBuilder);
return CommonQueryUtils.buildResponse(searchResponse);
}
/**
     * Phrase match: the document must contain both words close together (within the slop), not merely either of the analyzed terms
*
* "summary":"Comprehensive guide to implementing a scalable search engine using Apache Solr",
* "summary":"A distibuted real-time search and analytics engine",
* @return
*/
public static Response<List<Book>> phrase() {
MultiMatchQueryBuilder queryBuilder = new MultiMatchQueryBuilder("search engine")
.field(Constants.SUMMARY)
.type(MultiMatchQueryBuilder.Type.PHRASE).slop(3);
SearchRequestBuilder requestBuilder = client.prepareSearch(bookIndexAlias).setTypes(bookType)
.setQuery(queryBuilder)
.setFetchSource(Constants.fetchFieldsTSPD, null);
SearchResponse searchResponse = requestGet("phrase", requestBuilder);
return CommonQueryUtils.buildResponse(searchResponse);
}
/**
     * Match-phrase-prefix search (phrase match where the last term is treated as a prefix)
* @return
*/
public static Response<List<Book>> phrasePrefix() {
MatchPhrasePrefixQueryBuilder queryBuilder = new MatchPhrasePrefixQueryBuilder(Constants.SUMMARY, "search en")
.slop(3).maxExpansions(10);
SearchRequestBuilder requestBuilder = client.prepareSearch(bookIndexAlias).setTypes(bookType)
.setQuery(queryBuilder).setFetchSource(Constants.fetchFieldsTSPD, null);
SearchResponse searchResponse = requestGet("phrasePrefix", requestBuilder);
return CommonQueryUtils.buildResponse(searchResponse);
}
}
Next steps: look into the new features in 6.x and even 7.x, and how to do all of this through the REST API
- Next, explore how to use it together with Kibana
- How to use it together with Logstash
- More advanced Elasticsearch topics
- One last small point: fuzzy matching
Fuzzy search automatically "corrects" misspelled query text by matching terms in the index that are within a maximum edit distance (fuzziness).
"surprize" is a misspelling of "surprise": the z needs to become an s.
surprize -> surprise: z -> s is a single edit, so it matches within a fuzziness of 2.
surprize -> surprised: z -> s plus a trailing d is two edits, so it still matches within a fuzziness of 2.
surprize -> surprising: fixing the z and turning the ending into "ing" takes more than two edits, so it can never match.
In testing, fuzzy matching corrects at most two edits~ (a sketch of pinning the edit distance follows)
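To tie this back to the code above, here is a sketch (not in the original post) of pinning the edit distance explicitly instead of relying on Fuzziness.AUTO, written as an extra method for the BasicMatchQueryService class above; the method name fuzzyFixedDistance and the choice of the summary field are my own.
    /**
     * Sketch: fuzzy match with an explicit edit distance of 2.
     * With fuzziness 2, searching "distributed" still matches the sample summary
     * "A distibuted real-time search and analytics engine" (one edit away),
     * while a term more than two edits away would not match at all.
     */
    public static Response<List<Book>> fuzzyFixedDistance() {
        MatchQueryBuilder queryBuilder = new MatchQueryBuilder(Constants.SUMMARY, "distributed")
                .fuzziness(Fuzziness.TWO)   // allow up to 2 single-character edits
                .prefixLength(0)            // number of leading characters that must match exactly
                .maxExpansions(50);         // cap the number of fuzzy term expansions
        SearchRequestBuilder requestBuilder = client.prepareSearch(bookIndexAlias)
                .setTypes(bookType)
                .setQuery(queryBuilder)
                .setFetchSource(Constants.fetchFieldsTSPD, null);
        SearchResponse searchResponse = requestGet("fuzzyFixedDistance", requestBuilder);
        return CommonQueryUtils.buildResponse(searchResponse);
    }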