14 ElasticSearch for Hadoop: Source Code Issues

2020-06-23  逸章

Chapter 1 Issues

1. Issues

1. The hadoop-hdfs artifactId needs to be changed

    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs</artifactId>
        <version>3.2.1</version>
    </dependency>

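2. The following error (excerpt): which might be less than configured maximum allocation=<memory:8192, vCores:4>

3. The job hangs
3.1 Not enough resources allocated to YARN (this was the cause in my case)
Setting the following parameters in yarn-site.xml is enough: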

<property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>20480</value>
</property>
<property>
   <name>yarn.scheduler.minimum-allocation-mb</name>
   <value>2048</value>
</property>
<property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>2.1</value>
</property>

3.2 Possible cause 2 (found online; unrelated to our case): the job was interrupted with Ctrl+C during an earlier run. The symptom looks like this:

yay@yay-ThinkPad-T470-W10DG:~/下载/Elasticsearch for Hadoop_code/eshadoop-master/Chapter1/target$ hadoop jar ch01-0.0.1-job.jar /input/ch01/sample.txt
2020-06-23 19:58:08,230 INFO client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:8032
2020-06-23 19:58:08,414 WARN mr.EsOutputFormat: Speculative execution enabled for reducer - consider disabling it to prevent data corruption
2020-06-23 19:58:08,768 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
2020-06-23 19:58:08,845 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/yay/.staging/job_1592904887052_0004
2020-06-23 19:58:09,052 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2020-06-23 19:58:10,658 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2020-06-23 19:58:12,441 INFO input.FileInputFormat: Total input files to process : 1
2020-06-23 19:58:13,754 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2020-06-23 19:58:14,284 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2020-06-23 19:58:14,324 INFO mapreduce.JobSubmitter: number of splits:1
2020-06-23 19:58:14,520 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2020-06-23 19:58:14,549 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1592904887052_0004
2020-06-23 19:58:14,549 INFO mapreduce.JobSubmitter: Executing with tokens: []
2020-06-23 19:58:14,732 INFO conf.Configuration: resource-types.xml not found
2020-06-23 19:58:14,732 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2020-06-23 19:58:14,794 INFO impl.YarnClientImpl: Submitted application application_1592904887052_0004
2020-06-23 19:58:14,832 INFO mapreduce.Job: The url to track the job: http://yay-ThinkPad-T470-W10DG:8088/proxy/application_1592904887052_0004/
2020-06-23 19:58:14,832 INFO mapreduce.Job: Running job: job_1592904887052_0004

After that, nothing more happens.
After restarting YARN and HDFS and running the job again, an error message finally appears:

yay@yay-ThinkPad-T470-W10DG:~/下载/Elasticsearch for Hadoop_code/eshadoop-master/Chapter1/target$ hadoop jar ch01-0.0.1-job.jar /input/ch01/sample.txt
2020-06-23 20:02:30,428 INFO client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:8032
2020-06-23 20:02:30,632 WARN mr.EsOutputFormat: Speculative execution enabled for reducer - consider disabling it to prevent data corruption
2020-06-23 20:02:31,036 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
2020-06-23 20:02:31,057 INFO mapreduce.JobSubmitter: Cleaning up the staging area /tmp/hadoop-yarn/staging/yay/.staging/job_1592913738368_0001
Exception in thread "main" org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot delete /tmp/hadoop-yarn/staging/yay/.staging/job_1592913738368_0001. Name node is in safe mode.
The reported blocks 21 has reached the threshold 0.9990 of total blocks 21. The minimum number of live datanodes is not required. In safe mode extension. Safe mode will be turned off automatically in 3 seconds. NamenodeHostName:localhost
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.newSafemodeException(FSNamesystem.java:1476)

The fix is to leave safe mode:

yay@yay-ThinkPad-T470-W10DG:~/下载/Elasticsearch for Hadoop_code/eshadoop-master/Chapter1/target$ hadoop dfsadmin -safemode leave
WARNING: Use of this script to execute dfsadmin is deprecated.
WARNING: Attempting to execute replacement "hdfs dfsadmin" instead.

yay@yay-ThinkPad-T470-W10DG:~/下载/Elasticsearch for Hadoop_code/eshadoop-master/Chapter1/target$ 

<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>1024</value>
</property>
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>1</value>
</property>
changed to:
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>1536</value>
</property>
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>4</value>
</property>
3. Hadoop 2.6.0 is required here

2. Other notes

[image] [image]

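Chapter 2 Issues

1. An error (occurs when running with the region set to China)

[image]

2. Other notes

2.1 How fields are created in ES

[image]

2.2 An example of ES aggregation (using Top N)

[image]

Query for the top 5: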

curl -XPOST http://localhost:9200/esh_network/_search?pretty -d '{
  "aggs":{
    "top-catagories":{
      "terms":
      {
       "field":"category",
        "size":5
      }
    }
  },
  "size":0
}'
Result: [image]

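2.3 If the data volume is too large, indexes can be created at daily granularity

Note: errors may be reported during the run, but the multiple indexes do get created.

[image]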
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.elasticsearch.hadoop.mr.EsOutputFormat;

public class Driver {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // ElasticSearch Server nodes to point to
        conf.set("es.nodes", "localhost:9200");
        // Single fixed index, in {indexName}/{typeName} format (disabled):
//      conf.set("es.resource", "esh_network/network_logs_{action}");
        // One index per day, named from the @timestamp field of each document
        conf.set("es.resource", "esh_network_{@timestamp:YYYY.MM.dd}/network_logs_{action}");
        //EEE MMM dd hh:mm:ss yyyy

        // Create Job instance (Job.getInstance replaces the deprecated Job constructor)
        Job job = Job.getInstance(conf, "network monitor mapper");
        // set Driver class
        job.setJarByClass(Driver.class);
        job.setMapperClass(NetworkLogsMapper.class);
        // set OutputFormat to EsOutputFormat provided by ElasticSearch-Hadoop jar
        job.setOutputFormatClass(EsOutputFormat.class);
        // Map-only job: the mapper output is written straight to ES
        job.setNumReduceTasks(0);
        FileInputFormat.addInputPath(job, new Path(args[0]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }

}
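
To make the naming scheme concrete, here is a minimal sketch of which index a document would land in under the per-day pattern set in es.resource. The connector performs this resolution itself. The class `DailyIndexNameSketch` and the method `indexFor` are hypothetical, and the sketch uses java.time's `yyyy.MM.dd` purely for illustration rather than reproducing how the connector interprets `YYYY.MM.dd`.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

final class DailyIndexNameSketch {
    // Calendar year, month and day separated by dots, e.g. 2020.06.27.
    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyy.MM.dd");

    // Illustrative helper: prefix from the es.resource setting plus the formatted day.
    static String indexFor(LocalDate day) {
        return "esh_network_" + DAY.format(day);
    }

    public static void main(String[] args) {
        // Two documents from different days end up in two different indexes.
        System.out.println(indexFor(LocalDate.of(2020, 6, 26)));
        System.out.println(indexFor(LocalDate.of(2020, 6, 27)));
    }
}
```

Because every day gets its own index, all of them share the `esh_network_` prefix.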
Deleting the indexes with a wildcard pattern: [image]

2.4 Loading Twitter data into ES, then into HDFS
2.4.1 Loading the data into ES

yay@yay-ThinkPad-T470-W10DG:~/下载/Elasticsearch for Hadoop_code/eshadoop-master/Chapter2$ hdfs dfs -put data/tweets.csv /input/ch02/tweets.csv
yay@yay-ThinkPad-T470-W10DG:~/下载/Elasticsearch for Hadoop_code/eshadoop-master/Chapter2$ hdfs dfs -ls /input/ch02
Found 2 items
-rw-r--r--   1 yay supergroup    7330547 2020-06-27 00:20 /input/ch02/network-logs.txt
-rw-r--r--   1 yay supergroup     391727 2020-06-30 16:53 /input/ch02/tweets.csv
yay@yay-ThinkPad-T470-W10DG:~/下载/Elasticsearch for Hadoop_code/eshadoop-master/Chapter2$ hadoop jar target/ch02-0.0.1-tweets2es-job.jar /input/ch02/tweets.csv
[image]
2.4.2 Reading data from ES into HDFS
Note: the output directory on HDFS must not already exist (the directory I actually used was /input/ch02/tohdfs). If it does exist, delete it first, for example: $ hdfs dfs -rm -r /input/ch03/
[image]
yay@yay-ThinkPad-T470-W10DG:~/下载/Elasticsearch for Hadoop_code/eshadoop-master/Chapter2$ hadoop jar target/ch02-0.0.1-tweets2hdfs-job.jar /input/ch02/tohdfs
[image] [image]

Chapter 3 Issues

2. Other notes

Use single quotes so that the curl request body can span multiple lines

yay@yay-ThinkPad-T470-W10DG:~/下载/Elasticsearch for Hadoop_code/eshadoop-master/Chapter2$ curl -XPUT http://localhost:9200/hrms1/candidate/1?pretty -d '
{"firstname":"ay",
"lastname":"Y",
"skill":["Java","Scala","what"]
}'
{
  "_index" : "hrms1",
  "_type" : "candidate",
  "_id" : "1",
  "_version" : 1,
  "created" : true
}
yay@yay-ThinkPad-T470-W10DG:~/下载/Elasticsearch for Hadoop_code/eshadoop-master/Chapter2$ curl -XPOST http://localhost:9200/hrms1/candidate/1/_update?pretty -d '
{"doc":{"newkey":"newvalue"}}'
{
  "_index" : "hrms1",
  "_type" : "candidate",
  "_id" : "1",
  "_version" : 2
}
yay@yay-ThinkPad-T470-W10DG:~/下载/Elasticsearch for Hadoop_code/eshadoop-master/Chapter2$ curl -XGET http://localhost:9200/hrms1/candidate/1?pretty
{
  "_index" : "hrms1",
  "_type" : "candidate",
  "_id" : "1",
  "_version" : 2,
  "found" : true,
  "_source":{"firstname":"ay","lastname":"Y","skill":["Java","Scala","what"],"newkey":"newvalue"}
}
yay@yay-ThinkPad-T470-W10DG:~/下载/Elasticsearch for Hadoop_code/eshadoop-master/Chapter2$ 
[image]

2.1 URI query

[image]
curl 'http://localhost:9200/hrms/candidate/_search?pretty=true&q=skills:elasticsearch'
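
When the same URI search is issued from code instead of the shell, the `q` value should be URL-encoded. A minimal sketch, assuming Java 10 or later for `URLEncoder.encode(String, Charset)`; the class `UriSearchUrlSketch` and the method `searchUrl` are hypothetical helpers, not part of any Elasticsearch client.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class UriSearchUrlSketch {
    // Builds {base}/{index}/{type}/_search?pretty=true&q={field}:{term} with q URL-encoded.
    static String searchUrl(String base, String index, String type, String field, String term) {
        String q = URLEncoder.encode(field + ":" + term, StandardCharsets.UTF_8);
        return base + "/" + index + "/" + type + "/_search?pretty=true&q=" + q;
    }

    public static void main(String[] args) {
        System.out.println(
            searchUrl("http://localhost:9200", "hrms", "candidate", "skills", "elasticsearch"));
    }
}
```

On the command line the whole URL has to be quoted, because an unquoted `&` tells the shell to run the part before it in the background and drops the `q` parameter.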

2.2 match_all query

[image]
yay@yay-ThinkPad-T470-W10DG:~$ curl -XPOST  http://localhost:9200/hrms/candidate/_search?pretty -d '
{
   "query":
   {
    "match_all":{}
   }
}'

2.3 term query

yay@yay-ThinkPad-T470-W10DG:~$ curl -XPOST  http://localhost:9200/hrms/candidate/_search?pretty -d '
{
   "query":
 {
   "term":
   {
     "skills":
     {
       "value":"elasticsearch"
     }
   }
 },

"size":"2"
}'

[image]

2.4 bool query

yay@yay-ThinkPad-T470-W10DG:~$ curl -XPOST  http://localhost:9200/hrms/candidate/_search?pretty -d '
{
   "query":
 {
   "bool":{
    "must":[{
      "term":{
        "address.city":{
            "value":"Mumbai"
     
             }
            }
           }
         ],
   "should":[{
      "terms":{
        "skills":["elasticsearch","lucene"]
              }
            }]
      }
 }
}'
[image]

2.5 match query

yay@yay-ThinkPad-T470-W10DG:~$ curl -XPOST  http://localhost:9200/hrms/candidate/_search?pretty -d '
{
   "query":{
   "match":{
      "comments":{
        "query":"hacking java"
              }
          }
       }
}'
[image]

Adding "type":"phrase" restricts the query to an exact phrase match:

yay@yay-ThinkPad-T470-W10DG:~$ curl -XPOST  http://localhost:9200/hrms/candidate/_search?pretty -d '
{
   "query":{
   "match":{
      "comments":{
        "query":"Ethical hacking","type":"phrase"
              }
          }
       }
}'


[image]

2.6 range

yay@yay-ThinkPad-T470-W10DG:~$ curl -XPOST  http://localhost:9200/hrms/candidate/_search?pretty -d '
{
   "query":{
     "range":{
     "experience":{
       "gte":5,
       "lte":10}
          }
       }
}'
[image]

2.7 wildcard query (wildcard matching against exact terms)

yay@yay-ThinkPad-T470-W10DG:~$ curl -XPOST  http://localhost:9200/hrms/candidate/_search?pretty -d '
{
   "query":{
     "wildcard":{
       "address.city":{
         "value":"Mu*"
           }
        }
   }
}'

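[image]

3 Filters

Unlike a query, a filter has no notion of relevance: it does not score documents or rank similar results, it only decides whether each document matches.

[image]

3.1 exists (the field exists and is not null)

Note: the filtered query used below is old-version syntax; it was deprecated in Elasticsearch 2.0 and removed in 5.0.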

yay@yay-ThinkPad-T470-W10DG:~$ curl -XPOST  http://localhost:9200/hrms/candidate/_search?pretty -d '
{
   "query":{
     "filtered":{
    "filter":{
      "exists":{
        "field":"achievements"
              }
      }
    }
  }
}'
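
On versions where `filtered` is no longer available, the same exists filter is normally written as a `bool` query with a `filter` clause. The sketch below only builds that request body as a string so it can be sent with the same curl command. The class `ExistsFilterSketch` and the method `existsFilterBody` are hypothetical, and the helper assumes the field name contains no characters that need JSON escaping.

```java
final class ExistsFilterSketch {
    // Request body for an exists filter wrapped in bool/filter (no scoring involved).
    // Assumption: the field name needs no JSON escaping.
    static String existsFilterBody(String field) {
        return "{\"query\":{\"bool\":{\"filter\":{\"exists\":{\"field\":\"" + field + "\"}}}}}";
    }

    public static void main(String[] args) {
        // Same field as in the filtered example above.
        System.out.println(existsFilterBody("achievements"));
    }
}
```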
[image]

Chapter 4 Issues

1. Issues

2. Other notes

**1.**

[image]
 private static Writable convertMMddYYTimeToWritable(String timeStr)
    {
        if(timeStr == null){
            return NullWritable.get();
        }
        SimpleDateFormat dateFormat=new SimpleDateFormat("MM/dd/yyyy", Locale.ENGLISH);
        try {
            return new LongWritable(dateFormat.parse(timeStr).getTime());
        } catch (ParseException e) {
            e.printStackTrace();
            return NullWritable.get();
        }        
    }
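
The method above passes Locale.ENGLISH to SimpleDateFormat explicitly. A minimal sketch of why the locale can matter, assuming the input contains English day and month names: the sample string and the pattern `EEE MMM dd HH:mm:ss yyyy` are illustrative and not taken from the book's data, and the class `LocaleParseSketch` is hypothetical.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;

final class LocaleParseSketch {
    // Returns true when the text parses with the given pattern and locale.
    static boolean parses(String text, String pattern, Locale locale) {
        SimpleDateFormat format = new SimpleDateFormat(pattern, locale);
        try {
            format.parse(text);
            return true;
        } catch (ParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String sample = "Tue Jun 23 19:58:08 2020"; // illustrative input with English names
        String pattern = "EEE MMM dd HH:mm:ss yyyy";
        // English symbols match "Tue" and "Jun".
        System.out.println(parses(sample, pattern, Locale.ENGLISH));
        // Chinese symbols do not, so the same text is rejected.
        System.out.println(parses(sample, pattern, Locale.CHINA));
        // A purely numeric pattern such as the one used in the method above.
        System.out.println(parses("06/23/2020", "MM/dd/yyyy", Locale.ENGLISH));
    }
}
```

Without an explicit locale, SimpleDateFormat falls back to the JVM default, so code that parses text fields can behave differently from one machine to another.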
[image]
[image] [image] [image]

**2. Stacked bar chart**

[image]
[image] [image]
[image] [image] [image]
[image]

Area chart: [image]

Donut chart: [image]
[image]