大数据

006 Hadoop 分析工具 Top 5 - 深入了解高级分析

2019-08-03  本文已影响0人  胡巴Lei特

006 Top 5 Hadoop Analytics Tools – Take a Dive into Advanced Analytics

Hadoop is an open source distributed storage and processing framework. It is at the center of the growing big data ecosystem. It gets used for advanced analytics which includes predictive analytics, data mining and machine learning. Hadoop is a technology which manages data processing and storage for big data applications. And can work with various forms of structured and unstructured data. So, let’s explore Hadoop Analytics Tools.

Hadoop 是一个开源的分布式存储和处理框架. 它是不断增长的大数据生态系统的中心.它被用于高级分析,包括预测分析、数据挖掘和 机器学习. Hadoop 是为大数据应用管理数据处理和存储的技术. 可以处理各种形式的结构化和非结构化数据. 让我们来探索一下 Hadoop 分析工具.

Hadoop Analytics Tools

List of Top Hadoop Analytics Tools

Hadoop 分析工具列表

Below are the top 5 Hadoop Analytics Tools, let’s discuss them in detail –

下面是前 5 大 Hadoop 分析工具,让我们详细讨论一下 --

1. Spark

Apache Spark provides in-memory data processing for developers and data scientists. Its easy development, flexibility, and speed have made it one of the popular Apache projects. It is the successor to MapReduce as a standard execution engine for Hadoop. Apache spark enables real-time, batch and advanced analytics over Hadoop platform. Spark is increasingly becoming the default data execution engine for analytics workload.

Apache Spark 为开发人员和数据科学家提供内存数据处理. 它的简单开发、灵活性和速度使得它成为流行的 Apache 项目之一. 它是 MapReduce 作为 Hadoop 标准执行引擎的继任者. Apache spark 支持实时、批处理和高级分析 Hadoop 平台. Spark 正日益成为分析工作负载的默认数据执行引擎.

Features of Spark:

Spark 的特点:

2. Apache Impala

Apache Impala provides massively parallel processing SQL analytics. It opens up interactive BI for the business analyst. Apache Impala is great at performance and concurrency requirements. These are features which are necessary for building an analytic database. It is natively integrated with Hadoop and leading BI tools to provide with a low-cost platform for analytics.

提供大规模并行处理 SQL 分析.它打开了面向业务分析师的交互式 BI.Apache Impala 在性能和并发需求方面非常出色.这些是构建分析数据库所必需的特性.它与 Hadoop 和 领先的 BI 工具提供一个低成本的分析平台.

Features of Impala:

3. MapReduce

Hadoop MapReduce is a framework for writing applications to process a huge amount of data. They do so in parallel on a large cluster of commodity hardware in a reliable and fault tolerant manner. The job submitted by the client gets divided into a number of independent tasks. These tasks run in parallel giving high throughput. The Map-reduce job is majorly divided into Map tasks and reduce tasks. Usually, programmers write the entire business logic in the map task. And reduce task perform summarization on the input dataset.

Hadoop MapReduce是一个编写应用程序以处理大量数据的框架.它们以可靠和容错的方式在大型商品硬件集群上并行执行.客户提交的作业被分成若干独立的任务.这些任务并行运行,吞吐量高.的地图缩小作业主要被分为地图任务和减少任务.通常,程序员在 map 任务中编写整个业务逻辑.并减少任务对输入数据集执行摘要.

Features of Hadoop MapReduce:

4. Mahout

Apache Mahout is a library of various scalable machine learning algorithms. It gets implemented on the top of Hadoop using Map-Reduce paradigm. Machine learning is the discipline of Artificial Intelligence. It is focused on enabling machines to learn without being explicitly programmed. It is commonly used to improve performance in the future based on previous outcomes.

Apache Mahout 是各种可扩展的库机器学习算法.它使用 Map-Reduce 范式在 Hadoop 的顶部实现.机器学习的学科是人工智能.它的重点是让机器在没有明确编程的情况下学习.它通常用于基于以前的结果来提高未来的绩效.

Features of Mahout:

5. Apache Hive

Apache Hive is a data warehouse software. It facilitates querying and managing large datasets residing in distributed storage. Hive provides a mechanism to query data using SQL like language i.e. HQL. At the same time, this language allows map-reduce programmers to plug in their source code i.e. mappers and reducers. This is when it is inconvenient or inefficient to express the logic in HiveQL.

数据仓库软件 Apache Hive.它便于查询和管理分布存储中的大型数据集.Hive 提供了一种机制使用类似 SQL 的语言查询数据 即 HQL.与此同时,这种语言允许 map-reduce 程序员插入他们的源代码,即 mappers 和 reducers.这是在 HiveQL 中表达逻辑不方便或效率低下的时候.

Features:

So, this was all about Hadoop Analytics Tools. Hope you liked our explanation.

所以,这都是关于 Hadoop 分析工具的.希望你喜欢我们的解释

Summary

Hadoop is great for MapReduce data analysis on the huge amount of data. Its specific use cases include data searching, data reporting, large scale indexing of files i.e. log files or data from web crawlers. Mahout lets you analyze large sets of data effectively in less time. Impala makes processing data easy. It gives response quickly in real-time. The performance of the repeated query is even better. Apache Spark gives high performance for both batch and streaming data. It uses state-of-the-art DAG scheduler, query optimizer, and physical execution engine.

Hadoop 对于海量数据的 MapReduce 数据分析非常有用.它的具体使用案例包括数据搜索、数据报告、文件的大规模索引,即日志文件或来自网络爬虫的数据.Mahout 让您能够在更短的时间内有效地分析大量数据.Impala 使处理数据变得简单.它可以实时快速响应.重复查询的性能更好.Apache Spark 为批处理和流数据提供了高性能.它使用最先进的 DAG 调度器、查询优化器和物理执行引擎.

Still, if you have any query regarding Hadoop Analytics Tools, ask in the comment tab.

尽管如此,如果您对 Hadoop 分析工具有任何疑问,请在 “评论” 选项卡中询问.

https://data-flair.training/blogs/hadoop-analytics-tools

上一篇 下一篇

猜你喜欢

热点阅读