sql审计-tez

2020-06-28  本文已影响0人  wangliang938

抄自:https://www.qubole.com/blog/scaling-tez-application-using-application-timeline-server-v1-5/

通过这篇文章可以明白 ats 1.0的弊端, 在hadoop 2.*的环境下建议使用ats 1.5.

Introduction

In an earlier blog post, we presented a secure, multi-tenant, reliable, and scalable service that provides access to logs and history for MRv2 applications. In this article, we will cover the challenges faced in scaling the aforementioned service for Tez applications and some approaches to overcoming those challenges.

Application Timeline Server

Application Timeline Server (ATS) was introduced in Apache Hadoop as a single generic service in YARN to store, manage, and serve per-framework application timeline data for both running and finished applications. ATS provides a REST client library that the Application Master (AM) can use to publish Application Timeline data.

The Tez framework is integrated with ATS and provides a UI which displays views of the Tez Application(s). Using ATS as the backend, the Tez UI displays diagnostics information about the applications, directed acyclic graphs (DAG), vertices, and tasks. This information includes container logs, counters, configuration, start/stop time and other data.

Ephemeral Clusters And ATS

Qubole Data Service (QDS) allows users to configure their clusters to autoscale based on the workload, and shut down automatically when there is a period of inactivity. This results in substantial cost savings. ATS and the Tez UI server running on cluster master are responsible for storing and serving timeline data for applications running on that cluster. For debugging an application that had been running on a scaled-down cluster, users need to access its log, configuration, counters, etc. through the Tez UI. As ATS stores this data in the local file system, cluster termination results in data loss and makes it challenging to access Timeline Data once the cluster is terminated.

To remedy this, Qubole has provisioned a multi-tenant version of ATS and the Tez-UI server in the QDS Control Plane (henceforth called as Offline ATS and Offline Tez UI respectively). The online Tez UI is used only when the cluster is running. Otherwise, the Offline Tez UI serves the request. Both these services let users access application logs and other data through a standard workflow, irrespective of the current state of the cluster.

Architecture With ATS V1

The first iteration of ATS, v1, uses LevelDB on top of the local file system to store Timeline Data. The architecture for Offline Tez UI with ATS v1 is similar to the Job History Server (JHS) described in the previous blog.

image

The diagram above illustrates the process. When a Tez application is running, the Application Master sends Timeline Data to ATS (U1). This Timeline Data is stored in the LevelDB Timeline Store that is shared across all applications running in the cluster. Once the application is finished, the AM sends a FINISH event to the Resource Manager (RM) (U2), and the RM fetches data from ATS for this application through the ATS REST APIs (U3). Once the data is fetched, RM creates a local LevelDB Timeline Store on this data, and the generated LevelDB files are then backed up to a Persistent Cloud Storage (U4).

Application Timeline data in the form of the LevelDB store can now be accessed even if the cluster is down. When Offline Tez UI is requested to view an Application (D1), Offline ATS consumes LevelDB files stored in the cloud (D3).

Shortcomings Of ATS V1

Being a single point of contact for all Timeline Data related requests, ATS often suffers from performance, reliability and scalability issues:

Apache Hadoop introduced ATS v1.5 to overcome the aforementioned shortcomings.

ATS V1.5 Architecture

As mentioned, the LevelDB-based storage solution was unable to scale ATS, which caused delay and degraded performance. To overcome this, ATS v1.5 introduced a distributed architecture wherein the AM stores timeline data on a highly distributed and reliable Hadoop File System (HDFS). In ATSv1.5, the AM and ATS are decoupled. The AM writes data directly into HDFS, and ATS lazily reads data from HDFS on-demand. This ensures the AM is no longer a bottleneck when writing Timeline Events and allows better scaling.

Timeline Data is stored in two parts:

Offline Tez UI With ATS V1.5

The AM publishes timeline events by appending data to files stored in HDFS. Therefore, using a File System that supports the append operation is recommended for better performance. Due to this limitation, however, the AM cannot be configured to store data in a cloud-based Persistent FileSystem like Amazon S3.

image

The architecture for Offline Tez UI with ATS v1.5 (shown above) is similar to ATS v1 (shown earlier) with the following differences:

This change in storage architecture helped in making Offline Tez UI more efficient and reliable by limiting the amount of data fetched from ATS. For example, when the Application page is opened, only Summary Data(D3)is downloaded, whereas when a DAG specific page is opened, Tez UI fetches only the DAG’s Timeline Data from ATS (D4).

Advantages Of ATS V1.5

Using ATS v1.5 provides the following advantages:

Offline TezUI: ATSv1 Vs ATSv1.5

By consuming ATS v1.5 as the backend service for Tez UI, several improvements were achieved over ATS v1.

image

ATS v1.5 is available for our customers using Hadoop cluster with Hive 2.1 and 2.3. No action required on the customer’s side as this is enabled by default. ATSv1.5 will be supported in Hive 3.1 in a future release.

上一篇 下一篇

猜你喜欢

热点阅读