SQL On Big Data

2017-03-09 本文已影响41人 perryn

The query engine parses the query, verifies the semantics of the query against the
catalog store, and generates the logical query plan that is then optimized by the query
optimizer to generate the optimal physical execution plan, based on CPU, I/O, and
network costs to execute the query. The final query plan is then used by the storage
engine to retrieve the underlying data from various data structures, either on disk or in
memory. It is in the storage engine that processes such as Lock Manager, Index Manager,
Buffer Manager, etc., get to work to fetch/update the data, as requested by the query.
Most RDBMS are SMP-based architectures. Traditional database platforms operate
by reading data off disk, bringing it across an I/O interconnect, and loading it into
memory for further processing. An SMP-based system consists of multiple processors,
each with its own memory cache. Memory and I/O subsystems are shared by each of the
processors. The SMP architecture is limited in its ability to move large amounts of data, as
required in data warehousing and large-scale data processing workloads.
The major drawback of this architecture is that it moves data across backplanes and
I/O channels. This does not scale and perform when large data sets are to be queried,
and more so when the queries involve complex joins that require multiple phases of
processing. A huge inefficiency lies in delivering large data off disk across the network
and into memory for processing by the DataBase Management System (DBMS).

SQL On Big Data

猜你喜欢

热点阅读