Data Flow Transformations

2019-10-02 本文已影响0人鲸鱼酱375

1.Data Flow Transformations:

Transformations are used to apply business rules on data.
It can be used to aggregate, merge, distribute, and modify data as per the business requirement.
Transformations can also perform lookup operations and generate sample datasets.
It can also be used to perform data cleansing and data standardization.

it only one can execute in data flow task**

Transformation Types:

Transformations are classified as follows:

Row Transformations： row by row

Character Map
Copy Column
Data Conversion
Derived Column,
OLEDB Command, etc.

Split and Join Transformations

Conditional Split and Multicast non- blocking
Union All : semi bolcking
Merge and Merge Join: semi bolcking
Lookup : row by row,non-blocking
Cache Transform: non- blocking
clear the cache file

Rowset Transformations full blocking

Aggregate
Sort

Business Intelligence Transformations

Slowly Changing Dimension
Fuzzy Lookup and Fuzzy Grouping: full blocking
perform the (?)
identify the duplicte value(?)

Auditing Transformations

Audit unblocking
Row Count

Synchronous and Asynchronous Transformations:

SSIS dataflow contain 2 types of components:

Synchronous:

Non-Blocking Transformations
The output of an synchronous component uses the same buffer as the input.
Number of records IN == Number of records OUT

◌row by row processing
◌ doesn't create new buffer

◆when you have pipline, you have buffer(by default 10 mb)

buffer and thread

if max buffer rows =1000;maxbuffer=10mb, 超过了其中一个都不会插入
buffer size min=64mb，max=100mb

execution tree

=execution plan is sql
when a new buffer created, until a new buffer was wxecuted.

how to check execution tree?

view-other windows log- event
ssis- logging- details -add

Asynchronous:

Semi-Blocking and Full-Blocking Transformations
The output of an asynchronous component uses the new buffer.
An asynchronous component can have more or less output records then input records
we need create a new buffer, input =/= outpout record

◆ in transformation, we might need wait whole record; or wait partial record

semi-blocking vs full-blocking

semi	full
read a portion source and apply trans and send to destination	2 execution tree
may or may not require more buffer	must create new buffer
N in =/= N out	only execute transflormation when all record were read

Synchronous Vs Asynchronous:

Differences between Non-Blocking, Semi-Blocking, Full-Blocking transformations are listed below:

image.png

thread（？）: when you creating a new buffer, it required a new engine thread

Non-Blocking, Semi-Blocking and Full-Blocking Transformations are listed below:

image.png

Data Conversion:

Converts the data in an input column to a different data type and then copies it to a new output column.

It can be used to:

Change the data type
Set the column length of string data and the precision and scale on numeric data.
Specify a code page. ( configure language: eg: chinese; english)
we need change everytime, because in hard drive
This transformation has one input, one output, and one error output.

image.png

Character Map:

Applies string functions, such as conversion from lowercase to uppercase, to character data.
Operates only on column with a string data type.
Character Map transformation can convert column data in place or add converted data in the new column.
This transformation has one input, one output, and one error output.

You configure the Character Map transformation in the following ways:

Specify the columns to convert.
Specify the operations to apply to each column.

Copy Column:

Copy Column transformation creates new columns by copying input columns and adding the new columns to the transformation output
It can create multiple copies of a column, or create copies of multiple columns in one operation.
This transformation has one input, one output. It does not support an error output.

Derived Column:

Creates new column values by applying expressions to transformation input columns.
Expression can have variables, functions, operators, and columns from the transformation input.

It can be used to:
●Concatenate data from different columns
●Extract characters from string data by using functions
●Apply mathematical functions to numeric data
●Create expressions that compare input columns and variables
●Extract parts of a datetime value
The result can be added as a new column or inserted into an existing column
This transformation has one input, one regular output, and one error output.

▶always use when we want to calculate