Data Flow Transformations
1.Data Flow Transformations:
- Transformations are used to apply business rules on data.
- It can be used to aggregate, merge, distribute, and modify data as per the business requirement.
- Transformations can also perform lookup operations and generate sample datasets.
- It can also be used to perform data cleansing and data standardization.
- it only one can execute in data flow task**
Transformation Types:
Transformations are classified as follows:
Row Transformations: row by row
- Character Map
- Copy Column
- Data Conversion
- Derived Column,
- OLEDB Command, etc.
Split and Join Transformations
- Conditional Split and Multicast non- blocking
- Union All : semi bolcking
- Merge and Merge Join: semi bolcking
- Lookup : row by row,non-blocking
- Cache Transform: non- blocking
clear the cache file
Rowset Transformations full blocking
- Aggregate
- Sort
Business Intelligence Transformations
- Slowly Changing Dimension
- Fuzzy Lookup and Fuzzy Grouping: full blocking
perform the (?)
identify the duplicte value(?)
Auditing Transformations
- Audit unblocking
- Row Count
Synchronous and Asynchronous Transformations:
SSIS dataflow contain 2 types of components:
Synchronous:
- Non-Blocking Transformations
- The output of an synchronous component uses the same buffer as the input.
- Number of records IN == Number of records OUT
◌row by row processing
◌ doesn't create new buffer
◆when you have pipline, you have buffer(by default 10 mb)
buffer and thread
- if max buffer rows =1000;maxbuffer=10mb, 超过了其中一个都不会插入
- buffer size min=64mb,max=100mb
execution tree
- =execution plan is sql
- when a new buffer created, until a new buffer was wxecuted.
how to check execution tree?
- view-other windows log- event
- ssis- logging- details -add
Asynchronous:
- Semi-Blocking and Full-Blocking Transformations
- The output of an asynchronous component uses the new buffer.
- An asynchronous component can have more or less output records then input records
- we need create a new buffer, input =/= outpout record
◆ in transformation, we might need wait whole record; or wait partial record
semi-blocking vs full-blocking
semi | full |
---|---|
read a portion source and apply trans and send to destination | 2 execution tree |
may or may not require more buffer | must create new buffer |
N in =/= N out | only execute transflormation when all record were read |
Synchronous Vs Asynchronous:
- Differences between Non-Blocking, Semi-Blocking, Full-Blocking transformations are listed below:
thread(?): when you creating a new buffer, it required a new engine thread
-
Non-Blocking, Semi-Blocking and Full-Blocking Transformations are listed below:
image.png
Data Conversion:
- Converts the data in an input column to a different data type and then copies it to a new output column.
It can be used to:
-
Change the data type
-
Set the column length of string data and the precision and scale on numeric data.
-
Specify a code page. ( configure language: eg: chinese; english)
we need change everytime, because in hard drive -
This transformation has one input, one output, and one error output.
Character Map:
- Applies string functions, such as conversion from lowercase to uppercase, to character data.
- Operates only on column with a string data type.
- Character Map transformation can convert column data in place or add converted data in the new column.
- This transformation has one input, one output, and one error output.
You configure the Character Map transformation in the following ways:
- Specify the columns to convert.
- Specify the operations to apply to each column.
Copy Column:
- Copy Column transformation creates new columns by copying input columns and adding the new columns to the transformation output
- It can create multiple copies of a column, or create copies of multiple columns in one operation.
- This transformation has one input, one output. It does not support an error output.
Derived Column:
- Creates new column values by applying expressions to transformation input columns.
- Expression can have variables, functions, operators, and columns from the transformation input.
-
It can be used to:
●Concatenate data from different columns
●Extract characters from string data by using functions
●Apply mathematical functions to numeric data
●Create expressions that compare input columns and variables
●Extract parts of a datetime value -
The result can be added as a new column or inserted into an existing column
-
This transformation has one input, one regular output, and one error output.
▶always use when we want to calculate