MSBI - SSIS

2020-12-09  本文已影响0人  Kevin不会创作

Lab 1

  1. Definition of BI

  2. BI Life Cycle

    ETL BI Life Cycle
  3. SSIS, SSAS, SSRS

  4. Loading CSV in SQL Server

  5. How to see errors

    Use Progress.

  6. Data Flow vs Control Flow

    Control Flow contains Data Flow.

  7. Structure of SSIS project

    Structure

Lab 2

  1. Conditional Split

  2. Data Conversion

  3. Handle errors

    • Ignore failure
    • Redirect row
    • Fail component

Lab 3

  1. For loop

    In Data flow we do ETL activities while in Control flow we do Non-ETL activities. For loop is a Non-ETL activity and has to be done in Control flow.

    • For Loop Container loops till a fixed count.
    • Foreach Loop Container loops through items like recordset, files and so on.

    a. Add Foreach Loop Container.
    b. Add a new variable named FullFilePath.
    c. Edit Foreach Loop Container and choose "Foreach File Enumerator".
    d. Select variable FullFilePath to map to the collection value.
    e. Use Expressions to send variable FullFilePath to ConnectionString in CSV Connection Properties.

  2. Variables

    Variables store values that a SSIS package and its containers, tasks, and event handlers can use at run time.

    Document: SSIS Variables

  3. Parameters

    Parameters allow you to assign values to properties within packages at the time of package execution. You can create project parameters at the project level and package parameters at the package level.

    Document: SSIS Parameters

  4. Debugging, Quick watch, add watch

    You can debug your task using Edit Breakpoints.

    You can observe the value for your Expression in QuickWatch. You can also add watch to view your Expression all the time.

Lab 4

  1. Deployment

    Deployment
    • Project Deployment vs Package Deployment

      Integration Services supports two deployment models, the project deployment model and the legacy package deployment model.

      Document: Project and Package Deployment

  2. File System Task

    File System Task can copy files from one folder to another folder.

  3. SQL Server Agent

    SQL Server Agent can schedule a task periodically.

Lab 5

  1. Measures and Dimensions

    There are two kinds of tables in SQL Server: Measure and Dimension. Measure (Fact) is the central table with Foreign Key relationship with Dimension.

  2. Star and Snowflake Schema

    In Star Schema Dimension tables are not connected.

    In Snowflake Schema Dimension tables can be connected.

    Document: Star and Snowflake Schema

  3. Execute Package Task

  4. Shared connection managers

Lab 6

  1. Slowly Changing Dimension (SCD)

    The Slowly Changing Dimension transformation coordinates the updating and inserting of records in data warehouse dimension tables.

    Fixed Attribute: The value in a column should not change. Changes are treated as errors.

    Changing Attribute: Changed values should overwrite existing values. This is a Type 1 change.

    Historical Attribute: changes in column values are saved in new records. Previous values are saved in records marked as outdated. This is a Type 2 change.

    Document: Slowly Changing Dimension

  2. Unicode and Non-Unicode

    A Unicode character takes more bytes to store the data in the database.

    Using non Unicode it is easy to store languages like ‘English’ but not other Asian languages that need more bits to store correctly otherwise truncation will occur.

  3. OLE DB Command

    The OLE DB Command transformation runs an SQL statement for each row in a data flow. For example, you can run an SQL statement that inserts, updates, or deletes rows in a database table.

Document: OLE DB Command

Lab 7

  1. Lookup

    The Lookup transformation performs lookups by joining data in input columns with columns in a reference dataset. You use the lookup to access additional information in a related table that is based on values in common columns.

    By default the lookup stops the program execution when it does not find a data. You can change the setting when you edit it.

    Document: Lookup

  2. Data Conversion Optimization

    Change the data type before reading data from files will improve the performance of the model.

    Difference between a OK SSIS developer and a GOOD SSIS developer is the performance optimization of SSIS logic!

Lab 8

  1. Merge

    The Merge transformation combines two sorted datasets into a single dataset. The rows from each dataset are inserted into the output based on values in their key columns.

    Merge
  2. Merge Join

    The Merge Join transformation provides an output that is generated by joining two sorted datasets using a FULL, LEFT, or INNER join.

Lab 11

  1. Transaction

    Transaction helps us to logical group activities. So activities which are part of that logical group either all pass or all will fail.

    Sequence Container

    • Not supported: Does not participate in any transaction.
    • Supported: Joins the parent sequence container transaction.
    • Required: Create his own new transaction.
  2. CheckPoints

    Start from the point where the things failed. Do not start from first.

    Create a CheckPointFile to record the successful tasks.

Lab 13

  1. Data Profiling Task

    The Data Profiling task computes various profiles that help you become familiar with a data source and identify problems in the data that have to be fixed.

    Data Profiling Task can only work with SQL Server as the source.

    Data Profile Viewer provides the detail of your data including:

    • Candidate Key
    • Column Length Distribution
    • Column Null Ratio
    • Column Pattern
    • Column Value Distribution

Lab 15

  1. Web Service Task

    The Web Service task executes a Web service method. You can use the Web Service task for the following purposes:

    • Writing to a variable the values that a Web service method returns. For example, you could obtain the highest temperature of the day from a Web service method, and then use that value to update a variable that is used in an expression that sets a column value.

    • Writing to a file the values that a Web service method returns. For example, a list of potential customers can be written to a file and the file then used as a data source in a package that cleans the data before it is written to a database.

    Resources

  1. XML Task

    The XML task is used to work with XML data. Using this task, a package can retrieve XML documents, apply operations to the documents by using Extensible Stylesheet Language Transformations (XSLT) style sheets and XPath expressions, merge multiple documents, or validate, compare, and save the updated documents to files and variables.

    Resorces

Lab 18

  1. Pivot

    Pivot Key: Values in the input data from this column will become new column names in the output.

    Set Key: Identifies a group of input rows that will get pivoted into one output row. The input data must be sorted on this column.

    Pivot Value: Values from this column will be mapped into the new pivot output columns.

    Pivot

    If there exists duplicate data, then you can add Aggregation transformation to get the sum or average of the duplicate values.

    Aggregation
  2. Unpivot

    The Unpivot transformation makes an unnormalized dataset into a more normalized version by expanding values from multiple columns in a single record into multiple records with the same values in a single column.

Resorces

Lab 20

  1. Execute SQL Task

    The Execute SQL task runs SQL statements or stored procedures from a package. The task can contain either a single SQL statement or multiple SQL statements that run sequentially. You can use the Execute SQL task for the following purposes:

    • Truncate a table or view in preparation for inserting data.
    • Create, alter, and drop database objects such as tables and views.
    • Re-create fact and dimension tables before loading data into them.
    • Run stored procedures. If the SQL statement invokes a stored procedure that returns results from a temporary table, use the WITH RESULT SETS option to define metadata for the result set.
    • Save the rowset returned from a query into a variable.

Resources

Lab 22

  1. Script Task

    The Script task provides code to perform functions that are not available in the built-in tasks and transformations that SQL Server Integration Services provides. The Script task can also combine functions in one script instead of using multiple tasks and transformations.

  2. Send Mail Task

    The Send Mail task sends an e-mail message. By using the Send Mail task, a package can send messages if tasks in the package workflow succeed or fail, or send messages in response to an event that the package raises at run time. For example, the task can notify a database administrator about the success or failure of the Backup Database task.

Resources

Lab 23

  1. Script Component

    The Script component hosts script and enables a package to include and run custom script code. You can use the Script component in packages for the following purposes:

    • Apply multiple transformations to data instead of using multiple transformations in the data flow. For example, a script can add the values in two columns and then calculate the average of the sum.
    • Access business rules in an existing .NET assembly. For example, a script can apply a business rule that specifies the range of values that are valid in an Income column.
    • Use custom formulas and functions in addition to the functions and operators that the Integration Services expression grammar provides. For example, validate credit card numbers that use the LUHN formula.
    • Validate column data and skip records that contain invalid data. For example, a script can assess the reasonableness of a postage amount and skip records with extremely high or low amounts.

Resources

上一篇下一篇

猜你喜欢

热点阅读