dimensional & fact

2019-08-27 本文已影响0人鲸鱼酱375

1.Dimension Table

A dimension table is unique to data warehouses
Used to hold descriptive or textual informative data
Does not mean there is ONLY characters, can also be numeric of used for description

1.1 advanced dimensional

1.1.1 Conformed Dimension

A conformed dimension is a dimension that is referenced in by many Fact tables
In each Fact table, the Dimension provides the exact same data or purpose
Conformed dimensions can also be when two dimensions match each other exactly or are perfect subsets
eg: same dimension value are using in factsales table and factpayments table

conformed dimension

1.1.2 Junk Dimension

A junk dimension is a handy grouping of data this is not necessarily used in any analysis
Data used such as flags or markers must still be used in a data warehouse to help identify some information, but it may not necessarily have anything to do with the measurements
Examples included True, False, and other comments
eg:flag never used for analysis, but we still need it

junk dimension

Transactional business processes typically produce a number of miscellaneous, lowcardinality fl ags and indicators. Rather than making separate dimensions for each fl ag and attribute, you can create a single junk dimension combining them together.This dimension, frequently labeled as a transaction profi le dimension in a schema,does not need to be the Cartesian product of all the attributes’ possible values, but should only contain the combination of values that actually occur in the source data.

1.1.3 Degenerate Dimension

A degenerate dimension is a key used typically in a dimensions, but lacks significant attributes
Degenerate dimensions are often put into Fact tables, since they have little to contribute
Often times play a role in identifying different fact values
Receipt ID & Transaction ID
eg: banktranid is degenerate dimension, however, for the diminvoceno table doesn't exist

degenerate

This degenerate dimension is placed in the fact table with the explicit acknowledgment that there is no associated dimension table. Degenerate dimensions are most common with transaction and accumulating snapshot fact tables.

1.1.4 Role-Playing Dimension

Role playing dimensions are used to have a single dimension serve multiple roles when a Fact table(s) are referencing it
A Fact table can reference the dimensions in one instance and have a specific meaning, and then reference the same table again but this time have new meaning or data to use

A single physical dimension can be referenced multiple times in a fact table, with each reference linking to a logically distinct role for the dimension. For instance, a fact table can have several dates, each of which is represented by a foreign key to the date dimension. It is essential that each foreign key refers to a separate view of the date dimension so that the references are independent. These separate dimension views (with unique attribute column names) are called roles.

one dimensional table play multiple role,
in this example, dimdate table can play several role in fact table, like order date, shipdate and duedate

role-playing

1.1.5 Rapidly Changing

A Rapidly Changing dimension is a dimensions who’s data values constantly change
In a table where only single values are stored, these values will constantly change as time goes on
In a table where all records are kept, these values will each be recorded as new rows
update

1.1.6 Rapidly Growing

A dimensions whose values or rows increase exponentially
Rows are inserted on a constantly high rate from the data it’s monitoring, causing the table to expand very fast
History of stock market prices, history of gas prices internationally, etc.
insert

1.1.7 Slowly Changing Dimensions

Type 0

Data that never changes and should not be changed
ssn,dob,customerid,employeeid
need triggers(?)

Type 1

Data the does change, but only the most recent value is shown
eg: name,phone,address
just update value,no record(?)

Type 2

Data changes, and now all changes or modifications are recorded
we use json value(hash value?) to compare the changing

there are two way to record these changes

method: flag，satrt date & end date
flag: update flag to 1 to 0,then insert new value, change flag to 1

flag
start date & end date: update table where end date is null; then insert new value, then put end date is null

start & end date
for those two methods, we need follow the rules: update first,then insert

Type 3

Data changes, is recorded, but now we only record up to a certain number of records
method: add a new column to store historical value,but control the number of column, like previous address and address,don't create a lot of new column
in this example, we change address of John a lot of times, but we just need record the last change record

method

Type 4

Uses two tables, one being a history table
Recent values in one table, while historical in another
you can move type 0 column, just save the changing column

type 4

Type 6

Combination of features from types 1, 2, and 3
1+2+3=6
less happen
a table have type 6, not only one column

1.1.8 parent- child dimension

dimension reference itself to calculate(?) , like self join
example
https://docs.microsoft.com/en-us/analysis-services/multidimensional-models/parent-child-dimension

2.Fact Table

A fact table is unique to data warehouses
Fact tables contain all numerical measurements used for analysis
Can contain textual data for measurements or Degenerate Dimensions

2.1 Facts / Measures

Additive Measures

Values or measurements in a fact table that can be added together, no matter what dimension
the dimension can add without using other dimension

Semi-Additive Measure

Values that can be added up to a certain point with some dimensions, but not all
depends on some other dimensions to be aggregate,if add alone, it has no sense

Non-Additive Measure

Values that can NOT be added across any dimensions
eg: %discount

facts

in this example, amount is additive measures, unit is semi-additive measure (rely on pid), discount is non-additive measure

2.2 fact table /measure groups

2.2.1 Cumulative

Category of Facts where the values can be added or accumulated into a single value for measurements
Typically implies using Additive measures

2.2.2 Snapshot

Category of Facts where the values are seen just in that moment in time, like a picture
Values that work only in that instance
Typically implies using Semi-Additive and Non-Additive measures

2.2.3 factless fact

like conjunction table
many to many relationship
just store keys,no dimensional ,no other column
between dimension to dimension,dimension to fact
must create dimension table first

2.2.4 inferred members/dimension

early arriving facts and late arriving dimensions
eg: boa transaction pending
how to solve it: insert in fact table with blank record and flag; when you have dimension , update

3.Determining Where Data Should Go

Dimension Table	Fact Table
Contains Surrogate keys to replace old Primary Keys	Contains Foreign keys that point to the Surrogate keys in various Dimensions to link descriptive info
Can also contain Business Keys to reference OLTP	Designed to hold all measurement data or statistical data for use in analysis
Designed to hold all descriptive data to explain what the data in the Fact tables are or mean

7. 笔记

1.星星的中心是一个大的事实表，发散出来的是维度表，每一个维度表用一个PK-FK连接到事实表，维度表之间彼此并不关联。一个事实表又包括一些度量值和维度。

2.雪花模型通过规范维度表来减少冗余度，也就是说，维度表数据已经被分组成一个个的表而不是使用一个大表。例如产品表被分成了产品大类和产品小类两个表。尽管这样做可以节省了空间，但是却增加了维度表的数量和关联的外键的个数。这就导致了更复杂的查询并降低了数据库的效率