dimensional & fact
1.Dimension Table
- A dimension table is unique to data warehouses
- Used to hold descriptive or textual informative data
- Does not mean there is ONLY characters, can also be numeric of used for description
1.1 advanced dimensional
1.1.1 Conformed Dimension
-
A conformed dimension is a dimension that is referenced in by many Fact tables
-
In each Fact table, the Dimension provides the exact same data or purpose
-
Conformed dimensions can also be when two dimensions match each other exactly or are perfect subsets
-
eg: same dimension value are using in factsales table and factpayments table
conformed dimension
1.1.2 Junk Dimension
- A junk dimension is a handy grouping of data this is not necessarily used in any analysis
- Data used such as flags or markers must still be used in a data warehouse to help identify some information, but it may not necessarily have anything to do with the measurements
- Examples included True, False, and other comments
-
eg:flag never used for analysis, but we still need it
junk dimension
Transactional business processes typically produce a number of miscellaneous, lowcardinality fl ags and indicators. Rather than making separate dimensions for each fl ag and attribute, you can create a single junk dimension combining them together.This dimension, frequently labeled as a transaction profi le dimension in a schema,does not need to be the Cartesian product of all the attributes’ possible values, but should only contain the combination of values that actually occur in the source data.
1.1.3 Degenerate Dimension
-
A degenerate dimension is a key used typically in a dimensions, but lacks significant attributes
-
Degenerate dimensions are often put into Fact tables, since they have little to contribute
-
Often times play a role in identifying different fact values
-
Receipt ID & Transaction ID
-
eg: banktranid is degenerate dimension, however, for the diminvoceno table doesn't exist
degenerate
This degenerate dimension is placed in the fact table with the explicit acknowledgment that there is no associated dimension table. Degenerate dimensions are most common with transaction and accumulating snapshot fact tables.
1.1.4 Role-Playing Dimension
- Role playing dimensions are used to have a single dimension serve multiple roles when a Fact table(s) are referencing it
- A Fact table can reference the dimensions in one instance and have a specific meaning, and then reference the same table again but this time have new meaning or data to use
A single physical dimension can be referenced multiple times in a fact table, with each reference linking to a logically distinct role for the dimension. For instance, a fact table can have several dates, each of which is represented by a foreign key to the date dimension. It is essential that each foreign key refers to a separate view of the date dimension so that the references are independent. These separate dimension views (with unique attribute column names) are called roles.
-
one dimensional table play multiple role,
in this example, dimdate table can play several role in fact table, like order date, shipdate and duedate
role-playing
1.1.5 Rapidly Changing
- A Rapidly Changing dimension is a dimensions who’s data values constantly change
- In a table where only single values are stored, these values will constantly change as time goes on
- In a table where all records are kept, these values will each be recorded as new rows
- update
1.1.6 Rapidly Growing
- A dimensions whose values or rows increase exponentially
- Rows are inserted on a constantly high rate from the data it’s monitoring, causing the table to expand very fast
- History of stock market prices, history of gas prices internationally, etc.
- insert
1.1.7 Slowly Changing Dimensions
Type 0
- Data that never changes and should not be changed
- ssn,dob,customerid,employeeid
- need triggers(?)
Type 1
- Data the does change, but only the most recent value is shown
- eg: name,phone,address
- just update value,no record(?)
Type 2
- Data changes, and now all changes or modifications are recorded
- we use json value(hash value?) to compare the changing
there are two way to record these changes
-
method: flag,satrt date & end date
-
flag: update flag to 1 to 0,then insert new value, change flag to 1
flag -
start date & end date: update table where end date is null; then insert new value, then put end date is null
start & end date -
for those two methods, we need follow the rules: update first,then insert
Type 3
- Data changes, is recorded, but now we only record up to a certain number of records
-
method: add a new column to store historical value,but control the number of column, like previous address and address,don't create a lot of new column
in this example, we change address of John a lot of times, but we just need record the last change record
method
Type 4
- Uses two tables, one being a history table
- Recent values in one table, while historical in another
-
you can move type 0 column, just save the changing column
type 4
Type 6
- Combination of features from types 1, 2, and 3
1+2+3=6 - less happen
- a table have type 6, not only one column
1.1.8 parent- child dimension
- dimension reference itself to calculate(?) , like self join
example
https://docs.microsoft.com/en-us/analysis-services/multidimensional-models/parent-child-dimension
2.Fact Table
- A fact table is unique to data warehouses
- Fact tables contain all numerical measurements used for analysis
- Can contain textual data for measurements or Degenerate Dimensions
2.1 Facts / Measures
Additive Measures
- Values or measurements in a fact table that can be added together, no matter what dimension
- the dimension can add without using other dimension
Semi-Additive Measure
- Values that can be added up to a certain point with some dimensions, but not all
- depends on some other dimensions to be aggregate,if add alone, it has no sense
Non-Additive Measure
- Values that can NOT be added across any dimensions
- eg: %discount
- in this example, amount is additive measures, unit is semi-additive measure (rely on pid), discount is non-additive measure
2.2 fact table /measure groups
2.2.1 Cumulative
- Category of Facts where the values can be added or accumulated into a single value for measurements
- Typically implies using Additive measures
2.2.2 Snapshot
- Category of Facts where the values are seen just in that moment in time, like a picture
- Values that work only in that instance
- Typically implies using Semi-Additive and Non-Additive measures
2.2.3 factless fact
- like conjunction table
- many to many relationship
- just store keys,no dimensional ,no other column
- between dimension to dimension,dimension to fact
- must create dimension table first
2.2.4 inferred members/dimension
- early arriving facts and late arriving dimensions
- eg: boa transaction pending
- how to solve it: insert in fact table with blank record and flag; when you have dimension , update
3.Determining Where Data Should Go
Dimension Table | Fact Table |
---|---|
Contains Surrogate keys to replace old Primary Keys | Contains Foreign keys that point to the Surrogate keys in various Dimensions to link descriptive info |
Can also contain Business Keys to reference OLTP | Designed to hold all measurement data or statistical data for use in analysis |
Designed to hold all descriptive data to explain what the data in the Fact tables are or mean |
7. 笔记
1.星星的中心是一个大的事实表,发散出来的是维度表,每一个维度表用一个PK-FK连接到事实表,维度表之间彼此并不关联。一个事实表又包括一些度量值和维度。
2.雪花模型通过规范维度表来减少冗余度,也就是说,维度表数据已经被分组成一个个的表而不是使用一个大表。例如产品表被分成了产品大类和产品小类两个表。尽管这样做可以节省了空间,但是却增加了维度表的数量和关联的外键的个数。这就导致了更复杂的查询并降低了数据库的效率