index

2019-08-20  本文已影响0人  鲸鱼酱375

0 intro

0.1 type of pages

sql server divides space(storage) into pages

0.1.1 page

8 kb: the smallest unit of storage ;data from only one table is allowed per page)

0.2 page has 3 section

0.2.1 header:

  1. page number
  2. next & previous pages
  3. extent number
  4. space remaining in pages
  5. 96 Bytes

0.2.2 data rows

  1. rows from table

0.2.3 offset

  1. it is a number that references size of rows, if there are 10 rows in a page,there will be 10 offset values.

ps: we can know the record size from the datatype

0.3 extent

Extent is next smallest unit of storage after a page. Size of it is 8*8KB = 64 KB. It holds 8 pages.

extent -----> 8 pages ----> 64kb
row indentifier(RID): extent+page+offset value

0.3.1 type of extent

0.4 heap

a cluster table: a table with clustered index

-problem: table scan (要找所有page,slow)

0.5 index keys

1.1 what is index?

it is a dbo(database object), which sorts data logically/physicially to retrieve data faster

1.1.1 nci vs ci

why we only one clustered index?
physical store; sort in one direction(只能用一种储存方式)
国内查电话的黄书
cluster index

1.2 Balance Tree (b tree)

https://blog.staynoob.cn/post/2017/06/btree-data-structure/

1.2.1 3 types of levels

why only one pk? there no other room to store clusted index?

b tree

1.3 Clustered Index

clusteded index

1.4 Non-Clustered on HEAP

extent: which real page is???
page: divided in 8 sections
offset:

note:
page中,格式是,左边nci,右边extent-page-offset(row identifier)

1.4.1 two way to bring data

回答问题时候,要两种都说

image.png

1.5 Non-Clustered on Clustered

notes
find value ?
way 1: scan
way 2: non-clustered index + key value

image.png

1.6 Seek vs Scan in Indexes

https://blogs.msdn.microsoft.com/apgcdsd/2012/08/01/sql-serverscan-seek/

1.6.1 seek

1.6.1.1 cluster index seek

if sql server knows exactly (in which page) where the data is and reads only those leaf pages which has required data it is calles CI Seek.
usually this is the best way to bring data.
this happens when the column used for index is used in where clause.
root node 有 index(id)的值

1.6.1.2 non clustered index seek

as nci has impact on what columns are in select list and columns used asre in nci it goes for nci seek

1.6.2 scan

in the where condition, 里面找的不是clustered index的信息,就是scan

1.6.2.1 table scan

1.6.2.2 cluster index scan

if sql server has to read through all the leaf pages of the index to retrieve required data it is called as CI scan. usually this is better than table scan not the best way to bring data.
eg:看视频,问老师,老师说看完p1看p2再看p3
有了b tree,先去root找page,再去page中

eg:
select * from batch

  1. 去IAM
    2.去page
    3.直到找到所有相关信息

1.6.2.3 NCI scan

the column in the list are part of index and no search condition is defined

select fname
from person_b30

1.7 Advanced Types of Indexes

1.7.1 Covering Index

bookmark lookup stop in 2008

bookmark lookup:
when we use converting index?
index seek + key lookup太cost了

can use clustered index on pk,others use covering index

1.7.2 Full Text Index (FTI)

CREATE FULLTEXT INDEX ON table_name (column_name1 [...], column_name2 [...]) ...

1.7.2.1 limination of CI/NCI (why use full text index)

1.7.2.2 Steps are taken in the creation of a full text index

1.7.2.3 freetext()

1.7.2.4 cintains()

1.7.3 Filtered Index

it is an opition in nci, where indexing happens only on filtered set( sub set) of data.

https://www.cnblogs.com/woodytu/p/4509812.html

1.7.3.1 用在什么情况?

  1. large volume data; 并且只想要几年的数据
  1. 需要创建ssn,每个有ssn的值都要unique,但是也允许空值

1.7.3.2 advantage & limitations

  1. As Index is created on subset of data size and depth of Index is less, so maintenance and size would be small.
  2. Data searched within the scope of index is much faster as the data Indexed is less.
  3. If there is a scenario where we need UNIQUE values in the column which can support multiple NULL values.
  1. WHERE condition defined for filtering data has to Deterministic (Static)
  2. Certain functions and operators like BETWEEN are not supported in WHERE condition.

1.7.4 Indexed Views (materialized view)

a solution for creating two CI on a single table

1.7.4.1 when we use?

1.9 column store index

1.9.1 limitation of row store index

1.9.2 CSI

A rowgroup is a group of rows that are compressed into columnstore format at the same time. A rowgroup usually contains the maximum number of rows per rowgroup, which is 1,048,576 rows.
For high performance and high compression rates, the columnstore index slices the table into rowgroups, and then compresses each rowgroup in a column-wise manner. The number of rows in the rowgroup must be large enough to improve compression rates, and small enough to benefit from in-memory operations.

-解决第二个问题:
eg:title都是重复的title,把他们变成数字+title (dictoniary),int比varchar更省内存

1.9.2.1

1.9.2 2种压缩方式:

1.9.3版本比较

image.png

1.9.4 ADVANTAGE of CSI

1.9.5 disadvantages

1.9.5.1 DML

DML INSERT

DML DELETE
it will not delete physicialy,it will create the matrix,untill rebuild

delta store:
it is a structure like CI, which is used by SS for Insert and Update operations on

1.10 note

1.11 what is difference between clustered CSI and Nonclustered CSI?

1.12 code in the class

IF object_id('Person_B30') IS NOT NULL
DROP TABLE Person_B30
GO

CREATE TABLE [dbo].Person_B30(ID INT IDENTITY,
    [FirstName] [varchar](50) NOT NULL,
    [LastName] [varchar](50) NOT NULL,
    [JobTitle] [varchar](50),
    [BirthDate] [date],
    [MaritalStatus] [char](1),
    [Gender] [char](1)
) 
GO

INSERT INTO Person_B30
SELECT P.FirstName, P.LastName, E.JobTitle, E.BirthDate, E.MaritalStatus, E.Gender
FROM 
AdventureWorks2017.Person.Person P
LEFT JOIN AdventureWorks2017.HumanResources.Employee E
on E.BusinessEntityID = p.BusinessEntityID
GO

--NO Index on the Table (HEAP)
--Table Scan
SELECT * 
FROM Person_B30

--Table Scan
SELECT * 
FROM Person_B30
WHERE ID = 100

--Create a Clustered Index on ID column
CREATE CLUSTERED INDEX IX_Person_B30_ID
ON Person_B30 (ID)
GO

--CI Scan
SELECT * 
FROM Person_B30

--CI Seek
SELECT * 
FROM Person_B30
WHERE ID = 100

--CI Seek, columns listed in SELECT will not have any impact on data retrieval operation (scan, seen) in Regular C Indexes (Row Store Indexes)
SELECT ID, FirstName 
FROM Person_B30
WHERE ID = 100

--CI Scan because the column used in WHERE is not used in Index
SELECT ID, FirstName 
FROM Person_B30
WHERE FirstName = 'Abe'

--CI Scan SELCT column list has no impact
SELECT ID
FROM Person_B30
WHERE FirstName = 'Abe'

--CI Seek
SELECT FirstName, LastName 
FROM Person_B30
WHERE ID = 100

--How to create an C Index on a column in descending order
CREATE CLUSTERED INDEX IX_Person_B30_ID
ON Person_B30 (ID DESC)
GO

--Can we create a CI with multiple columns
    --Yes it is called Composite CI
--What is the max number of columns that can be used in a CI?
    --32

DROP TABLE Index_Limits
GO

CREATE TABLE Index_Limits(ColLimit CHAR(1000),
Col1 INT, Col2 INT, Col3 INT, Col4 INT, Col5 INT, Col6 INT, Col7 INT, Col8 INT, Col9 INT, Col10 INT, Col11 INT, Col12 INT, Col13 INT, Col14 INT, Col15 INT, Col16 INT, Col17 INT, Col18 INT, Col19 INT, Col20 INT, Col21 INT, Col22 INT, Col23 INT, Col24 INT, Col25 INT, Col26 INT, Col27 INT, Col28 INT, Col29 INT, Col30 INT, Col31 INT, Col32 INT, Col33 INT, Col34 INT, Col35 INT, Col36 INT)
GO

--Following query will fail because CI size should not be more than 900Bytes, but in our case it is 1000
CREATE CLUSTERED INDEX IX_Index_Limits
ON Index_Limits (ColLimit)

--Following will fail becasue CI can be created with 32 columns but not more
CREATE CLUSTERED INDEX IX_Index_Limits
ON Index_Limits (Col1, Col2, Col3, Col4,
Col5,
Col6,
Col7,
Col8,
Col9,
Col10,
Col11,
Col12,
Col13,
Col14,
Col15,
Col16,
Col17,
Col18,
Col19,
Col20,
Col21,
Col22,
Col23,
Col24,
Col25,
Col26,
Col27,
Col28,
Col29,
Col30,
Col31,
Col32, Col33, Col34, Col35, Col36)
GO

--DROP CI on Person_B30
DROP INDEX IX_Person_B30_ID
ON Person_B30

--Create NCI on FirstName
CREATE NONCLUSTERED INDEX IX_Person_B30_FirstName
ON Person_B30 (FirstName)
GO

--NCI Seek + RID Look Up
--Because there is only 1 record with the matching data so SQL Server uses above way to retrieve data
SELECT *
FROM Person_B30
WHERE FirstName = 'Abe'

--Table Scan, becasue there are over 50 records matching serach condition so SS found Table Scan is better than NCI Seek+RID Look Up
--RID Look Up operation happens only when there is a NCI on a Heap 
SELECT *
FROM Person_B30
WHERE FirstName = 'Aaron'

--NCI Seek + RID Look Up OR Table Scan
SELECT FirstName, LastName, Gender
FROM Person_B30
WHERE FirstName = 'Abe'

--NCI Seek, as NCI has impact on what columns are in SELECT list and columns used are in NCI and search condition is defined based on Index Column, it goes for NCI Seek
SELECT FirstName
FROM Person_B30
WHERE FirstName = 'Aaron'

--NCI Scan, because the columns in the list are part of Index and no search condition is defined
SELECT FirstName
FROM Person_B30

--DROP NCI on Person_B30
DROP INDEX IX_Person_B30_FirstName
ON Person_B30

--Create NCI on FirstName
CREATE CLUSTERED INDEX IX_Person_B30_ID
ON Person_B30 (ID)
GO

--Create NCI on FirstName
CREATE NONCLUSTERED INDEX IX_Person_B30_FirstName
ON Person_B30 (FirstName)
GO

--CI Seek, as SQL Server knows which pages to pull data from
SELECT *
FROM Person_B30
WHERE ID BETWEEN 1 AND 250

SELECT *
FROM Person_B30
WHERE ID > 0

--
SELECT *
FROM Person_B30
WHERE FirstName = 'Ana'

SELECT ID, FirstName, LastName, JobTitle
FROM Person_B30
WHERE FirstName = 'Ana'

SELECT ID, FirstName
FROM Person_B30
WHERE FirstName = 'Ana'

--In the following query columns selected are available in both NCI and CI, but CI has more number of pages than NCI (as data content is less) so SS goes for NCI Scan over CI Scan
SELECT ID, FirstName
FROM Person_B30

--DROP NCI on Person_B30
DROP INDEX IX_Person_B30_FirstName
ON Person_B30

--Create a covering index to solve Key Look Up and RID Look Up
CREATE NONCLUSTERED INDEX IX_Person_B30_FirstName
ON Person_B30 (FirstName)
INCLUDE (LastName, JobTitle)
GO


SELECT FirstName, LastName, JobTitle, ID
FROM Person_B30

--Creating table to compare Covering Index with Non Covering
IF object_id('Person_B30_NonCover') IS NOT NULL
DROP TABLE Person_B30_NonCover
GO

CREATE TABLE [dbo].Person_B30_NonCover(ID INT IDENTITY,
    [FirstName] [varchar](50) NOT NULL,
    [LastName] [varchar](50) NOT NULL,
    [JobTitle] [varchar](50),
    [BirthDate] [date],
    [MaritalStatus] [char](1),
    [Gender] [char](1)
) 
GO

INSERT INTO Person_B30_NonCover
SELECT P.FirstName, P.LastName, E.JobTitle, E.BirthDate, E.MaritalStatus, E.Gender
FROM 
AdventureWorks2017.Person.Person P
LEFT JOIN AdventureWorks2017.HumanResources.Employee E
on E.BusinessEntityID = p.BusinessEntityID
GO

--Create a CI on new table
CREATE CLUSTERED INDEX IX_Person_B30_NonCover
ON Person_B30_NonCover(ID)
GO

--Create a NCI on new table
CREATE NONCLUSTERED INDEX IX_Person_B30_NonCover_FirstName
ON Person_B30_NonCover(FirstName)
GO

DECLARE @ST DATETIME, @ET DATETIME
SELECT @ST = SYSDATETIME()
SELECT ID, FirstName, LastName, JobTitle
FROM Person_B30
WHERE FirstName = 'Ana'
SELECT @ET = SYSDATETIME()
SELECT DATEDIFF(MICROSECOND, @ST, @ET) AS 'Cover'

SELECT @ST = SYSDATETIME()
SELECT ID, FirstName, LastName, JobTitle
FROM Person_B30_NonCover
WHERE FirstName = 'Ana'
SELECT @ET = SYSDATETIME()
SELECT DATEDIFF(MICROSECOND, @ST, @ET) AS 'Non Cover'


SELECT *
FROM sys.dm_db_index_physical_stats(db_id('Training_SQL'), object_id('Person_B30'), 1, null, 'detailed') 

--Inserting same data 10 times to get an intermediate level
INSERT INTO Person_B30
SELECT P.FirstName, P.LastName, E.JobTitle, E.BirthDate, E.MaritalStatus, E.Gender
FROM 
AdventureWorks2017.Person.Person P
LEFT JOIN AdventureWorks2017.HumanResources.Employee E
on E.BusinessEntityID = p.BusinessEntityID
GO 10


SELECT *
FROM sys.dm_db_index_physical_stats(db_id('Training_SQL'), object_id('Person_B30'), 1, null, 'detailed') 

--Filtered Index
--It is an option in NCI, where Indexing happens only on filtered set (sub set) of data

--Table for Filter Index
SELECT max(H.OrderDate), MIN(H.OrderDate)
--H.SalesOrderID, H.SalesPersonID, H.OrderDate, H.TaxAmt, H.TotalDue, D.OrderQty, D.UnitPrice 
--INTO AdvSales_Filter
FROM AdventureWorks2017.Sales.SalesOrderHeader H
JOIN AdventureWorks2017.Sales.SalesOrderDetail D
ON H.SalesOrderID = D.SalesOrderID

--Table for Non Filter Index
SELECT H.SalesOrderID, H.SalesPersonID, H.OrderDate, H.TaxAmt, H.TotalDue, D.OrderQty, D.UnitPrice 
INTO AdvSales_NonFilter
FROM AdventureWorks2017.Sales.SalesOrderHeader H
JOIN AdventureWorks2017.Sales.SalesOrderDetail D
ON H.SalesOrderID = D.SalesOrderID

--Craete a Covering Filtered Index for subset of the data in the table
CREATE NONCLUSTERED INDEX IX_AdvSales_Filter
ON AdvSales_Filter(SalesOrderID)
INCLUDE(OrderDate, TotalDue, OrderQty)
WHERE OrderDate>= '02/01/2008' AND  OrderDate <= '07/31/2008' 

--Craete a Covering Index for all the data in the table
CREATE NONCLUSTERED INDEX IX_AdvSales_NonFilter
ON AdvSales_NonFilter(SalesOrderID)
INCLUDE(OrderDate, TotalDue, OrderQty)

SELECT OrderDate, TotalDue, OrderQty, SalesOrderID 
FROM AdvSales_Filter
WHERE OrderDate>= '05/01/2008' AND  OrderDate <= '05/15/2008' 

SELECT OrderDate, TotalDue, OrderQty, SalesOrderID
FROM AdvSales_NonFilter
WHERE OrderDate>= '05/01/2008' AND  OrderDate <= '05/15/2008' 

--Creating a Student table to hold multiple NULL values for SSN and all other Non NULL values should be UNIQUE
CREATE TABLE B30_Students(StuID INT, StuName VARCHAR(25), SSN CHAR(11))
GO

CREATE UNIQUE NONCLUSTERED INDEX IX_B30_Students_SSN
ON B30_Students (SSN)
WHERE SSN IS NOT NULL

INSERT INTO B30_Students VALUES
(1, 'Jiacheng', NULL),
(2, 'Luyao', '123-45-6789'),
(3, 'Zoe', NULL)

INSERT INTO B30_Students VALUES
(4, 'Ling', '123-45-6789')

SELECT * FROM B30_Students


上一篇下一篇

猜你喜欢

热点阅读