Jan 19/22 buffer management
Memory management in DBs.
- As a database designer, you don't trust the OS.
- This includes memory management.
- So we generally implement it ourselves (our own garbage collection; no malloc/new or free/delete -- those are expensive).
Typically on startup:
- The DB allocates a huge area of RAM (GBs).
- It hands all of that RAM to the system's buffer manager as a set of fixed-size pages.
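A minimal sketch of that startup step, assuming C++ and made-up sizes (kPageSize and kPoolSize are my own names; a real system would use GBs):

```cpp
#include <cstddef>
#include <vector>

// One big allocation at startup, carved into fixed-size pages for the
// buffer manager. After this point the DB avoids malloc/new entirely.
constexpr std::size_t kPageSize = 64 * 1024;          // assumed page size: 64 KB
constexpr std::size_t kPoolSize = 256 * 1024 * 1024;  // GBs in practice; 256 MB here

int main() {
    std::vector<char> pool(kPoolSize);       // the one big allocation

    std::vector<char*> freePages;            // the buffer manager's set of pages
    for (std::size_t off = 0; off + kPageSize <= pool.size(); off += kPageSize)
        freePages.push_back(pool.data() + off);
}
```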
What are these pages used for?
- Speeding up I/O (why reread/rewrite to disk if a page is used very often?).
- Temporary ("scratch") storage for running data processing algorithms.
Question: should the buffer cache and the scratch pool be separate?
- Pro of keeping them together: perhaps uses RAM more efficiently (both draw on the same set of pages).
- Con: the system must ensure that one side is not starved for RAM.
In the homework, we are going to put them together. Our system has no separation; to get RAM you can call:
- getHandle(DBtable, pagenum) -- for the buffer cache (a handle is more like a pointer)
- getHandle() -- for scratch space
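A sketch of how that interface might look in C++ (class and type names here are my own guesses, not the official homework API); both calls hand out pages from the same shared pool:

```cpp
#include <memory>
#include <string>

class Page;                                  // one fixed-size page of RAM
using PageHandle = std::shared_ptr<Page>;    // a handle "is more like a pointer"

class BufferManager {
public:
    // Buffer-cache use: a handle to page `pageNum` of table `dbTable`.
    PageHandle getHandle(const std::string& dbTable, int pageNum);

    // Scratch use: a handle to an anonymous temporary page.
    PageHandle getHandle();
};
```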
Key point: the DB working set is often larger than all available RAM.
The question: how to handle this?
- Page data in and out (write some data out to disk to free up space).
- This should be transparent to the user! (The low-level system should deal with it automatically.)
So, what do we do when we "page data"?
- We choose an existing (in-use) page to evict (move out of the buffer pool).
- If the page is dirty (its data has been changed but not yet written back to disk), write its contents to disk.
- Fill the page with the requested contents first (if necessary), then give a handle to the page back to the requester.
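A toy sketch of that sequence of steps (the disk I/O is stubbed out with prints and the victim choice is a placeholder; only the order of the steps is the point):

```cpp
#include <cstdio>
#include <vector>

struct Frame {
    int  pageNum = -1;   // which on-disk page currently lives here (-1 = none)
    bool dirty   = false;
};

static std::vector<Frame> frames(4);                 // a tiny pool of 4 frames

static Frame* chooseVictim() { return &frames[0]; }  // placeholder eviction policy

Frame* pageIn(int wantedPage) {
    Frame* f = chooseVictim();                                    // 1. pick a page to evict
    if (f->dirty)
        std::printf("write page %d back to disk\n", f->pageNum);  // 2. flush if dirty
    std::printf("read page %d from disk\n", wantedPage);          // 3. fill contents if needed
    f->pageNum = wantedPage;
    f->dirty   = false;
    return f;                                                     // 4. handle back to requester
}

int main() {
    frames[0] = {7, true};   // pretend frame 0 holds a dirty copy of page 7
    pageIn(42);              // evicts page 7 (writing it back), loads page 42
}
```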
What are common eviction policies?
- LRU (can be a challenge to implement):
- Associate a timestamp with each page.
- Whenever a page is accessed, set its timestamp to the current CLOCK value (basically a counter); increment the clock.
- When we need to evict, choose the page with the smallest timestamp (sketched right below).
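A minimal sketch of the timestamp mechanism, assuming C++ (the class and names are mine):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct LruFrame {
    std::uint64_t lastUsed = 0;   // timestamp set on access
};

class LruPool {
public:
    explicit LruPool(std::size_t n) : frames_(n) {}

    // Every access stamps the frame with the current counter value.
    void access(std::size_t i) { frames_[i].lastUsed = ++clock_; }

    // Eviction picks the smallest timestamp. A linear scan is used here;
    // a priority queue avoids the scan but must be updated on every access.
    std::size_t victim() const {
        std::size_t best = 0;
        for (std::size_t i = 1; i < frames_.size(); ++i)
            if (frames_[i].lastUsed < frames_[best].lastUsed) best = i;
        return best;
    }

private:
    std::vector<LruFrame> frames_;
    std::uint64_t clock_ = 0;     // the "CLOCK" counter from the notes
};
```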
You have to index the pages in RAM in several ways:
- By timestamp (you don't want to scan all pages to find the victim) --- this is really bad when you consider updating the data structure (a priority queue, for example) by deleting/reinserting on every access.
- By (file, pagenum).
- Via a "super fast" pointer or handle --- if someone called getHandle, they want RAM they can get easy access to. (A sketch of these indexes follows this list.)
Clock algorithm
- Attempts to address the slowness of the timestamp updates.
- Idea: logically organize the pages around the face of a "clock".
- The clock has a second hand pointing at the "current" page.
- Each page has a "DNE" (do-not-evict -- "I am not allowed to evict that page") bit.
- When an eviction is requested, sweep the second hand until we find a page with DNE = false (see the sketch after this list).
- Whenever the second hand passes a page, set its DNE = false (but do not evict it).
- Whenever a page is accessed, set its DNE = true.
- Generally a good approximation of LRU, but much easier to implement:
- All you have to do is change a bit (while LRU has to delete and reinsert in a data structure, which is expensive).
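A minimal sketch of the sweep, assuming C++ (my own structure, not the course code):

```cpp
#include <cstddef>
#include <vector>

struct ClockFrame {
    bool dne = false;   // the "do not evict" bit
};

class ClockPool {
public:
    explicit ClockPool(std::size_t n) : frames_(n) {}

    // Accessing a page is cheap: just set its bit.
    void access(std::size_t i) { frames_[i].dne = true; }

    // Sweep the second hand until a frame with DNE == false is found.
    std::size_t evict() {
        while (frames_[hand_].dne) {
            frames_[hand_].dne = false;             // clear as we pass, but do not evict
            hand_ = (hand_ + 1) % frames_.size();
        }
        std::size_t victim = hand_;
        hand_ = (hand_ + 1) % frames_.size();       // advance past the chosen victim
        return victim;
    }

private:
    std::vector<ClockFrame> frames_;
    std::size_t hand_ = 0;                          // the "second hand"
};
```

Since the hand clears each bit as it passes, the sweep is guaranteed to stop within one full revolution.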
Jan 22 Monday
LRU:
good: temporal locality; it's "optimal".
bad:
- expensive; the simplest data structure to implement it with is a priority queue.
- classic failure cases:
1. vulnerable to attack.
2. vulnerable to a real-life access pattern: a repeated file scan. With 100 pages to read and only 98 pages of RAM available, reading the last two pages overwrites the frames holding the first two -- exactly the pages the next pass needs, so the scan keeps missing.
-- Solved by allowing pages to be pinned: only the first 98 pages stay cached in RAM, and the remaining 2 pages are simply never kept (a small simulation follows below).
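A tiny simulation of that failure case, using the numbers from the notes (100 pages scanned repeatedly, 98 frames of RAM); the pinning strategy here is my own simplification of what was described:

```cpp
#include <algorithm>
#include <cstdio>
#include <vector>

// Count disk reads for repeated scans of `numPages` pages with `numFrames`
// frames, either under plain LRU or with the first `numFrames` pages pinned.
static long scanMisses(int numPages, int numFrames, int passes, bool pinFirst) {
    std::vector<int> frames;   // page ids in RAM, LRU order: front = least recently used
    long misses = 0;
    for (int pass = 0; pass < passes; ++pass) {
        for (int p = 0; p < numPages; ++p) {
            auto it = std::find(frames.begin(), frames.end(), p);
            if (it != frames.end()) {                 // hit: move to most-recently-used end
                frames.erase(it);
                frames.push_back(p);
                continue;
            }
            ++misses;                                 // miss: must read p from disk
            if ((int)frames.size() == numFrames) {
                if (pinFirst) continue;               // pinned pool is full: p is never cached
                frames.erase(frames.begin());         // LRU: evict the oldest page
            }
            frames.push_back(p);
        }
    }
    return misses;
}

int main() {
    // Plain LRU misses on every single access (1000 misses over 10 passes);
    // pinning misses only on the 2 uncached pages after the first pass (118).
    std::printf("LRU:    %ld misses\n", scanMisses(100, 98, 10, false));
    std::printf("pinned: %ld misses\n", scanMisses(100, 98, 10, true));
}
```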
Clock:
good: approximates LRU at less cost.
bad: an eviction may require a full sweep of the arm (but is that really bad? you only pay this cost when you actually evict).
Homework related:
- Smart pointers: good coding style nowadays is to use smart pointers instead of raw pointers. For the homework, we will still need raw pointers, e.g., for the page.
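One way the two can coexist, as a sketch (types are mine, not the homework's): the handle level can be a smart pointer while the page bytes themselves stay a raw pointer into the preallocated pool:

```cpp
#include <cstddef>
#include <memory>

struct Page {
    char* bytes;          // raw pointer into the big preallocated pool -- not owned here
    std::size_t size;
};

using PageHandle = std::shared_ptr<Page>;   // smart pointer at the handle level

PageHandle makeHandle(char* rawRegion, std::size_t pageSize) {
    return std::make_shared<Page>(Page{rawRegion, pageSize});
}
```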