Jan22 disk failures
2018-01-23
aureole420
Disk failures
- Typically, as DBMS implementers, we assume secondary storage just works.
- classic recovery doesn't address media (disk) failures.
But sometimes a disk does fail:
- catastrophic plant failure --- handle via geographic distribution.
- simple hardware failure --- handle via "RAID" (redundant array of inexpensive disks).
How do we detect if a disk is giving bad values?
- checksums are common
- simplest case: use a parity bit for each byte --- e.g. n = 8 bits: 1 (parity bit) 00110010
- for a given block, count the # of 1's
--- add one more bit so that the total # of 1's (parity bit included) is even.
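The parity rule above can be sketched in a few lines of Python. This is a minimal illustration, not a production checksum: the function names are made up for this note, bits are represented as a string of '0'/'1' characters, and the parity bit is placed in front of the data to match the example 1 00110010.

```python
def parity_bit(bits: str) -> str:
    """Return the even-parity bit for a string of '0'/'1' characters."""
    # If the data already has an odd number of 1's, the parity bit must be 1.
    return "1" if bits.count("1") % 2 == 1 else "0"


def add_parity(bits: str) -> str:
    """Prepend the parity bit so the total number of 1's is even."""
    return parity_bit(bits) + bits


def check_parity(coded: str) -> bool:
    """True if the coded block (parity bit included) has an even number of 1's."""
    return coded.count("1") % 2 == 0


coded = add_parity("00110010")   # three 1's in the data, so the parity bit is 1
print(coded)                     # 100110010
print(check_parity(coded))       # True
print(check_parity("000110010")) # False: one bit was flipped
```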
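If I perform a checksum, what is the chance that a b-bit error will go uncaught?
--- one-bit error: always caught.
--- two-bit error, e.g. 10 switched to 01: cannot be caught, since the # of 1's stays even.
--- three-bit error: caught again. In general, an odd # of flipped bits is caught and an even # is missed.
So there is roughly a 50% chance that I catch an error of b bits in a block of size n, if b is as likely to be odd as even.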
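The odd/even pattern can be checked with a small experiment. The sketch below is an illustration only: the helper names are hypothetical, the block is a list of 0/1 integers with the parity bit stored first, and the b flipped positions are chosen at random (including, possibly, the parity bit itself).

```python
import random


def has_even_parity(bits):
    """True if the block (parity bit included) has an even number of 1's."""
    return sum(bits) % 2 == 0


def flip_bits(bits, b, rng):
    """Return a copy of the block with b distinct, randomly chosen bits flipped."""
    corrupted = list(bits)
    for pos in rng.sample(range(len(bits)), b):
        corrupted[pos] ^= 1
    return corrupted


rng = random.Random(0)
data = [0, 0, 1, 1, 0, 0, 1, 0]
block = [sum(data) % 2] + data  # even-parity bit first, then the 8 data bits

for b in range(1, 5):
    caught = not has_even_parity(flip_bits(block, b, rng))
    print(b, "caught" if caught else "missed")
# 1 caught
# 2 missed
# 3 caught
# 4 missed
```

Which positions get flipped does not matter: each flip changes the count of 1's by one, so only whether b is odd or even decides the outcome.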
So the idea is to make blocks small, so that there is a good chance at least one of the blocks catches an error. In the extreme (1-bit blocks) the parity bit is just a copy of the data bit, i.e. you mirror, which is costly.
In practice, use larger blocks and accept some chance of missed error.
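A rough way to see this trade-off numerically is sketched below. It rests on strong simplifying assumptions that are not from the lecture: a 512-bit region is garbled entirely, every block in it is hit, and each block independently has the 50% miss chance discussed above. The function names are hypothetical.

```python
def parity_overhead(block_size: int) -> float:
    """Extra storage as a fraction of the data: one parity bit per block."""
    return 1 / block_size


def miss_probability(blocks_hit: int) -> float:
    """Chance that every affected block misses its error.

    Assumes each block misses independently with probability 0.5.
    """
    return 0.5 ** blocks_hit


REGION_BITS = 512  # assumed size of the garbled region

for block_size in (1, 8, 64, 512):
    blocks = REGION_BITS // block_size
    print(block_size, parity_overhead(block_size), miss_probability(blocks))
```

Under these assumptions, 1-bit blocks cost 100% extra storage (mirroring) with a negligible miss chance, while a single 512-bit block costs under 0.2% but misses half the time.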