Jan22 disk failures

2018-01-23 aureole420

Disk failures

Typically, as DBMS implementers, we assume secondary storage just works.
Classic recovery doesn't address media (disk) failures.

But sometimes a disk does fail:

Catastrophic plant failure --- handle via geographic distribution.
Simple hardware failure --- "RAID" (redundant array of inexpensive disks).

How do we detect if a disk is giving bad values?

Checksums are common.

Use a parity bit for each byte --- e.g. n = 8 data bits: 1 (parity bit) 00110010.
For a given block, count the # of 1's,
--- then add one more bit at the end so that the total # of 1's is even.
If I perform a checksum, what is the chance that a b-bit error will go uncaught?
--- one-bit error: always caught.
--- two-bit error, e.g. 10 switched to 01: cannot be caught.
--- three-bit error: caught again. In general, odd b is caught and even b is missed.
So if b is equally likely to be odd or even, there is about a 50% chance that I catch an error of b bits in a block of size n.
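
A minimal sketch of the even-parity check described above. The helper names (add_parity, passes_check, flip) are made up for illustration, and the parity bit is appended at the end; its position is only a convention.

```python
def add_parity(bits):
    """Append an even-parity bit so the total number of 1's is even."""
    parity = bits.count("1") % 2
    return bits + str(parity)


def passes_check(word):
    """A word passes if it still contains an even number of 1's."""
    return word.count("1") % 2 == 0


def flip(word, positions):
    """Simulate a disk error by flipping the bits at the given positions."""
    chars = list(word)
    for p in positions:
        chars[p] = "1" if chars[p] == "0" else "0"
    return "".join(chars)


word = add_parity("00110010")  # the byte from the notes, plus its parity bit

# One flipped bit, two flipped bits ("10" becomes "01"), three flipped bits.
for positions in ([0], [3, 4], [0, 1, 2]):
    corrupted = flip(word, positions)
    print(len(positions), "bit(s) flipped, caught:", not passes_check(corrupted))
```

Errors that flip an odd number of bits change the parity and are caught; errors that flip an even number leave it unchanged and slip through.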

So the idea is to make blocks small, so there is a good chance that at least one block catches an error. In the extreme (1-bit blocks) you mirror, which is costly.
In practice, use larger blocks and accept some chance of a missed error.
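
The block-size trade-off can be sketched as follows. This is an illustration under stated assumptions, not a real disk format: the function names are hypothetical, one parity bit is kept per block, and the stored parity bits themselves are assumed to be uncorrupted.

```python
def block_parities(bits, block_size):
    """Even-parity bit for each consecutive block of block_size bits."""
    return [bits[i:i + block_size].count("1") % 2
            for i in range(0, len(bits), block_size)]


def detects(original, corrupted, block_size):
    """True if at least one block's recomputed parity differs from the stored one."""
    return block_parities(original, block_size) != block_parities(corrupted, block_size)


original = "0011001011110000"
# Two-bit error: bit 0 and bit 9 are flipped.
corrupted = "1011001010110000"

for block_size in (16, 8, 1):
    overhead = 1 / block_size  # one parity bit per block
    print(block_size, detects(original, corrupted, block_size), overhead)
```

With one 16-bit block the two flips cancel out and the error is missed. With 8-bit blocks each flip lands in a different block and is caught. With 1-bit blocks every parity bit is a copy of its data bit, which is mirroring at 100% storage overhead.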
