PBFT in detail: why n >= 3f+1

2018-03-05  Simth

Practical Byzantine Fault Tolerance

Suppose you have N replicas, f of which might crash (non-Byzantine failure)
What quorum size Q do you need to guarantee liveness and safety?
Need Q <= N - f for liveness (a quorum must assemble even with f replicas down)
Need 2Q > N for safety (any two quorums must intersect in at least one replica)
Both hold when N = 2f+1 and Q = f+1

Now say we throw in Byzantine failures
Say you have N nodes, f of which might experience Byzantine failure
First, how can Byzantine failures be worse than non-Byzantine?
Byzantine nodes can vote for both a statement and its contradiction
Make different statements to different nodes
Consequences
Risks driving non-failed nodes into divergent states
Risks driving non-failed nodes into "stuck states"
E.g., cause split vote on seemingly irrefutable statement
Paxos example: You think a majority aborted some ballot (b, v)
You vote to commit (b', v') (where b' > b, v' != v)
Can't convince other nodes it is safe to vote for (b', v')

What quorum size Q do we need in the Byzantine setting?
Still need Q <= N - f for liveness (f Byzantine nodes may simply never respond)
For safety, any two quorums must now intersect in at least one honest node:
2Q - N >= f+1, i.e., Q >= (N+f+1)/2
Both constraints can hold only when N >= 3f+1; with N = 3f+1, take Q = 2f+1
(hence the n >= 3f+1 in the title; both cases are sketched below)
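As a minimal sketch (not from the original notes; names are illustrative), the two quorum conditions can be checked in Python for any N and f:

# Liveness/safety checks for crash-only and Byzantine quorums.

def crash_quorum_ok(N, Q, f):
    # Liveness: a quorum can still form with f replicas crashed.
    # Safety: any two quorums share at least one replica.
    return Q <= N - f and 2 * Q > N

def byzantine_quorum_ok(N, Q, f):
    # Liveness: f Byzantine replicas may refuse to respond.
    # Safety: any two quorums share at least f+1 replicas,
    # hence at least one honest one.
    return Q <= N - f and 2 * Q - N >= f + 1

f = 1
print(crash_quorum_ok(2*f + 1, f + 1, f))        # True: N=3, Q=2 suffice
print(byzantine_quorum_ok(3*f + 1, 2*f + 1, f))  # True: N=4, Q=3 suffice
print(byzantine_quorum_ok(3*f, 2*f, f))          # False: N=3f is not enough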

So how does PBFT protocol work?
Number replicas 0, 1, 2, ..., 3f (so 3f+1 replicas in total)
Number requests with consecutive sequence numbers (not viewstamps)
System goes through a series of views
In view v, replica number v mod (3f+1) is designated the primary
Primary is responsible for selecting the order of operations
Assigns an increasing sequence number to each operation
In normal-case operation, use two-round protocol for request r:
Round 1 (pre-prepare, prepare) goal:
Ensure at least f+1 honest replicas agree that
If request r executes in view v, will execute with sequence no. n
Round 2 (commit) goal:
Ensure at least f+1 honest replicas agree that
Request r has executed in view v with sequence no. n

Protocol for normal-case operation
Let c be client
r_i be replica i, or p primary, b_i backup i
R set of all replicas

c -> p:  m = {REQUEST, o, t, c}_Kc
p -> R:  {PRE-PREPARE, v, n, d}_Kp, m     (note d = H(m))

b_i -> R: {PREPARE, v, n, d, i}_K{r_i}
[Note all messages signed, so will omit signatures and use < > henceforth.]

replica r_i now waits for PRE-PREPARE + 2f matching PREPARE messages
puts these messages in its log
then we say prepared(m, v, n, i) is TRUE
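A minimal sketch (assuming simplified message shapes; signature verification is presumed to happen before messages enter the log) of how a replica might evaluate the prepared predicate:

# Sketch of the prepared(m, v, n, i) check; data structures are illustrative.
from collections import namedtuple

PrePrepare = namedtuple("PrePrepare", "v n d")    # from the primary
Prepare    = namedtuple("Prepare",    "v n d i")  # from backup i

def prepared(log, v, n, d, f):
    # Log must hold the PRE-PREPARE for (v, n, d) ...
    has_preprepare = any(
        isinstance(msg, PrePrepare) and msg == PrePrepare(v, n, d)
        for msg in log)
    # ... plus 2f matching PREPAREs from distinct replicas.
    senders = {msg.i for msg in log
               if isinstance(msg, Prepare) and (msg.v, msg.n, msg.d) == (v, n, d)}
    return has_preprepare and len(senders) >= 2 * f

# With f = 1, a PRE-PREPARE plus 2f = 2 matching PREPAREs suffice:
log = [PrePrepare(0, 1, "d1"), Prepare(0, 1, "d1", 2), Prepare(0, 1, "d1", 3)]
print(prepared(log, 0, 1, "d1", f=1))  # True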

Note: If prepared(m, v, n, i) is TRUE for honest replica r_i
then prepared(m', v, n, j) is FALSE for any m' != m at any honest r_j
Why? Any two sets of 2f+1 replicas overlap in at least f+1, hence in at
least one honest replica, and an honest replica sends at most one PREPARE
for a given (v, n)
So no other operation can execute with view v sequence number n

Are we done? Just reply to client? No
Just because some other m' won't execute at (v,n) doesn't mean m will
Suppose r_i is compromised right after prepared(m, v, n, i)
Suppose no other replica received r_i's prepare message
Suppose f replicas are slow and never even received the PRE-PREPARE
No other honest replica will know the request prepared!
Particularly if p fails, request might not get executed!

So we say operation doesn't execute until
prepared(m, v, n, i) is TRUE for f+1 non-faulty replicas r_i
We say committed(m, v, n) is TRUE when this property holds

So how does a replica know committed(m, v, n) holds?
Add one more message:

r_i -> R: <COMMIT, v, n, d, i> (sent only after prepared(m,v,n,i))

replica r_i waits for 2f+1 identical COMMIT messages (including its own)
committed-local(m, v, n, i) is TRUE when:
prepared(m, v, n, i) is TRUE, and
r_i has 2f+1 matching commits in its log
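A matching sketch (same simplified message shapes as the prepared sketch above) of the committed-local check:

# Sketch of committed-local(m, v, n, i): prepared must already hold, and the
# log must contain 2f+1 matching COMMITs (the replica's own included).
from collections import namedtuple

Commit = namedtuple("Commit", "v n d i")

def committed_local(log, v, n, d, f, is_prepared):
    # is_prepared: result of the prepared(m, v, n, i) check shown earlier
    committers = {msg.i for msg in log
                  if isinstance(msg, Commit) and (msg.v, msg.n, msg.d) == (v, n, d)}
    return is_prepared and len(committers) >= 2 * f + 1

log = [Commit(0, 1, "d1", i) for i in range(3)]  # 2f+1 = 3 commits when f = 1
print(committed_local(log, 0, 1, "d1", f=1, is_prepared=True))  # True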

Note: If committed-local(m, v, n, i) is TRUE for any non-faulty r_i
then committed(m, v, n) is TRUE
(2f+1 matching COMMITs include at least f+1 from honest replicas, and an
honest replica sends COMMIT only after prepared holds locally)
r_i knows when committed-local is TRUE
So committed-local is a replica's way of knowing that committed is TRUE

r_i replies to client when committed-local(m, v, n, i) is TRUE
Client waits for f+1 matching replies, then accepts the result
Why f+1 and not 2f+1?
Because among f+1 matching replies, at least one is from a non-faulty r_i
So client knows committed-local(m, v, n, i) held at that replica
Which in turn implies committed(m, v, n)
Note tentative reply optimization:
r_i can send a tentative reply to the client after prepared(m, v, n, i)
Client can accept the result after 2f+1 matching tentative replies. Why?
f+1 of those replies must be from honest nodes
And at least 1 of those f+1 will be among the 2f+1 replicas whose
VIEW-CHANGE messages form any new view
So that node will make sure the operation survives into the new view
(a sketch of the client's accept rule follows)
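A minimal sketch (illustrative names; assumes at most one reply per replica) of the client's accept rule, covering both the normal and the tentative case:

# Return a result once enough matching replies arrive.
from collections import Counter

def accept_reply(replies, f, tentative=False):
    # replies: list of (replica_id, result) pairs
    needed = 2 * f + 1 if tentative else f + 1
    counts = Counter(result for _, result in replies)
    for result, votes in counts.items():
        if votes >= needed:
            return result     # at least one honest replica vouches for it
    return None               # keep waiting (or retransmit to all replicas)

print(accept_reply([(1, "ok"), (2, "ok"), (3, "bad")], f=1))  # "ok"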

Garbage collecting the message log
make periodic checkpoints
Broadcast <CHECKPOINT, n, d, i>, where d = digest of state
When a replica has 2f+1 signed CHECKPOINT messages for (n, d), that
checkpoint is stable
restrict sequence numbers to be between h and H
h = sequence number of last stable checkpoint
H = h + k (e.g., k might be 2 * checkpoint interval of 100)
delete all messages at or below the sequence number of the stable checkpoint
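A minimal sketch (log layout and k are illustrative; the notes suggest k = 2 * a checkpoint interval of 100) of advancing the water marks and truncating the log:

def advance_watermarks(log, stable_n, k=200):
    # stable_n: sequence number of the latest stable checkpoint
    h, H = stable_n, stable_n + k
    # Drop every logged message at or below the stable checkpoint.
    survivors = {n: msgs for n, msgs in log.items() if n > h}
    return h, H, survivors

log = {n: ["...messages for n..."] for n in range(1, 251)}
h, H, log = advance_watermarks(log, stable_n=100)
print(h, H, min(log), max(log))  # 100 300 101 250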

View changes
When a client doesn't get an answer, it broadcasts the request to all replicas
If a backup notices the primary is slow/unresponsive, it broadcasts:

b_i -> R: <VIEW-CHANGE, v+1, n, C, P, i>
n = sequence number of b_i's last stable checkpoint
C = 2f+1 CHECKPOINT messages proving that checkpoint is stable
P = for each request prepared at b_i with sequence number > n,
    the PRE-PREPARE plus the 2f matching PREPAREs

When the primary of view v+1 sees 2f signed VIEW-CHANGE messages from others:

p' -> R: <NEW-VIEW, v+1, V, O>
V = those 2f VIEW-CHANGE messages plus the new primary's own
O = PRE-PREPARE messages re-issued in view v+1 for every request that
    appears prepared in V (null requests fill any sequence-number gaps)

Replicas may obtain any missing state from each other
(e.g., stable checkpoint data, or a missing operation, since
reissued pre-prepare messages only contain the digest of the request)

What happens if primary creates incorrect O in NEW-VIEW message?
E.g., might send null requests for operations that prepared
Other replicas can compute O from V, and can reject NEW-VIEW message
What happens if primary sends different V's to different backups?
Still okay: a committed operation prepared at f+1 honest replicas, and any
set of 2f+1 VIEW-CHANGE messages must include at least one of those honest
replicas, so at least one member of V will carry the operation
So new primary cannot cause committed operations to be dropped
Only operations for which the client has not yet seen the answer can be lost
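A rough sketch (heavily simplified, not the paper's exact rule) of how a backup might recompute O from V to validate a NEW-VIEW; each VIEW-CHANGE is reduced here to a (checkpoint_n, {seqno: digest}) pair:

def compute_O(view_changes, new_v):
    # view_changes: list of (checkpoint_n, certs) pairs; certs maps a sequence
    # number to the digest of a request proved prepared there. Real messages
    # carry full PRE-PREPARE/PREPARE certificates, and conflicting digests are
    # resolved in favor of the highest view; both are elided here.
    min_s = max(cp_n for cp_n, _ in view_changes)  # latest stable checkpoint in V
    prepared = {}                                  # seqno -> digest
    for _, certs in view_changes:
        prepared.update((n, d) for n, d in certs.items() if n > min_s)
    max_s = max(prepared, default=min_s)
    # Re-issue a PRE-PREPARE for every slot in (min_s, max_s]; null fills gaps.
    return [("PRE-PREPARE", new_v, n, prepared.get(n, "null"))
            for n in range(min_s + 1, max_s + 1)]

V = [(100, {101: "d101", 103: "d103"}), (100, {101: "d101"}), (100, {})]
print(compute_O(V, new_v=5))
# [('PRE-PREPARE', 5, 101, 'd101'), ('PRE-PREPARE', 5, 102, 'null'),
#  ('PRE-PREPARE', 5, 103, 'd103')]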

Discussion
What problem does BFS (the NFS-like file service built on PBFT in the paper) solve?
