
Commit 68e176c

First draft of developer docs
Still lots of detail that could be added.
1 parent 3cb47c5 commit 68e176c

File tree

2 files changed: +151 -0 lines changed


docs/README.md

Lines changed: 106 additions & 0 deletions
@@ -0,0 +1,106 @@
# Raft Developer Documentation

This documentation provides a high-level introduction to the `hashicorp/raft`
implementation. The intended audience is anyone interested in understanding
or contributing to the code.

## Contents

1. [Terminology](#terminology)
2. [Operations](#operations)
   1. [Apply](./apply.md)
3. [Threads](#threads)

## Terminology

This documentation uses the following terms as defined.

* **Cluster** - the set of peers in the raft configuration
* **Peer** - a node that participates in the consensus protocol using `hashicorp/raft`. A
  peer may be in one of the following states: **follower**, **candidate**, or **leader**.
* **Log** - the full set of log entries.
* **Log Entry** - an entry in the log. Each entry has an index that is used to order it
  relative to other log entries.
* **Committed** - A log entry is considered committed if it is safe for that entry to be
  applied to state machines. A log entry is committed once the leader that created the
  entry has replicated it on a majority of the peers. A peer has successfully
  replicated the entry once it is persisted.
* **Applied** - log entry applied to the state machine (FSM)
* **Term** - raft divides time into terms of arbitrary length. Terms are numbered with
  consecutive integers. Each term begins with an election, in which one or more candidates
  attempt to become leader. If a candidate wins the election, then it serves as leader for
  the rest of the term. If the election ends with a split vote, the term will end with no
  leader.
* **FSM** - finite state machine, stores the cluster state
* **Client** - the application that uses the `hashicorp/raft` library
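
The FSM is the piece a client supplies. As a rough, paraphrased sketch (the
authoritative interface definitions live in the library source, and `counterFSM`
is a made-up example type), a client implements something like this, which the
`runFSM` thread described under [Threads](#threads) calls into:

```go
package example

import (
	"io"

	"github.com/hashicorp/raft"
)

// counterFSM is a toy FSM that simply counts applied entries.
type counterFSM struct {
	applied uint64
}

// Apply is invoked once a log entry is committed; on the leader, its return
// value is delivered to the client through ApplyFuture.Response().
func (f *counterFSM) Apply(l *raft.Log) interface{} {
	f.applied++
	return l.Index
}

// Snapshot captures a point-in-time view of the FSM state. The expensive work
// happens later, when Persist is called on the runSnapshot thread.
func (f *counterFSM) Snapshot() (raft.FSMSnapshot, error) {
	return &counterSnapshot{applied: f.applied}, nil
}

// Restore replaces the FSM state with the contents of a snapshot.
func (f *counterFSM) Restore(rc io.ReadCloser) error {
	defer rc.Close()
	// decode rc into f.applied ...
	return nil
}

type counterSnapshot struct{ applied uint64 }

// Persist writes the captured state to the sink.
func (s *counterSnapshot) Persist(sink raft.SnapshotSink) error {
	// write s.applied to sink ...
	return sink.Close()
}

func (s *counterSnapshot) Release() {}
```

The same implementation runs on every peer; the leader and followers each apply
committed entries through their own `runFSM` thread.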

## Operations

### Leader Write

Most write operations must be performed on the leader. A short usage sketch
follows the list.

* RequestConfigChange - update the raft peer list configuration
* Apply - apply a log entry to the log on a majority of peers, and the FSM. See [raft apply](apply.md) for more details.
* Barrier - a special Apply that does not modify the FSM, used to wait for previous logs to be applied
* LeadershipTransfer - stop accepting client requests, and tell a different peer to start a leadership election
* Restore (Snapshot) - overwrite the cluster state with the contents of the snapshot (excluding cluster configuration)
* VerifyLeader - send a heartbeat to all voters to confirm the peer is still the leader
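
As a rough usage sketch (not taken from this repository, and the ten second
timeout is an arbitrary placeholder), these operations are exposed as methods on
`*raft.Raft` that return futures; `Apply` itself is shown in [apply.md](apply.md):

```go
package example

import (
	"time"

	"github.com/hashicorp/raft"
)

// leaderChecks shows Barrier and VerifyLeader; error handling is abbreviated.
func leaderChecks(r *raft.Raft) error {
	// Barrier is a special Apply: it waits until every preceding log entry
	// has been applied to the FSM.
	if err := r.Barrier(10 * time.Second).Error(); err != nil {
		return err // e.g. raft.ErrNotLeader if this peer lost leadership
	}

	// VerifyLeader heartbeats the voters to confirm this peer is still leader.
	return r.VerifyLeader().Error()
}
```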

### Follower Write

* BootstrapCluster - store the cluster configuration in the local log store
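
A rough sketch of the call (the server IDs and addresses are made up);
bootstrapping is only valid on a fresh peer that has no existing state:

```go
package example

import "github.com/hashicorp/raft"

// bootstrap writes an initial cluster configuration to this peer's local log
// store. It should only be run when the cluster is first created.
func bootstrap(r *raft.Raft) error {
	configuration := raft.Configuration{
		Servers: []raft.Server{
			{Suffrage: raft.Voter, ID: "node-1", Address: "10.0.0.1:8300"},
			{Suffrage: raft.Voter, ID: "node-2", Address: "10.0.0.2:8300"},
			{Suffrage: raft.Voter, ID: "node-3", Address: "10.0.0.3:8300"},
		},
	}
	return r.BootstrapCluster(configuration).Error()
}
```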

### Read

Read operations can be performed on a peer in any state. A short usage sketch
follows the list.

* AppliedIndex - get the index of the last log entry applied to the FSM
* GetConfiguration - return the latest cluster configuration
* LastContact - get the last time this peer made contact with the leader
* LastIndex - get the index of the latest stored log entry
* Leader - get the address of the peer that is currently the leader
* Snapshot - snapshot the current state of the FSM into a file
* State - return the state of the peer
* Stats - return some stats about the peer and the cluster
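
A rough sketch of a few of these calls (not taken from this repository); most
return a value directly, while `GetConfiguration` returns a future that must be
checked before reading its result:

```go
package example

import (
	"fmt"

	"github.com/hashicorp/raft"
)

// readState prints a few read-only views of the peer. All of these calls are
// valid on a follower, candidate, or leader.
func readState(r *raft.Raft) error {
	fmt.Println("state:", r.State())   // Follower, Candidate, or Leader
	fmt.Println("leader:", r.Leader()) // address of the current leader, if known
	fmt.Println("applied index:", r.AppliedIndex())
	fmt.Println("last index:", r.LastIndex())
	fmt.Println("last contact:", r.LastContact())

	future := r.GetConfiguration()
	if err := future.Error(); err != nil {
		return err
	}
	for _, server := range future.Configuration().Servers {
		fmt.Println("peer:", server.ID, server.Address)
	}
	return nil
}
```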

## Threads

Raft uses the following threads to handle operations. The name of the thread is in bold,
and a short description of the operation handled by the thread follows. The main thread is
responsible for handling many operations.

* **run** (main thread) - different behaviour based on peer state
  * follower
    * processRPC (from rpcCh)
      * AppendEntries
      * RequestVote
      * InstallSnapshot
      * TimeoutNow
    * liveBootstrap (from bootstrapCh)
    * periodic heartbeatTimer (HeartbeatTimeout)
  * candidate - starts an election for itself when called
    * processRPC (from rpcCh) - same as follower
    * acceptVote (from askPeerForVote)
  * leader - first starts replication to all peers, and applies a Noop log to ensure the new leader has committed up to the commit index
    * processRPC (from rpcCh) - same as follower, however we don't actually expect to receive any RPCs other than a RequestVote
    * leadershipTransfer (from leadershipTransferCh) -
    * commit (from commitCh) -
    * verifyLeader (from verifyCh) -
    * user restore snapshot (from userRestoreCh) -
    * changeConfig (from configurationChangeCh) -
    * dispatchLogs (from applyCh) - handle client Raft.Apply requests by persisting logs to disk, and notifying replication goroutines to replicate the new logs
    * checkLease (periodically LeaseTimeout) -
* **runFSM** - has exclusive access to the FSM; all reads and writes must send a message to this thread. Commands:
  * apply logs to the FSM, from the fsmMutateCh, from processLogs, from leaderLoop (leader) or appendEntries RPC (follower/candidate)
  * restore a snapshot to the FSM, from the fsmMutateCh, from restoreUserSnapshot (leader) or installSnapshot RPC (follower/candidate)
  * capture snapshot, from fsmSnapshotCh, from takeSnapshot (runSnapshot thread)
* **runSnapshot** - handles the slower part of taking a snapshot. From a pointer captured by the FSM.Snapshot operation, this thread persists the snapshot by calling FSMSnapshot.Persist. Also calls compactLogs to delete old logs.
  * periodically (SnapshotInterval) takeSnapshot for log compaction
  * user snapshot, from userSnapshotCh, takeSnapshot to return to the user
* **askPeerForVote (candidate only)** - short lived goroutine that synchronously sends a RequestVote RPC to a voting peer and waits for the response. One goroutine per voting peer.
* **replicate (leader only)** - long running goroutine, one per peer, that synchronously sends log entry AppendEntries RPCs to its peer. Also starts the heartbeat thread, and possibly the pipelineDecode thread. Runs sendLatestSnapshot when AppendEntries fails.
* **heartbeat (leader only)** - long running goroutine, one per peer, that synchronously sends heartbeat AppendEntries RPCs to its peer.
* **pipelineDecode (leader only)**
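
The sketch below is illustrative only, not the actual `hashicorp/raft` source;
the types, fields, and channel payloads are simplified stand-ins. It shows the
overall shape described above: a single long-lived `run` goroutine that
dispatches to a per-state loop, and each per-state loop selects over the
channels that state handles.

```go
package example

import "time"

type peerState int

const (
	follower peerState = iota
	candidate
	leader
)

// rpc stands in for an incoming AppendEntries, RequestVote, InstallSnapshot,
// or TimeoutNow request.
type rpc struct{}

type demoRaft struct {
	state            peerState
	rpcCh            chan rpc
	bootstrapCh      chan struct{} // liveBootstrap requests
	heartbeatTimeout time.Duration
}

// run is the main thread: it loops forever, handing control to the loop for
// the peer's current state.
func (r *demoRaft) run() {
	for {
		switch r.state {
		case follower:
			r.runFollower()
		case candidate:
			r.runCandidate()
		case leader:
			r.runLeader()
		}
	}
}

// runFollower services RPCs and live bootstrap requests; if the heartbeat
// timeout expires without any traffic, the peer becomes a candidate.
func (r *demoRaft) runFollower() {
	for r.state == follower {
		select {
		case req := <-r.rpcCh:
			r.processRPC(req)
		case <-r.bootstrapCh:
			// store the initial cluster configuration, then keep following
		case <-time.After(r.heartbeatTimeout):
			r.state = candidate
		}
	}
}

func (r *demoRaft) runCandidate()      { /* ask peers for votes, tally results */ }
func (r *demoRaft) runLeader()         { /* replicate, dispatch logs, commit */ }
func (r *demoRaft) processRPC(req rpc) { /* handle a single request */ }
```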

docs/apply.md

Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
# Raft Apply

Apply is the primary operation provided by raft.

This sequence diagram shows the steps involved in a `raft.Apply` operation. Each box
across the top is a separate thread. The name in the box identifies the state of the peer
(leader or follower) and the thread (`<peer state>:<thread name>`). When there are
multiple copies of the thread, it is indicated with `(each peer)`.

```mermaid
sequenceDiagram
    autonumber

    participant client
    participant leadermain as leader:main
    participant leaderfsm as leader:fsm
    participant leaderreplicate as leader:replicate (each peer)
    participant followermain as follower:main (each peer)
    participant followerfsm as follower:fsm (each peer)

    client-)leadermain: applyCh to dispatchLogs
    leadermain->>leadermain: store logs to disk

    leadermain-)leaderreplicate: triggerCh
    leaderreplicate-->>followermain: AppendEntries RPC

    followermain->>followermain: store logs to disk

    opt leader commit index is ahead of peer commit index
        followermain-)followerfsm: fsmMutateCh <br>apply committed logs
        followerfsm->>followerfsm: fsm.Apply
    end

    followermain-->>leaderreplicate: respond success=true
    leaderreplicate->>leaderreplicate: update commitment

    opt quorum commit index has increased
        leaderreplicate-)leadermain: commitCh
        leadermain-)leaderfsm: fsmMutateCh
        leaderfsm->>leaderfsm: fsm.Apply
        leaderfsm-)client: future.respond
    end
```
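
From the client's point of view, the whole sequence above sits behind a single
call on the leader. In the rough sketch below (the command bytes and the timeout
are placeholders), the call corresponds to the `applyCh` message in step 1, and
the returned future resolves at the final `future.respond` step:

```go
package example

import (
	"time"

	"github.com/hashicorp/raft"
)

// apply submits one command and waits for it to be committed and applied.
func apply(r *raft.Raft, cmd []byte) (interface{}, error) {
	future := r.Apply(cmd, 5*time.Second)
	if err := future.Error(); err != nil {
		return nil, err // e.g. raft.ErrNotLeader if called on a follower
	}
	// Response returns whatever FSM.Apply returned for this log entry.
	return future.Response(), nil
}
```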
