mit6.824: A Go repository from TaurusGGBOY

Only part of the documentation is shown

Assignment1

Problem investigation log

Problem 1

[root@localhost main]# sh test-mr.sh
*** Starting wc test.
2021/04/19 06:29:22 dialing:dial unix /var/tmp/824-mr-0: connect: connection refused
2021/04/19 06:29:22 dialing:dial unix /var/tmp/824-mr-0: connect: connection refused
2021/04/19 06:29:22 dialing:dial unix /var/tmp/824-mr-0: connect: connection refused
2021/04/19 06:29:22 rpc.Register: method "Done" has 1 input parameters; needs exactly three
sort: cannot read: mr-out*: No such file or directory
cmp: EOF on mr-wc-all
--- wc output is not the same as mr-correct-wc.txt
--- wc test: FAIL

Investigation path

Googled the error; found nothing
Looked at the test script

echo '***' Starting wc test.

timeout -k 2s 180s ../mrcoordinator ../pg*txt &
pid=$!

# give the coordinator time to create the sockets.
sleep 1

# start multiple workers.
timeout -k 2s 180s ../mrworker ../../mrapps/wc.so &
timeout -k 2s 180s ../mrworker ../../mrapps/wc.so &
timeout -k 2s 180s ../mrworker ../../mrapps/wc.so &

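The script starts the coordinator first and the workers 1 second later; try changing the sleep to 5s

Cause

The coordinator was doing the input split at startup, so after the 1-second sleep it still had not finished starting up and its socket did not exist yet.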

Time spent

20h

Outcome

Assignment2A

1 Requirements

Implement leader election and heartbeats

2 TODO

3 Tips

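Read Figure 2 of the paper for the election rules
Send heartbeats no more than ten times per second (at most one every 0.1s)
A new leader must be elected within 5s
The election timeout may need to be larger than the paper's 150ms-300ms
Use time.Sleep() for periodic work
RPC method names and struct field names must start with capital letters
Run go test -run 2A -race
Set the election timeout to 400ms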
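
The timing tips above can be put into a few constants. This is only a sketch: the 100ms heartbeat and the 400ms base come from the tips, while the 200ms of random jitter and all identifier names are assumptions added for illustration. Randomizing the timeout is what keeps servers from repeatedly starting elections at the same moment.

```go
package main

import (
	"math/rand"
	"time"
)

const (
	heartbeatInterval     = 100 * time.Millisecond // ten heartbeats per second
	electionTimeoutMin    = 400 * time.Millisecond // base value from the tips
	electionTimeoutJitter = 200 * time.Millisecond // assumption, not from the notes
)

// randomElectionTimeout returns a duration in the half-open range
// [electionTimeoutMin, electionTimeoutMin+electionTimeoutJitter).
func randomElectionTimeout() time.Duration {
	jitter := time.Duration(rand.Int63n(int64(electionTimeoutJitter)))
	return electionTimeoutMin + jitter
}
```

A ticker goroutine would typically sleep for a short interval with time.Sleep(), then compare the time since the last heartbeat against a freshly drawn timeout.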

4 Problems and Solutions

Cannot call the RequestVote function over RPC
Sometimes there are 3 leaders at the same time
- consider a leader that votes for another server, which then becomes leader but has not sent a heartbeat yet
- there is no timeout for an election
- solution: the old leader was receiving stale votes from others
  - when a vote is received, check whether it is still valid
Elections need a timeout scheme
- first, add a timeout scheme: when the 150ms-300ms timeout expires, check whether the election was won
- second, add a WaitGroup scheme: once all vote replies have arrived, check whether the election was won
A vote reply must be checked to confirm it belongs to currentTerm
A leader wins the election, but its term is lower than a candidate's
- so what should this follower do?
- the leader should update its term when handling vote and heartbeat RPCs
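
The checks described above (discard stale votes, step down on a higher term) could look roughly like the following. It is a sketch under assumptions: the Raft struct is reduced to the fields needed here, and handleVoteReply, sentTerm, votes and peers are hypothetical names. The capitalized fields of RequestVoteReply follow Figure 2 and also satisfy the rule that RPC fields must be exported.

```go
package main

import "sync"

type State int

const (
	Follower State = iota
	Candidate
	Leader
)

// RequestVoteReply uses exported (capitalized) fields so RPC can encode them.
type RequestVoteReply struct {
	Term        int
	VoteGranted bool
}

// Raft holds only the fields this sketch needs.
type Raft struct {
	mu          sync.Mutex
	state       State
	currentTerm int
	votedFor    int // -1 means "voted for nobody in this term"
	votes       int // votes received in the current election, own vote included
	peers       int // cluster size
}

// handleVoteReply processes one reply to a RequestVote that was sent
// while the server was a candidate in term sentTerm.
func (rf *Raft) handleVoteReply(sentTerm int, reply RequestVoteReply) {
	rf.mu.Lock()
	defer rf.mu.Unlock()

	// A higher term in any reply forces the server back to follower.
	if reply.Term > rf.currentTerm {
		rf.currentTerm = reply.Term
		rf.state = Follower
		rf.votedFor = -1
		return
	}
	// Stale reply: the election it belongs to is already over.
	if rf.state != Candidate || rf.currentTerm != sentTerm {
		return
	}
	if !reply.VoteGranted {
		return
	}
	rf.votes++
	if rf.votes > rf.peers/2 {
		rf.state = Leader
	}
}
```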

5 Result

6 Time spent

20-30h

2B

1 Requirements

Implement appending entries to the log

2 TODO

Define a struct to hold the information about each log entry described in Figure 2
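
A possible shape for that struct, together with the AppendEntries arguments listed in Figure 2, is sketched below. The field names mirror the paper; the placeholder entry at index 0 and the helper lastLogIndexAndTerm are assumptions made so that real entries start at index 1 as in the paper.

```go
package main

// LogEntry is one slot in the replicated log.
type LogEntry struct {
	Term    int         // term in which the leader received the entry
	Command interface{} // command for the state machine
}

// AppendEntriesArgs follows the argument list in Figure 2.
// All fields are exported so the RPC layer can encode them.
type AppendEntriesArgs struct {
	Term         int
	LeaderId     int
	PrevLogIndex int
	PrevLogTerm  int
	Entries      []LogEntry // empty for a heartbeat
	LeaderCommit int
}

// lastLogIndexAndTerm assumes log[0] is a placeholder entry with term 0.
// It returns the index and term of the last entry.
func lastLogIndexAndTerm(log []LogEntry) (int, int) {
	if len(log) == 0 {
		return 0, 0
	}
	last := len(log) - 1
	return last, log[last].Term
}
```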

3 Tips

See Figure 2 of the paper

4 Problems and Solutions

5 Result

6 Time spent

60-70h

7 Points to improve

Once a follower is down, the leader retries the AppendEntries RPC indefinitely and never stops.
Only one command can be appended at a time
Log entries are deleted when they do not match
After a leader's first reboot, it may retransmit the entire log to the others
Once a follower has been down, an election is held when it recovers. This costs a lot.
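
For the point about deleting mismatched entries, Figure 2 asks the receiver to delete an existing entry only when it conflicts with a new one (same index, different term). The sketch below shows one way to apply that rule; mergeEntries is a hypothetical helper, LogEntry is a minimal stand-in for the log entry struct, and log[0] is assumed to be a placeholder entry. Truncating only on a real conflict means a delayed, shorter AppendEntries cannot cut off entries that a later RPC already appended.

```go
package main

// LogEntry is a minimal stand-in for the log entry struct.
type LogEntry struct {
	Term    int
	Command interface{}
}

// mergeEntries applies an AppendEntries payload to log. It returns the
// resulting log and false if the consistency check on the entry at
// prevIndex fails. Existing entries are removed only from the first
// index whose term differs from the incoming entry.
func mergeEntries(log []LogEntry, prevIndex, prevTerm int, entries []LogEntry) ([]LogEntry, bool) {
	if prevIndex < 0 || prevIndex >= len(log) || log[prevIndex].Term != prevTerm {
		return log, false
	}
	for i, e := range entries {
		idx := prevIndex + 1 + i
		if idx < len(log) && log[idx].Term == e.Term {
			continue // already present, keep it
		}
		// Conflict or end of log: drop everything from idx on and append
		// the remaining new entries. The three-index slice forces a copy,
		// so the caller's backing array is never overwritten.
		log = append(log[:idx:idx], entries[i:]...)
		break
	}
	return log, true
}
```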

2C

1 Requirements

Implement persistence
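
Figure 2 lists currentTerm, votedFor and log[] as the state that must survive a restart. The sketch below serializes exactly those three values. It uses the standard library's encoding/gob as a stand-in for whatever encoder and storage the lab skeleton provides; persistentState, encodeState and decodeState are hypothetical names.

```go
package main

import (
	"bytes"
	"encoding/gob"
)

// LogEntry is a minimal stand-in for the log entry struct.
type LogEntry struct {
	Term    int
	Command interface{}
}

// persistentState groups the fields Figure 2 marks as persistent.
type persistentState struct {
	CurrentTerm int
	VotedFor    int
	Log         []LogEntry
}

// encodeState serializes the state into a byte slice.
func encodeState(s persistentState) ([]byte, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(s); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// decodeState restores the state from bytes produced by encodeState.
func decodeState(data []byte) (persistentState, error) {
	var s persistentState
	err := gob.NewDecoder(bytes.NewReader(data)).Decode(&s)
	return s, err
}
```

Saving would happen every time one of the three values changes, before replying to an RPC. Commands of custom types stored in the interface field would additionally need gob.Register.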

2 TODO

Define a struct to hold the information about each log entry described in Figure 2

3 Tips

See Figure 8 of the paper
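
Figure 8 illustrates why a leader must not treat an entry from an earlier term as committed just because it is stored on a majority; it only counts replicas for entries of its own term. A sketch of that rule follows. The function names are hypothetical, matchIndex is assumed to include the leader's own last index, and logTerms[i] stands for the term of the entry at index i.

```go
package main

import "sort"

// majorityMatchIndex returns the highest log index that is stored on a
// majority of servers. matchIndex must include the leader itself.
func majorityMatchIndex(matchIndex []int) int {
	if len(matchIndex) == 0 {
		return 0
	}
	sorted := append([]int(nil), matchIndex...)
	sort.Ints(sorted)
	// In ascending order, the element at (n-1)/2 is reached by at least
	// n/2+1 servers.
	return sorted[(len(sorted)-1)/2]
}

// nextCommitIndex advances commitIndex only to an entry of the leader's
// current term, which is the restriction Figure 8 motivates.
func nextCommitIndex(commitIndex, currentTerm int, matchIndex, logTerms []int) int {
	n := majorityMatchIndex(matchIndex)
	if n > commitIndex && n < len(logTerms) && logTerms[n] == currentTerm {
		return n
	}
	return commitIndex
}
```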

4 Problems and Solutions

5 Result

6 Time spent

30-40h

7 Points to improve

Once a follower is down, the leader retries the AppendEntries RPC indefinitely and never stops.
During recovery, frequent leader changes may slow down convergence
About 1/3 of the tests still go wrong
- 2021/08/05 09:22:18 apply error: commit index=129 server=2 5918 != server=1 4089
  - reordered RPCs can corrupt the commit state
- config.go:552: one(9909) failed to reach agreement
  - lost and reordered RPCs can cause slow convergence

2D

1 Requirements

2 TODO

3 Tips

4 Problems and Solutions

5 Result

6 Time spent

7 Points to improve