Manifest file for disk-based WAL implementation
dracoooooo opened this issue · 3 comments
Describe This Problem
To implement a WAL on the local disk, in addition to the segment files that record the logs, another file is needed to record the WAL's metadata. This is because the current WAL Delete interface takes a tableId as a parameter, while logs for multiple tables are recorded in the same segment file, so it is not possible to simply mark all logs before a certain sequence number as deletable. Therefore, a manifest file is needed to maintain this information.
Proposal
Format
Using protobuf as the file format for WAL manifest:
syntax = "proto3";

message Manifest {
  map<string, uint64> latest_mark_deleted = 1;
}
The key in the map is <regionId>:<tableId>, and the value is the highest sequence number marked as deleted for this table in the WAL.
The manifest deliberately records nothing else, so that it never needs to be updated while appending logs, which reduces I/O overhead.
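For illustration only, a minimal in-memory counterpart of this manifest might look like the following Rust sketch (the struct and method names are hypothetical, not taken from any actual codebase):

use std::collections::HashMap;

// In-memory view of the on-disk manifest (hypothetical names).
// Key: "<regionId>:<tableId>"; value: highest sequence number already
// marked as deleted for that table.
#[derive(Default)]
struct Manifest {
    latest_mark_deleted: HashMap<String, u64>,
}

impl Manifest {
    // Record a delete mark; the mark only ever moves forward.
    fn mark_delete(&mut self, region_id: u64, table_id: u64, seq: u64) {
        let key = format!("{region_id}:{table_id}");
        let entry = self.latest_mark_deleted.entry(key).or_insert(0);
        *entry = (*entry).max(seq);
    }
}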
Append Logs
Do not update the manifest file.
Read Logs
Use the manifest file to skip logs that have already been deleted.
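A rough sketch of this filtering step, assuming the manifest map has already been loaded into memory (the function name is made up):

// Returns true if a log entry can be skipped because it was already
// marked as deleted in the manifest map.
fn is_deleted(latest_mark_deleted: &std::collections::HashMap<String, u64>,
              table_key: &str, seq: u64) -> bool {
    latest_mark_deleted
        .get(table_key)
        .map_or(false, |&mark| seq <= mark)
}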
Delete Logs
Update the values in the map, create a new manifest file, and overwrite the old file.
Record the maximum sequence number of every table in each segment file in memory. When every table's marked-deleted sequence number is greater than the segment's maximum, delete this segment (see the sketch below).
When an old segment is deleted, if a table's logs exist only in that old segment, remove the table from the manifest's map.
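A sketch of the segment-deletion check described above, assuming a hypothetical SegmentMeta struct holding the per-table maximum sequence numbers kept in memory:

use std::collections::HashMap;

// Hypothetical per-segment metadata kept in memory.
struct SegmentMeta {
    // table key -> maximum sequence number this segment holds for that table
    max_seq_per_table: HashMap<String, u64>,
}

// Following the rule above: the segment can be deleted once every table it
// contains has a delete mark greater than the segment's maximum for that
// table (>= would also suffice, since the mark itself is already deleted).
fn can_delete_segment(seg: &SegmentMeta, latest_mark_deleted: &HashMap<String, u64>) -> bool {
    seg.max_seq_per_table.iter().all(|(table, &max_seq)| {
        latest_mark_deleted
            .get(table)
            .map_or(false, |&mark| mark > max_seq)
    })
}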
Potential Risks
- If the number of tables is very large, the overhead of overwriting this manifest file each time could be significant.
Additional Context
No response
The key in the map is <regionId>:<tableId>,

I think we can encode the regionId in the WAL directory path, so the key only needs to contain the tableId.
Use the manifest file to skip logs that have already been deleted.
How will you skip WAL files? Which strategy will you use?
Update the values in the map, create a new manifest file, and overwrite the old file.
This is the normal case. What if there is a partial error, such as the overwrite failing? You need to document more details; pseudo code or a sequence diagram may help.
Record the maximum sequence number of all tables in each segment file in memory
How will you recover this info when the server starts up? Do we need to iterate over the whole WAL files?
The key in the map is <regionId>:<tableId>,
I think we can encode the regionId in the WAL directory path, so the key only needs to contain the tableId.
Indeed.
Use the manifest file to skip logs that have already been deleted.
How will you skip WAL files? Which strategy will you use?
This manifest exists both in the file system and in memory; in memory it is represented as a map. Since we also record the min and max sequence numbers of each table per segment in memory, we can skip the segments that are not needed. While iterating through the remaining segments, we might still encounter logs that have already been deleted; these can be skipped based on the information in the map.
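A sketch of that segment-level skip, assuming hypothetical in-memory range info per segment and table:

use std::collections::HashMap;

// Hypothetical per-table sequence range recorded for each segment.
// min_seq could similarly be used to skip segments entirely above or
// below a requested read range.
struct SeqRange { min_seq: u64, max_seq: u64 }
struct SegmentMeta { ranges: HashMap<String, SeqRange> }

// A segment only needs to be scanned for a table if it still holds logs
// beyond that table's marked-deleted sequence number.
fn segment_needed(seg: &SegmentMeta, table_key: &str, deleted_mark: u64) -> bool {
    seg.ranges
        .get(table_key)
        .map_or(false, |r| r.max_seq > deleted_mark)
}
// Entries inside a scanned segment with seq <= deleted_mark are still
// skipped individually, as in the earlier read-path check.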
Update the values in the map, create a new manifest file, and overwrite the old file.
This is the normal case. What if there is a partial error, such as the overwrite failing? You need to document more details; pseudo code or a sequence diagram may help.
The general steps for overwriting are to acquire the write lock for the manifest, create a new temporary file, write to this temporary file, use fsync to ensure the content has been written to disk, and then use rename to replace the original file.
If an error occurs in the steps above, I don’t think it can be handled, and we would have to panic.
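A minimal sketch of those steps in Rust, assuming the caller already holds the manifest write lock (the helper name and error handling are illustrative):

use std::fs::{self, File};
use std::io::Write;
use std::path::Path;

// Write a new manifest atomically: temporary file -> write -> fsync -> rename.
fn overwrite_manifest(path: &Path, bytes: &[u8]) -> std::io::Result<()> {
    let tmp = path.with_extension("tmp");
    let mut file = File::create(&tmp)?;
    file.write_all(bytes)?;
    file.sync_all()?;          // fsync: ensure the content reaches the disk
    fs::rename(&tmp, path)?;   // atomically replace the old manifest
    Ok(())
}
// Per the discussion, any error returned here would be treated as fatal (panic).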
Record the maximum sequence number of all tables in each segment file in memory
How will you recover this info when the server starts up? Do we need to iterate over the whole WAL files?
Yes. I think this is a trade-off to avoid writing the manifest file during the WAL write operation.
After some discussion, the manifest isn't a must and would introduce extra burden.
The idea is that if we can delete unused segments in time, then when the server restarts, we can reconstruct each table's sequence numbers from the segments one by one.
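A sketch of that startup reconstruction, assuming a hypothetical iterator that yields decoded (table key, sequence) records while scanning a segment:

use std::collections::HashMap;

// Hypothetical decoded record header from a segment scan.
struct Record { table_key: String, seq: u64 }

// Rebuild the per-table maximum sequence numbers for one segment by
// scanning its records once at startup.
fn rebuild_table_seqs(records: impl Iterator<Item = Record>) -> HashMap<String, u64> {
    let mut max_seq_per_table = HashMap::new();
    for rec in records {
        let entry = max_seq_per_table.entry(rec.table_key).or_insert(0);
        *entry = (*entry).max(rec.seq);
    }
    max_seq_per_table
}
// Repeating this segment by segment recreates the table sequence info
// without any manifest file.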