The Baidu File System (BFS) is a distributed file system designed to support real-time applications. Like many other distributed file systems, BFS is highly fault-tolerant. Unlike most of them, however, BFS provides low read/write latency while maintaining high throughput. Together with Galaxy and Tera, BFS supports many real-time products at Baidu, including Baidu's webpage database, incremental indexing system, and user behavior analysis system.
- Continuous availability
  - Nameserver is implemented as a Raft group, so there is no single point of failure.
- High throughput
  - A high-performance data engine maximizes I/O utilization.
- Low latency
  - Global load balancing and slow node detection.
- Linear scalability
  - Supports multi-data-center deployment and up to 10,000 data nodes.
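
To make "slow node detection" concrete, here is a minimal sketch of one common approach: flag any node whose recent average latency is far above the median of its peers. This is not taken from the BFS source. The `NodeLatency` struct, the `FindSlowNodes` function, and the `factor` threshold are all hypothetical names chosen for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical per-node statistic; not a BFS type.
struct NodeLatency {
    std::string node;       // node identifier
    double avg_latency_ms;  // recent average I/O latency in milliseconds
};

// Returns the nodes whose latency exceeds `factor` times the median latency.
// For an even number of nodes the upper median is used.
std::vector<std::string> FindSlowNodes(std::vector<NodeLatency> stats, double factor) {
    assert(factor > 0.0);
    std::vector<std::string> slow;
    if (stats.empty()) return slow;

    // Sort a local copy by latency so the median sits in the middle.
    std::sort(stats.begin(), stats.end(),
              [](const NodeLatency& a, const NodeLatency& b) {
                  return a.avg_latency_ms < b.avg_latency_ms;
              });
    const double median = stats[stats.size() / 2].avg_latency_ms;

    // Anything well above the median is treated as a slow node.
    for (const NodeLatency& s : stats) {
        if (s.avg_latency_ms > factor * median) slow.push_back(s.node);
    }
    return slow;
}
```

A load balancer could then skip the returned nodes when choosing where to place or read data. Comparing against the median rather than a fixed threshold keeps the check meaningful when the whole cluster is uniformly busy.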
./build.sh
cd sandbox
./deploy.sh
./start.sh
- Please read the RoadMap or source code.
- Find something you are interested in and start working on it.
- Test your code by simply running make test and make check.
- Make a pull request.
- Once your code has passed code review and been merged, it will run on thousands of servers :)
====
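Baidu's core business and database systems rely on a distributed file system as their underlying storage, so the availability and performance of that file system have a critical effect on the stability and quality of the search services built on top of it. Existing distributed file systems (such as HDFS) were designed for offline batch processing and cannot deliver low latency and continuous availability while sustaining high throughput. We therefore designed the Baidu File System around the characteristics of our search workloads.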
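- Continuous availability
  - Data is stored redundantly across multiple data centers and regions, and metadata consistency is maintained through Raft, so the failure of a single data center does not affect overall availability.
- High throughput
  - A high-performance single-node engine maximizes the I/O throughput of the storage media.
- Low latency
  - Global load balancing and automatic avoidance of slow nodes.
- Horizontal scalability
  - Designed to support three data centers across two regions and to manage more than 10,000 machines.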
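
The availability claim above rests on a general property of Raft: the group stays available as long as a strict majority of its replicas can still communicate. The sketch below checks whether a given placement of replicas across data centers keeps a majority after losing any one data center. It is an illustration under assumed inputs, not BFS code. The function name `SurvivesAnySingleDcFailure` and the replica layouts are hypothetical.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// replicas_per_dc[i] is the number of replicas placed in data center i
// (a hypothetical layout, not read from any BFS configuration).
// Returns true if a strict majority survives the loss of any one data center.
bool SurvivesAnySingleDcFailure(const std::vector<int>& replicas_per_dc) {
    const int total =
        std::accumulate(replicas_per_dc.begin(), replicas_per_dc.end(), 0);
    assert(total > 0);
    const int quorum = total / 2 + 1;  // Raft requires a strict majority

    for (int lost : replicas_per_dc) {
        // If this data center goes down, only the remaining replicas can vote.
        if (total - lost < quorum) return false;
    }
    return true;
}
```

With three data centers, a 1/1/1 or 2/2/1 placement passes the check, while 2/1 across two data centers or 3/1/1 across three does not, because one data center then holds too large a share of the replicas.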
./build.sh
cd sandbox
./deploy.sh
./start.sh
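- Read the RoadMap file or the source code to learn about our current development direction.
- Find a feature or module you are interested in working on.
- Develop it. When you are done, test that it behaves correctly, then run make test and make check to confirm that the existing test cases still pass.
- Open a pull request.
- Once it passes code review, your code has a chance to run on tens of thousands of Baidu servers~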