CrashTuner (SOSP 2019)

CrashTuner

What is CrashTuner

CrashTuner is a fault injection framework for distributed systems.

What can you get from this project?

We provide two things here:

  1. A list of all the old bugs we studied (including the bugs in k8s) and the new bugs we found, with the details of each bug.
  2. The CrashTuner artifact (jar files only), see document, with the details of how to trigger each bug.

Old Bugs

All the bugs we studied and their details are in old bugs. We also give the details of our k8s study, but it is written in Chinese.

New bugs

The table below lists all the new bugs found by CrashTuner. Click the Bug Id to see the bug report, Patch to see the fix, and Detail to see how to trigger the bug. Each Detail page shows only a small code snippet; for further understanding, you can download the whole buggy project code (which can be found in the bug report).

Some bugs are marked as "Duplicate" in their issues because, at the developers' request, they were fixed together with another issue.

Bug Id           Priority  Status      Patch                Detail           Meta-info
YARN-9164        Critical  Fixed       YARN-9164-2.patch    YARN-9164        NodeId
YARN-9165        Critical  Fixed       YARN-9164-2.patch    YARN-9165        ContainerId
YARN-9193        Critical  Fixed       YARN-9194_6.patch    YARN-9193        ContainerId
YARN-9238        Critical  Fixed       YARN-9238_3.patch    YARN-9238        ApplicationId
YARN-9201        Critical  Fixed       YARN-9194_6.patch    YARN-9201        ContainerId
YARN-8649        Critical  Fixed       YARN-8649_5.patch    YARN-8649        AppAttemptId
HBASE-22017      Critical  Fixed       pull-158             HBASE-22017      ServerName
HBASE-22041      Critical  Unresolved  -                    HBASE-22041      ServerName
HDFS-14216       Major     Fixed       HDFS-14216_6.patch   HDFS-14216       DataNodeInfo
HDFS-14372       Major     Fixed       HDFS-14372_2.patch   HDFS-14372       BPOfferService
HBASE-22050      Major     Unresolved  HBASE-22050.patch    HBASE-22050      RegionInfo
YARN-9248        Major     Fixed       YARN-9248_5.patch    YARN-9248        ContainerId
YARN-9194        Major     Fixed       YARN-9194_6.patch    YARN-9194        ApplicationId
MR-7178          Major     Unresolved  MR-7178_1.patch      MR-7178          TaskAttemptId
HBASE-21740      Major     Fixed       HBASE-21740.patch    HBASE-21740      MetricsRegionServer
HBASE-22023      Trivial   Unresolved  master.patch         HBASE-22023      MetricsRegionServer
YARN-8650        Major     Fixed       YARN-8331.002.patch  YARN-8650        ContainerId
CASSANDRA-15131  Normal    Unresolved  PULL-322             CASSANDRA-15131  InetAddressAndPort

Reproduce

There are two ways to reproduce the bugs found by CrashTuner:

  1. We have written unit tests in the patches; you can change the source code and re-run them.
  2. We provide a docker image that reproduces all the new bugs, see document.

We are also working on reproducing the old bugs and will add them to the repository later.

Portability

Currently, we have only applied CrashTuner to distributed systems written in Java, but we have also investigated distributed systems written in other languages, such as K8s, which is written in Golang. We find that CrashTuner can help improve its reliability as well. The k8s bugs we studied are in k8sbugs (sorry, the documents are written in Chinese; English readers can click the URL in each document to go to the corresponding issue), and we are implementing another version of CrashTuner to detect them.

We believe that CrashTuner has good portability, and we are doing a lot of work on different distributed systems.

Others

How is the bug Priority determined?

JIRA has 5 Priority levels: Blocker, Critical, Major (Normal in Cassandra), Minor, and Trivial.

When we create a bug issue, JIRA assigns it the default Priority "Major". If the original developers think the bug has a more serious impact, they raise the Priority to Critical or Blocker, like YARN-9194. Conversely, when a bug's impact is less serious than we thought, the original developers lower the Priority to Minor or Trivial, like HBASE-22023.
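
To compare priorities across projects, it can help to put the labels on one scale. The following is a minimal sketch, not part of the CrashTuner artifact: the names `Priority`, `parse_priority`, and `NEW_BUGS` are hypothetical, the pairs are copied from the new-bugs table, and Cassandra's "Normal" is assumed to correspond to Major as described above.

```python
from collections import Counter
from enum import IntEnum


class Priority(IntEnum):
    """The five JIRA priority levels, most severe first (hypothetical helper)."""
    BLOCKER = 1
    CRITICAL = 2
    MAJOR = 3
    MINOR = 4
    TRIVIAL = 5


def parse_priority(label: str) -> Priority:
    """Map a priority label onto the shared scale.

    Assumption: Cassandra's "Normal" is treated as equivalent to Major.
    """
    if label == "Normal":
        return Priority.MAJOR
    return Priority[label.upper()]


# (bug id, priority) pairs copied from the new-bugs table.
NEW_BUGS = [
    ("YARN-9164", "Critical"), ("YARN-9165", "Critical"),
    ("YARN-9193", "Critical"), ("YARN-9238", "Critical"),
    ("YARN-9201", "Critical"), ("YARN-8649", "Critical"),
    ("HBASE-22017", "Critical"), ("HBASE-22041", "Critical"),
    ("HDFS-14216", "Major"), ("HDFS-14372", "Major"),
    ("HBASE-22050", "Major"), ("YARN-9248", "Major"),
    ("YARN-9194", "Major"), ("MR-7178", "Major"),
    ("HBASE-21740", "Major"), ("HBASE-22023", "Trivial"),
    ("YARN-8650", "Major"), ("CASSANDRA-15131", "Normal"),
]

# Tally the bugs per normalized priority level.
counts = Counter(parse_priority(label) for _, label in NEW_BUGS)

for level in Priority:
    print(f"{level.name.title():<8} {counts[level]}")
```

Using an `IntEnum` keeps the levels ordered, so bugs can also be sorted by severity with `sorted(NEW_BUGS, key=lambda bug: parse_priority(bug[1]))`.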