EBugs in Cloud Systems

This repository contains a dataset of 210 exception-related bugs, also denoted as eBugs, curated from six widely deployed cloud systems, including Cassandra, HBase, HDFS, Hadoop MapReduce, YARN, and ZooKeeper. The repository also contains a list of inaccurate exceptions* detected by DIET, a static analysis tool that detects inaccurate exceptions by finding inconsistency between an exception’s class and its error message.

* Please refer to our paper (see below) for the definition of inaccurate exceptions and other types of eBugs.

The eBugs dataset

The eBugs dataset contains 210 exception-related bugs. These bugs are selected from more than 60,000 JIRA issues, which were submitted for the six studied cloud systems from 2007 to 2018. Every eBug is studied from multiple perspectives to answer the following key research questions:

Please check out our paper (see below) for detailed statistics of each perspective, as well as the findings and implications on them.

DIET - Detecting Inaccurate Exceptions using triggering condition Types

Based on the findings on the eBugs dataset, we design and implement a static analysis tool, called DIET, to detect an important class of eBug, i.e., inaccurate exception. From a high level perspective, DIET detects inaccurate exceptions by understanding an exception’s class and error message. If the class and the error message imply different types of triggering conditions, DIET will flag the exception as an inaccurate one.

Please refer to our paper for more details about how DIET works.

Inaccurate exceptions detected by DIET

We apply DIET to the latest versions of the studied systems, and find 31 inaccurate exceptions. Two of them are eBugs that cause failure symptoms. The remaining 29 inaccurate exceptions are bad practices, where their classes and error messages are indeed inconsistent. We submit them to the developers of these systems in the form of JIRA issues. We group the eBugs and bad practices that are in the same file into the same JIRA issue. At the time of writing, developers have confirmed the two eBugs and 21 bad practices. Moreover, all of the confirmed inaccurate exceptions are previously-unknown. Below is the list of all the confirmed eBugs and bad practices. We only list one instance of HDFS-14467 and one instance of HDFS-14468 because all the bugs in these two issues have the same exception class and error message. You can find more details about the detected eBugs and bad practices in both the dataset and our paper.

System Exception Class Error Message Triggering Condition Created Issue Issue Type
Hadoop (3.1.2) DiskErrorException “Invalid value configured for” semantic condition HDFS-14469 bug
Hadoop (3.1.2) IOException “replaceFile interrupted.” untimely interrupt HADOOP-16295 bug
Cassandra (3.11) RuntimeException “Interrupted while waiting for stream/fetch ranges to finish” untimely interrupt CASSANDRA-15111 bad practice
Cassandra (3.11) RuntimeException “Interrupted while waiting on rebuild streaming” untimely interrupt CASSANDRA-15112 bad practice
Cassandra (3.11) RuntimeException “Node interrupted while decommissioning” untimely interrupt CASSANDRA-15113 bad practice
Cassandra (3.11) RuntimeException “Not enough space to write … to … (… available)” out of resource CASSANDRA-15114 bad practice
Cassandra (3.11) RuntimeException “Not enough disk space to store …” out of resource CASSANDRA-15114 bad practice
Cassandra (3.11) RuntimeException “Failed to create failed snapshot tracking file […]. Aborting” file system error CASSANDRA-15115 bad practice
Cassandra (3.11) RuntimeException “Unable to create directory:” file system error CASSANDRA-15116 bad practice
Cassandra (3.11) RuntimeException “Unable to list directory:” file system error CASSANDRA-15117 bad practice
Hadoop (3.1.2) DiskErrorException “Invalid value configured for” semantic condition HDFS-14467 bad practice
Hadoop (3.1.2) DiskErrorException “Invalid value configured for” semantic condition HDFS-14468 bad practice
Hadoop (3.1.2) DiskErrorException “Invalid value configured for” semantic condition HDFS-14470 bad practice
Hadoop (3.1.2) IOException “Interrupted receiveBlock” untimely interrupt HDFS-14473 bad practice
Hadoop (3.1.2) RuntimeException “Cannot create directory” file system error HADOOP-16296 bad practice
Hadoop (3.1.2) RuntimeException “Specified work directory does not exists” file system error HADOOP-16297 bad practice
Hadoop (3.1.2) YarnException “Interrupted while retrying to connect to ATS” untimely interrupt YARN-9533 bad practice
Hadoop (3.1.2) YarnException “Interrupted while trying to connect ATS” untimely interrupt YARN-9534 bad practice

Publication

Understanding Exception-Related Bugs in Large-Scale Cloud Systems [preprint] [slides]
Haicheng Chen, Wensheng Dou, Yanyan Jiang, Feng Qin
34th IEEE/ACM International Conference on Automated Software Engineering (ASE 2019).