Modern large-scale service systems are usually deployed with redundant components to ensure high dependability in distributed and volatile environments. Fault Injection Testing (FIT) is a popular technique for testing such systems, while the application of FIT to validating the correctness of redundant components remains a challenging task, especially when the system’s structural information is unavailable when testing starts. In this study, we refer to a minimum set of faults that, when injected, will cut off all execution paths in a service system as a fault-tolerance bottleneck, and we propose a novel Fault-tolerance Bottleneck driven Fault Injection (FBFI) approach to the exploration and validation of redundant components without prior knowledge of the system’s business structure. The core idea of FBFI is to iteratively infer and inject bottlenecks of the business structure constructed so far. In this way, FBFI is able to discover and test redundant components by repeatedly triggering new system behaviors. The effectiveness and efficiency of FBFI is evaluated using two microservice benchmark systems with different deployment scales. The results reveal that FBFI is more practical and cost-effective than random and lineage-driven FIT approaches in testing service systems of high redundancy levels.
TOSEM
Toward More Efficient Statistical Debugging with Abstraction Refinement
Zhiqiang Zuo, Xintao Niu, Siyi Zhang, and 5 more authors
Debugging is known to be a notoriously painstaking and time-consuming task. As one major family of automated debugging, statistical debugging approaches have been well investigated over the past decade, which collect failing and passing executions and apply statistical techniques to identify discriminative elements as potential bug causes. Most of the existing approaches instrument the entire program to produce execution profiles for debugging, thus incurring hefty instrumentation and analysis cost. However, as in fact a major part of the program code is error-free, full-scale program instrumentation is wasteful and unnecessary. This article presents a systematic abstraction refinement-based pruning technique for statistical debugging. Our technique only needs to instrument and analyze the code partially. While guided by a mathematically rigorous analysis, our technique is guaranteed to produce the same debugging results as an exhaustive analysis in deterministic settings. With the help of the effective and safe pruning, our technique greatly saves the cost of failure diagnosis without sacrificing any debugging capability. We apply this technique to two different statistical debugging scenarios: in-house and production-run statistical debugging. The comprehensive evaluations validate that our technique can significantly improve the efficiency of statistical debugging in both scenarios, while without jeopardizing the debugging capability.
2022
ICSE
Combinatorial Testing of RESTful APIs
Huayao Wu, Lixin Xu, Xintao Niu, and 1 more author
In 2022 IEEE/ACM 44rd International Conference on Software Engineering, Mar 2022
This paper presents RestCT, a systematic and fully automatic approach that adopts Combinatorial Testing (CT) to test RESTful APIs. RestCT is systematic in that it covers and tests not only the interactions of a certain number of operations in RESTful APIs, but also the interactions of particular input-parameters in every single operation. This is realised by a novel two-phase test case generation approach, which first generates a constrained sequence covering array to determine the execution orders of available operations, and then applies an adaptive strategy to generate and refine several constrained covering arrays to concretise input-parameters of each operation. RestCT is also automatic in that its application relies on only a given Swagger specification of RESTful APIs. The creation of CT test models (especially, the inferring of dependency relationships in both operations and input-parameters), and the generation and execution of test cases are performed without any human intervention. Experimental results on 11 real-world RESTful APIs demonstrate the effectiveness and efficiency of RestCT. In particular, RestCT can find eight new bugs, where only one of them can be triggered by the state-of-the-art testing tool of RESTful APIs.
TSE
A theory of pending schemas in combinatorial testing
Combinatorial Testing (CT) is an effective testing technique for detecting failures which are triggered by the interactions of various factors that influence the behaviour of a system. Although many studies in CT have designed elaborate test suites (called covering arrays) to systemically check each possible factor interaction, they provide weak support to locate the concrete failure-inducing interactions, i.e., the Minimal Failure-causing Schemas (MFS). To this end, a variety of MFS identification approaches have been proposed. However, as this study reveals, these approaches suffer from various issues such as cannot identify multiple overlapping MFSs, cannot handle MFSs with high degrees, cannot be applied to systems with large number of parameters, etc. These issues are essentially caused by the exponential computing complexity of checking every interaction in the test cases. Therefore, they can only focus on a subset of all the possible interactions, resulting in many interactions unnoticed. Ignoring these unnoticed interactions could potentially cause failures that have never been systematically checked. Hence, it is beneficial for MFS identification approaches to identify these interactions. In order to account for these unnoticed interactions in CT, this study introduces the notion of pending schema, based on which a theoretical framework of CT schemas is established. In particular, we formally define the determinability of a schema in CT with respect to given information; as such, the yet-to-be determined schemas are exactly the pending schemas. The relationships between the different schemas (faulty, healthy, and pending) and test cases are also theoretically analyzed. Based on which, we further propose three formulas, along with three corresponding algorithms, for the identification of the pending schemas in failing test cases, and formally prove their correctness. As a result, we reduce the complexity of obtaining pending schemas with respect to the number of factors that may have influences on the software.
IST
An Adaptive Penalty based Parallel Tabu Search for Constrained Covering Array Generation
Yan Wang, Huayao Wu, Xintao Niu, and 2 more authors
Due to the effectiveness and efficiency in detecting defects caused by interactions of multiple factors, Combinatorial Testing (CT) has received considerable scholarly attention in the last decades. Despite numerous practical test case generation techniques being developed, there remains a paucity of studies addressing the automated oracle generation problem, which holds back the overall automation of CT. As a consequence, much human intervention is inevitable, which is time-consuming and error-prone. This costly manual task also restricts the application of higher testing strength, inhibiting the full exploitation of CT in industrial practice. To bridge the gap between test designs and fully automated test flows, and to extend the applicability of CT, this paper presents a novel CT methodology, named COMER, to enhance the traditional CT by accounting for Metamorphic Relations (MRs). COMER puts a high priority on generating pairs of test cases which match the input rules of MRs, i.e., the Metamorphic Group (MG), such that the correctness can be automatically determined by verifying whether the outputs of these test cases violate their MRs. As a result, COMER can not only satisfy the t-way coverage as what CT does, but also automatically check as many test oracle violations as possible. Several empirical studies conducted on 31 real-world software projects have shown that COMER increased the number of metamorphic groups by an average factor of 75.9 and also increased the failure detection rate by an average factor of 11.3, when compared with CT, while the overall number of test cases generated by COMER barely increased.
2021
ICSE
Identifying Key Features from App User Reviews
Huayao Wu, Wenjun Deng, Xintao Niu, and 1 more author
In 2021 IEEE/ACM 43rd International Conference on Software Engineering, Mar 2021
Due to the rapid growth and strong competition of mobile application (app) market, app developers should not only offer users with attractive new features, but also carefully maintain and improve existing features based on users’ feedbacks. User reviews indicate a rich source of information to plan such feature maintenance activities, and it could be of great benefit for developers to evaluate and magnify the contribution of specific features to the overall success of their apps. In this study, we refer to the features that are highly correlated to app ratings as key features, and we present KEFE, a novel approach that leverages app description and user reviews to identify key features of a given app. The application of KEFE especially relies on natural language processing, deep machine learning classifier, and regression analysis technique, which involves three main steps: 1) extracting feature-describing phrases from app description; 2) matching each app feature with its relevant user reviews; and 3) building a regression model to identify features that have significant relationships with app ratings. To train and evaluate KEFE, we collect 200 app descriptions and 1,108,148 user reviews from Chinese Apple App Store. Experimental results demonstrate the effectiveness of KEFE in feature extraction, where an average F-measure of 78.13% is achieved. The key features identified are also likely to provide hints for successful app releases, as for the releases that receive higher app ratings, 70% of features improvements are related to key features.
2020
TSE
Identifying Failure-Causing Schemas in the Presence of Multiple Faults
Combinatorial testing (CT) has been proven effective in revealing the failures caused by the interaction of factors that affect the behavior of a system. The theory of Minimal Failure-Causing Schema (MFS) has been proposed to isolate the cause of a failure after CT. Most algorithms that aim to identify MFS focus on handling a single fault in the System Under Test (SUT). However, we argue that multiple faults are more common in practice, under which masking effects may be triggered so that some failures cannot be observed. The traditional MFS theory lacks a mechanism to handle such effects; hence, they may incorrectly isolate the MFS. To address this problem, we propose a new MFS model that takes into account multiple faults. We first formally analyze the impact of the multiple faults on existing MFS identifying algorithms, especially in situations where masking effects are triggered by multiple faults. We then develop an approach that can assist traditional algorithms to better handle multiple faults. Empirical studies were conducted using several kinds of open-source software, which showed that multiple faults with masking effects do negatively affect traditional MFS identifying approaches and that our approach can help to alleviate these effects.
TSE
An Interleaving Approach to Combinatorial Testing and Failure-Inducing Interaction Identification
Xintao Niu, Changhai Nie, Hareton Leung, and 4 more authors
IEEE Transactions on Software Engineering, Feb 2020
Combinatorial testing (CT) seeks to detect potential faults caused by various interactions of factors that can influence the software systems. When applying CT, it is a common practice to first generate a set of test cases to cover each possible interaction and then to identify the failure-inducing interaction after a failure is detected. Although this conventional procedure is simple and forthright, we conjecture that it is not the ideal choice in practice. This is because 1) testers desire to identify the root cause of failures before all the needed test cases are generated and executed 2) the early identified failure-inducing interactions can guide the remaining test case generation so that many unnecessary and invalid test cases can be avoided. For these reasons, we propose a novel CT framework that allows both generation and identification process to interact with each other. As a result, both generation and identification stages will be done more effectively and efficiently. We conducted a series of empirical studies on several open-source software, the results of which show that our framework can identify the failure-inducing interactions more quickly than traditional approaches while requiring fewer test cases.
Combinatorial testing (CT) aims at detecting interaction failures between parameters in a system. Identifying the failure-inducing combinations of a failing test configuration can help developers find the cause of this failure. However, most studies in CT focus on detecting the failures rather than identifying failure-inducing combinations. In this paper, we propose the notion of a tuple relationship tree (TRT) to describe the relationships among all the candidate parameter interactions. TRT reduces additional test configurations that need to be generated in the fault localization process, and it also provides a clear view of all possible candidate interactions. As a result, our approach will not omit any possible interaction that could be the cause of a failure. In particular, we can identify multiple failure-inducing combinations that overlap with each other. Moreover, we extend our approach to handle the case where additional failure-inducing combinations may be introduced by newly generated test configurations.