Annotation

A SEMANTICALLY ORIENTED METHOD FOR GENERATING TEST INPUTS FOR MUTATION-BASED TESTING OF SYNTACTIC ANALYZERS FOR STRUCTURED FORMATS
Annotation: The paper presents a semantically oriented method for building a targeted set of test inputs for mutation-based testing (fuzz testing) of syntactic analyzers (parsers) for structured formats. Mutation-based testing is understood as the automated generation and systematic modification of input data in order to provoke failures (crashes) and reveal defects in the program under test. A syntactic analyzer (parser) is treated as a component that parses an input sequence according to the rules of a formal grammar and constructs a structured representation of the input. The approach is based on a formal representation of the analyzed program as an annotated control flow graph in which each conditional statement is associated with a logical predicate. These predicates are solved with SAT/SMT solvers, which makes it possible to generate inputs targeted at rarely reached program branches. This mechanism increases the probability of revealing errors related to boundary conditions, deep nesting, and complex logical dependencies. The method includes a quantitative evaluation of branch coverage and of residual risk using the Good–Turing method, which provides a formalized criterion for completing a test campaign. The practical applicability of the approach is demonstrated on a set of parsers (cJSON, RapidJSON, tinyxml2, yaml-cpp) using the American Fuzzy Lop++ (AFL++) tooling. Under identical compilation conditions and instrumentation parameters, a stable increase of 8–10% in the share of covered branches and in the number of unique transitions was observed compared to the baseline configuration, along with faster reaching of rare branches. For additional programs that process structured data, the increase was about 11–13%, which confirms the transferability of the method. Run-to-run variability, the impact of input complexity, and solver limitations were taken into account, which improves the reliability of the conclusions.
It is shown that the residual risk estimate quantitatively describes the probability of discovering new branches at later stages of testing. In conclusion, it is substantiated that incorporating semantic information about the program structure when forming the test set increases the effectiveness of fuzz testing and is recommended for parsers of deeply nested and grammatically rich formats.
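As an illustration of the residual risk estimate, the following is a minimal sketch of the classic Good–Turing missing-mass estimator applied to branch hit counts. The abstract does not give the paper's exact formulation, so several details here are assumptions: each recorded hit of a branch is treated as one observation, and the names `good_turing_residual_risk`, `should_stop`, and the `threshold` value are hypothetical.

```python
def good_turing_residual_risk(branch_hits):
    """Estimate the probability that the next observation hits an unseen branch.

    branch_hits maps a branch identifier to the number of times it was
    exercised during the campaign (assumed bookkeeping, not from the paper).
    The Good-Turing estimate of the unseen mass is n1 / n, where n1 is the
    number of branches seen exactly once and n is the total hit count.
    """
    total = sum(branch_hits.values())
    if total == 0:
        # Nothing observed yet: every branch is still new.
        return 1.0
    singletons = sum(1 for count in branch_hits.values() if count == 1)
    return singletons / total


def should_stop(branch_hits, threshold=0.01):
    """Hypothetical stopping rule: stop once the estimate drops below threshold."""
    return good_turing_residual_risk(branch_hits) < threshold


if __name__ == "__main__":
    # Hypothetical hit counts for five branches of a parser under test.
    hits = {"b1": 120, "b2": 75, "b3": 1, "b4": 1, "b5": 3}
    print(good_turing_residual_risk(hits))
    print(should_stop(hits, threshold=0.05))
```

With these hypothetical counts, two of the 200 observations belong to branches seen exactly once, so the estimate is 2/200 = 0.01. A campaign with many singleton branches yields a high estimate, which suggests that continued testing is still likely to uncover new branches.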
Page numbers: 58-69.
For citation: Semenov S.A. A semantically oriented method for generating test inputs for mutation-based testing of syntactic analyzers for structured formats // Electronic Scientific Journal IT-Standard. – 2025. – No. 4. – pp. 58-69.