A Solver for a Theory of Strings and Bit-vectors

Abstract

We present a solver for a many-sorted first-order quantifier-free theory Tw,bv of string equations, string length represented as bit-vectors, and bit-vector arithmetic aimed at formal verification, automated testing, and security analysis of C/C++ applications. Our key motivation for building such a solver is the observation that existing string solvers are not efficient at modeling the string/bit-vector combination. Current approaches either reduce strings to bit-vectors and use a bit-vector solver as a backend, or model bit-vectors as natural numbers and use a solver for the combined theory of strings and natural numbers. Both these approaches are inefficient for different reasons. Modeling strings as bit-vectors destroys structure inherent in string equations thus missing opportunities for efficiently deciding such formulas, and modeling bit-vectors as natural numbers is known to be inefficient. Hence, there is a clear need for a solver that models strings and bit-vectors natively. Our solver Z3strBV is a decision procedure for the theory Tw,bv combining solvers for bit-vector and string equations. We demonstrate experimentally that Z3strBV is significantly more efficient than reduction of string/bit-vector constraints to strings/natural numbers. Additionally, we prove decidability for the theory Tw,bv. We also propose two optimizations which can be adapted to other contexts. The first accelerates convergence on a consistent assignment of string lengths, and the second, dubbed library-aware SMT solving, fixes summaries for built-in string functions (e.g., strlen in C/C++), which Z3strBV uses directly instead of analyzing the functions from scratch each time. Finally, we demonstrate experimentally that Z3strBV is able to detect nontrivial overflows in real-world system-level code, as confirmed against 7 security vulnerabilities from CVE and Mozilla database.

0

Turn this paper into a lesson

ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.

Discussion (0)

Sign in to join the discussion.

Loading comments…