Abstract:
|
LAV is a system for statically verifying program assertions and locating
bugs such as buffer overflows, pointer errors and division by zero. LAV is primarily
aimed at analyzing programs written in the programming language C. Since LAV
uses the popular LLVM intermediate code representation, it can also analyze programs
written in other procedural languages. Also, the proposed approach can be
used with any other similar intermediate low level code representation.
System combines symbolic execution, SAT encoding of program’s control-flow,
and elements of bounded model checking. LAV represents the program meaning
using first-order logic (FOL) formulas and generates final verification conditions as
FOL formulas. Each block of the code (blocks have no internal branchings and
no loops) is represented by a FOL formula obtained through symbolic execution.
Symbolic execution, however, is not performed between different blocks. Instead,
relationships between blocks are modeled by propositional variables encoding transitions
between blocks. LAV constructs formulas that encode block semantics once
for each block. Then, it combines these formulas with propositional formulas encoding
the transitions between the blocks. The resulting compound FOL formulas
describe correctness and incorrectness conditions of individual instructions. These
formulas are checked by an SMT solver which covers suitable combination of theories.
Theories that can be used for modeling correctness conditions are: theory of
linear arithmetic, theory of bit-vectors, theory of uninterpreted functions, and theory
of arrays. Based on the results obtained from the solver, the analyzed command
may be given the status safe (the command does not lead to an error), flawed (the
command always leads to an error), unsafe (the command may lead to an error) or
unreachable (the command will never be executed). If a command cannot be proved
to be safe, LAV translates a potential counterexample from the solver into a program
trace that exhibits this error. It also extracts the values of relevant program
variables along this trace.
The proposed system is implemented in the programming language Ñ++, as a
publicly available and open source tool named LAV. LAV has support for several
SMT solvers (Boolector, MathSAT, Yices, and Z3). Experimental evaluation on a
corpus of C programs, which are designed to demonstrate areas of strengths and
weaknesses of different verification techniques, suggests that LAV is competitive
with related tools. Also, experimental results show a big advantage of the proposed
system compared to symbolic execution applied to programs containing a big number
of possible execution paths. The proposed approach allows determining status of
commands in programs which are beyond the scope of analysis that can be done by
symbolic execution tools.
LAV is successfully applied in educational context where it was used for finding
bugs in programs written by students at introductory programming course. This
application showed that in these programs there is a large number of bugs that a
verification tool can efficiently find. Experimental results on a corpus of students’
programs showed that LAV can find bugs that cannot be found by commonly used
automated testing techniques. Also, it is shown that LAV can improve evaluation
of students’s assignments: (i) by providing useful and helpful feedback to students,
which is important in the learning process, and (ii) by improving automated grading
process, which is especially important to teachers. |