[llvmlinux] [GSoC] Integrating the Clang static analyzer: first (rough) proposal draft

Eduard Bachmakov e.bachmakov at gmail.com
Fri May 3 05:05:23 UTC 2013

Thanks for your input, JS! Here's v2. What exactly did you mean by
"sparse is a start (replicate functionality, extend functionality)"?

(If the table is hard to read for anyone, here's a rendered version:

Why bother?
Static analysis detects semantic errors that are usually hard to find: either
some test fails, or the issue appears under specific conditions no test was
designed for. Or, even worse, there is no crash but a subtle issue like
garbage values. Static analysis lets a developer skip much of the debugging
involved and fix the issue right away, or at least think through why the
analyzer returned a false positive, saving time and sleep.

Especially in the case of the Linux kernel, correctness (or at least
predictability) is important. The kernel runs on millions of devices of all
shapes and sizes (think TOP500 to Android) and is developed at a rapid pace,
so a method for dealing with the 20% of the code that takes 80% of the time
would be invaluable and have a significant impact on the quality of the code
released.

clang-analyzer (checker) is one such static analyzer and fits nicely within
the llvmlinux project.

In order to get a pleasurable developer experience, multiple steps are
necessary:

* Integrating checker into the build system

 * Simple checks are already easy to do by setting the $C and $CHECK
   variables. However, most of the time more context than the offending line
   is necessary (e.g. for a null pointer dereference), and `scan-build`
   provides much of that context.
 * Using `scan-build` within the build system is non-trivial. Integrating a
   target, e.g. `make analysis`, would be the first goal.

* Integration with buildbot

 * Instead of capturing simple stdio, the idea is to extend the buildbot to
   associate each build with the relevant analysis report. This would, e.g.,
   allow interested kernel developers who do not want to go through the
   trouble of setting up their own build system to see any/all issues with
   their code (per target).

* Create aggregate statistics tool

 * What goes wrong most?
 * Whose code is breaking (... checker)?
 * etc.

* Choosing relevant existing checks

 * checker already has a sizable list of available checks
   (http://clang-analyzer.llvm.org/available_checks.html), and not all are
   relevant for Linux. The goal is to find a reasonable default.

* Modifying existing checks

 * Some of the checks don't necessarily work as intended. "Undefined or
   garbage value returned to caller" is distracting if the variable was
   created using a macro that explicitly states so.

* Add new checks

 * The existing checks are by no means exhaustive. With a project as big as
   Linux, there should be plenty of bugs available to derive new checks from.
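
As a rough illustration of the build-system and check-selection steps above,
here is a minimal Python sketch of how a `make analysis` style wrapper might
assemble a `scan-build` command line. The function name, the output directory
and the two checker names are only placeholders chosen for this example (the
real default set is exactly what the proposal still has to work out); `-o`,
`-enable-checker` and `-disable-checker` are existing `scan-build` options.

```python
import shlex


def scan_build_command(make_args, output_dir, enable=(), disable=()):
    """Return an argv list that runs `make` under scan-build.

    Hypothetical helper for illustration, not part of llvmlinux.
    """
    cmd = ["scan-build", "-o", output_dir]
    for checker in enable:
        cmd += ["-enable-checker", checker]
    for checker in disable:
        cmd += ["-disable-checker", checker]
    # Everything after the scan-build options is the wrapped build command.
    return cmd + ["make"] + list(make_args)


if __name__ == "__main__":
    argv = scan_build_command(
        ["-j4"],
        "analysis-reports",                      # placeholder directory name
        enable=["security.insecureAPI.strcpy"],  # example choice only
        disable=["deadcode.DeadStores"],         # example choice only
    )
    # Print the command instead of running it, so the sketch has no
    # side effects; a real target would hand argv to subprocess.run().
    print(shlex.join(argv))
```

Keeping the checker lists as plain arguments would make it straightforward to
enable or disable checks per target later on.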

============ ====================================================
 Month-Week   Description
============ ====================================================
 Jun-1,2:     Familiarize myself with llvmlinux build system,
              design of `scan-build`, buildbot, general other
 Jun-3,4:     Integrate `scan-build` into build system as either
              first-class citizen or completely transparently
 Jun-5:       Integrate checking functionality into buildbot
 Jul-1,2:     Implement summaries into buildbot and (optionally)
              the build system.
 Jul-3,4:     Analyze the results of checker, determine which
              are legitimate and which are false positives and/or
              inapplicable, disable the latter.
 Aug-2:       Investigate the former. Report (tons of?) bugs.
 Aug-3,4,5:   Fix checks that are fixable. Depending on
              circumstance, design system to dynamically enable
              or disable checks. Jump to next point if done.
 Sep-1,2:     Implement new checks for kernel/assembly related
 Sep-3:       Buffer week (polish, additional documentation, etc)
============ ====================================================
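
For the summaries planned for Jul-1,2 (the "aggregate statistics tool" step),
a first version could be as small as the Python sketch below. It assumes the
analysis reports have already been parsed into simple records; the record
layout (`type` and `file` keys), the function name and the sample entries are
all made up for illustration and are not taken from real analyzer output.

```python
from collections import Counter


def summarize(reports):
    """Count reports per bug type and per source file.

    `reports` is an iterable of dicts with hypothetical "type" and
    "file" keys; extracting them from the analyzer output is out of
    scope for this sketch.
    """
    reports = list(reports)  # allow generators; we iterate twice
    by_type = Counter(r["type"] for r in reports)
    by_file = Counter(r["file"] for r in reports)
    return by_type, by_file


if __name__ == "__main__":
    # Invented sample records, only to show the shape of the input.
    sample = [
        {"type": "Dereference of null pointer", "file": "drivers/a.c"},
        {"type": "Dead assignment", "file": "drivers/a.c"},
        {"type": "Dereference of null pointer", "file": "fs/b.c"},
    ]
    by_type, by_file = summarize(sample)
    print("What goes wrong most:", by_type.most_common(1))
    print("Where:", by_file.most_common(1))
```

Mapping files to authors (the "whose code" question) would need extra data,
e.g. from version control, which this sketch does not attempt.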
