[llvmlinux] [GSoC] Integrating the Clang static analyzer: first (rough) proposal draft

Jan-Simon Möller dl9pf at gmx.de
Fri May 3 07:16:24 UTC 2013


Hi all!

Now we need it in the format of the GSoC application. The form fields in 
Google Melange are:

* Proposal title            
    -  We need a cool title of course
* Short description     
    - Short abstract, summary, or snippet; 500 characters or less, plain text
      displayed publicly
    - Most important to catch attention
* Content
    - We can reuse:  
http://www.linuxfoundation.org/collaborate/workgroups/gsoc/google-summer-code-student-application-template

@Eduard: Make sure you're already registered in Google Melange so we can 
submit later today (after a short discussion on ML/IRC/Skype).

Best,
JS

P.S.  I'll be offline between 5pm and 9pm tonight, so let's get this done 
before then.


On Friday 03 May 2013 01:05:23 Eduard Bachmakov wrote:
> Thanks for your input, JS! Here's v2. What exactly did you mean by
> "sparse is a start (replicate functionality, extend functionality)"
> though?
> 
> (If the table is hard to read for anyone, here's a rendered version:
> http://rst.ninjs.org/?n=ac32bbcc57e005c3d2dd63f8435c07bc&theme=nature)
> 
> Why bother?
> -----------
> Static analysis detects semantic errors which are usually hard to find:
> either some test fails or the issue appears under specific conditions no
> test was designed for. Or, even worse, there is no crash but a subtle issue
> like garbage values. Static analysis lets developers skip much of the
> debugging and fix the issue right away, or at least think through why the
> analyzer might be reporting a false positive, saving time and sleep.
> 
> Correctness (or at least predictability) matters especially for the Linux
> kernel. Since it runs on millions of devices of all shapes and sizes (think
> TOP500 to Android) and is developed at such a rapid pace, a method for
> dealing with the 20% of the code that takes 80% of the time would be
> invaluable and would have a significant impact on the quality of released
> code.
> 
> clang-analyzer (checker) is one such static analyzer and fits nicely within
> the llvmlinux project.
> 
> What?
> -----
> In order to get a pleasurable developer experience, multiple steps are
> necessary:
> 
> * Integrating checker into the build system
> 
>  * Simple checks are already easy to run by setting the C and CHECK make
>    variables (e.g. `make C=1`). However, most of the time more context than
>    the offending line is needed (e.g. for a null pointer dereference), which
>    is where `scan-build` helps: it provides much of that context.
>  * Using `scan-build` within the build system is non-trivial. Adding a
>    target, e.g. `make analysis`, would be the first goal.
> 
> * Integration with buildbot
> 
>  * Instead of capturing plain stdio, the idea is to extend the buildbot to
>    associate each build with the relevant analysis report. This would, e.g.,
>    let interested kernel developers who do not want to go through the
>    trouble of setting up their own build system see any/all issues with
>    their code (per target).
> 
> * Create aggregate statistics tool
> 
>  * What goes wrong most?
>  * Whose code is breaking (... checker)?
>  * etc.
> 
> * Choosing relevant existing checks
> 
>  * checker already has a sizable list of available checks
>    http://clang-analyzer.llvm.org/available_checks.html , and not all are
>    relevant for Linux. The goal is to find a reasonable default set.
> 
> * Modifying existing checks
> 
>  * Some of the checks don't necessarily work as intended. "Undefined or
>    garbage value returned to caller" is distracting if the variable was
>    declared with a macro that explicitly marks it as uninitialized.
> 
> * Add new checks
> 
>  * The existing checks are by no means exhaustive. With a project as big as
>    Linux, there should be plenty of bugs from which to derive new checks.
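For the first bullet, here is a rough Python sketch of the command line a `make analysis` target might wrap. The `-o` (report directory), `-plist` and `-plist-html` options are real scan-build options; the function name `analysis_command` and the example make arguments are hypothetical. It also doesn't solve the hard part: the kernel's top-level Makefile assigns CC itself, so scan-build's usual CC override probably won't take effect without extra work.

```python
import shlex


def analysis_command(report_dir, make_args=(), output_format="plist-html"):
    """Return the argv a hypothetical `make analysis` target could run.

    scan-build wraps an ordinary build command. -o picks the report
    directory; HTML is scan-build's default output, -plist emits plist
    files instead, and -plist-html emits both (plists are easier to
    aggregate later).
    """
    if output_format not in ("html", "plist", "plist-html"):
        raise ValueError(f"unsupported output format: {output_format}")
    argv = ["scan-build", "-o", report_dir]
    if output_format != "html":
        argv.append("-" + output_format)  # HTML is the default and needs no flag
    argv.append("make")
    argv.extend(make_args)  # placeholder make arguments, untested on a kernel tree
    return argv


if __name__ == "__main__":
    cmd = analysis_command("analysis-reports", ["ARCH=arm", "-j4"])
    print(shlex.join(cmd))
```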
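For the aggregate statistics tool, a first cut could just count diagnostics in scan-build's plist output. This is a sketch under assumptions: it expects scan-build to have run with `-plist` or `-plist-html`, and it assumes each `.plist` has a top-level `files` list and a `diagnostics` list whose entries carry `category`, `type` and a `location` dict with a `file` index into `files`. The function `summarize` is a hypothetical name.

```python
import collections
import plistlib
from pathlib import Path


def summarize(report_dir):
    """Count analyzer diagnostics per (category, type) and per source file.

    Assumes the plist layout described above; files that don't match it
    are skipped rather than treated as errors.
    """
    by_kind = collections.Counter()
    by_file = collections.Counter()
    for path in Path(report_dir).rglob("*.plist"):
        with open(path, "rb") as fh:
            report = plistlib.load(fh)
        if not isinstance(report, dict):
            continue  # not an analyzer report
        files = report.get("files", [])
        for diag in report.get("diagnostics", []):
            by_kind[(diag.get("category", "?"), diag.get("type", "?"))] += 1
            idx = diag.get("location", {}).get("file")
            if isinstance(idx, int) and 0 <= idx < len(files):
                by_file[files[idx]] += 1
    return by_kind, by_file


if __name__ == "__main__":
    import sys

    if len(sys.argv) == 2:
        kinds, files = summarize(sys.argv[1])
        print("What goes wrong most?")
        for (category, kind), n in kinds.most_common(10):
            print(f"  {n:5d}  {category}: {kind}")
        print("Which files are affected most?")
        for name, n in files.most_common(10):
            print(f"  {n:5d}  {name}")
    else:
        print("usage: summarize.py REPORT_DIR")
```

Mapping the most affected files to people ("whose code") could then reuse the kernel's `scripts/get_maintainer.pl`.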
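For choosing a reasonable default, the selection could live in one place and be turned into scan-build's `-enable-checker`/`-disable-checker` options. The checker and package names below come from the available-checks list, but the selection itself is only a guess: `unix.Malloc` and `unix.API` model userspace libc calls that the kernel doesn't make. `checker_flags`, `DEFAULT_ENABLE` and `DEFAULT_DISABLE` are hypothetical names.

```python
# Hypothetical kernel defaults: real checker/package names, guessed selection.
DEFAULT_ENABLE = ["core", "deadcode"]  # already on by default; listed to document intent
DEFAULT_DISABLE = ["unix.Malloc", "unix.API"]  # models of userspace libc functions


def checker_flags(enable=(), disable=()):
    """Translate checker selections into scan-build command-line options."""
    clash = sorted(set(enable) & set(disable))
    if clash:
        raise ValueError(f"checkers both enabled and disabled: {clash}")
    flags = []
    for name in enable:
        flags += ["-enable-checker", name]
    for name in disable:
        flags += ["-disable-checker", name]
    return flags


if __name__ == "__main__":
    print(" ".join(checker_flags(DEFAULT_ENABLE, DEFAULT_DISABLE)))
```

Keeping the list in data rather than in a Makefile would also make the later "dynamically enable or disable checks" step easier.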
> 
> 
> Roadmap
> -------
> ============ ====================================================
>  Month-Week   Description
> ============ ====================================================
>  Jun-1,2:     Familiarize myself with the llvmlinux build system,
>               the design of `scan-build`, buildbot and other
>               general documentation.
>  Jun-3,4:     Integrate `scan-build` into the build system, either
>               as a first-class citizen or completely transparently.
>  Jun-5:       Integrate checking functionality into buildbot.
>  Jul-1,2:     Implement summaries in buildbot and (optionally)
>               the build system.
>  Jul-3,4:     Analyze the results of checker, determine which
>               are legitimate and which are false positives and/or
>               inapplicable, and disable the latter.
>  Midterm
>  Aug-2:       Investigate the former. Report (tons of?) bugs.
>  Aug-3,4,5:   Fix checks that are fixable. Depending on
>               circumstance, design system to dynamically enable
>               or disable checks. Jump to next point if done
>               early.
>  Sep-1,2:     Implement new checks for kernel/assembly related
>               issues.
>  Sep-3:       Buffer week (polish, additional documentation, etc.)
>  End
> ============ ====================================================
-- 

Dipl.-Ing.
Jan-Simon Möller

jansimon.moeller at gmx.de

