[Ksummit-discuss] [CORE TOPIC] More useful types in the linux kernel

Julia Lawall julia.lawall at lip6.fr
Wed Jul 20 07:32:01 UTC 2016



On Tue, 19 Jul 2016, Eric W. Biederman wrote:

> James Bottomley <James.Bottomley at HansenPartnership.com> writes:
>
> > On Tue, 2016-07-19 at 10:32 -0500, Eric W. Biederman wrote:
> >> Historically the types in C came about because the machines
> >> fundamentally supported different data types either with different
> >> sizes or different characteristics (i.e. u8, u16, float, double).
> >> These data types and the C type system were built so programmers
> >> could tell the machine what they needed it to do.
> >>
> >> There is another genesis of types that started with the simply typed
> >> lambda calculs that is about eliminating common errors and otherwise
> >> helping a programmer get their code right.
> >>
> >> In the years since C was invented there has been a lot of activity
> >> and a
> >> little bit of progress in this area.  Would people be receptive to
> >> improvements in this area?
> >>
> >> I would like to talk to folks and gague what it would take to make
> >> improvements in this area acceptable, practical, and useful.
> >>
> >> Would a gcc plugin that checks the most interesting things that
> >> sparse checks on every build be interesting? (endianness of integer
> >> types for example)
> >
> > How would this be different from simply automatically running sparse in
> > the kernel build if the binary is present (effectively making make C=1
> > the default)?
>
> Nothing.  I am just honestly looking at ways that we can get things to
> always or almost always run.   Sparse isn't getting run regularly now so
> I was suspect that would not be as good of a solution.

A problem with putting time consuming tools into the normal build process
is that they work on all the code you compile, which may be a large
superset of the code that is actually changed.

julia


>
> >> Would a type system for pointers derived from separation logic that
> >> has the concept that a piece of data is owned by a piece of running
> >> code rather than another piece of data be interesting?
> >
> > By this you mean a thread of execution that should be expected to free
> > the data pointed to when it finishes?  Sort of like a self garbage
> > collecting reference?
>
> Not really.  The big idea is to make expressable in the type system the
> key concepts on how a programmer reasons about their data structures,
> instead of trying to make programmers perform some convoluted logic to
> express things in concepts that are trivial to implement in a type
> system.
>
> Another way to talk about it would be complete alias analysis at type
> checking time.
>
> Which means that the type system then knows essentially everything the
> programmer knows about aliases and object lifetimes and if you don't
> pass something on or free it, the type checker then knows you made
> a mistake.
>
> >>   * This cleanly allows for doubly linked lists.
> >>
> >>   * This is useful to ensure that data is either put in another data
> >>     structure where it is remembered or it is freed.
> >>
> >>   * This is useful to ensure reference counts are not leaked.
> >>
> >>   * This is useful to ensure that every lock is paired with an
> >> unlock.
> >>
> >> My personal filter for things like this are types that can be checked
> >> in time proportional to the amount of code to be built so that it is
> >> roughly the same situation we are in now.
> >>
> >>
> >> Given it's heritage and it's history the type system in C has serious
> >> limitations that I don't know if they are correctible, when it comes
> >> to catching programmer mistakes: silent truncation of unsigned types
> >> into smaller unsigned types, casts, etc.  Would people be willing to
> >> consider a simple, link compatible alternative to C for some of the
> >> code in the kernel that had the same low level control of the machine
> >> but had a type system that made catching mistakes easier?
> >>
> >>
> >> Deploying solutions like this will take a fair bit of grunt work, and
> >> time similar or worse than the big kernel lock removal.  Given how
> >> widely Linux is used and how annoying some of these bugs can be I
> >> think it is worthwhile to dig in and see what kind of improvements
> >> can be made.
> >>
> >> I would really like to get a feel among kernel maintainers and
> >> developers if this is something that is interesting, and what kind of
> >> constraints they think something like this would need to be usable
> >> for the kernel?
> >
> > I've got to say that rewriting in some as yet undefined language simply
> > to get better typing and conversions looks very daunting.  Before even
> > considering this, can you answer why an extension to sparse, which
> > would mostly flag the problems if we use the correct annotations,
> > wouldn't work just as well?  We'd still have to add the additional
> > annotations (and someone would have to update sparse) but it would
> > still be a lot easier than rewriting and giving the kernel a build
> > dependency on whatever the rewrite is done in.
>
> That is a very good question.  The short answer is that I am not yet
> convinced I can figure out how to retrofit the type checks I want onto
> C's type system.  I have implemented the type checks I am looking at in
> a fresh language I am toying with where I can throw out inconvinient
> unnecessary corner cases.  After I finished boiling the ocean the basic
> types were similar enough to C's type system that I think I can make
> the connection but I haven't tried yet.
>
> Assuming things can be sorted so they coexist nicely C's types.  No
> advantage.  If it turns out that to make the code not a pain to write we
> need better syntax or some basic type inference there is a large
> advantage in converting code.
>
> The first place I can think of where I might get hung up in the area of
> discriminated unions.  I am pretty certain that some cases of type
> safety will require them and C doesn't have anything like that today.  A
> practical example would be a type system that ensures you call
> IS_ERR(ptr) before you use a pointer as a pointer.
>
> Ordinary C unions and type casts that serve the same purpose would at the
> end of the day all have to be reabstracted.  Because otherwise memory
> safety cound not be analyized.
>
> Some of that reabstraction I will need to introduce type variables which
> a completely foreing concept to C.
>
> On the flip side since the linux kernel and low level programming where
> precise control of the machine happens is my target audience it is
> definitely worth looking at what can be done in a context like sparse.
> That is the fastest way to connect the tools with the real world
> problems.  If it can't be done or if it can't be made nice simple
> and easy to use then there will be good arguments for why we need a new
> syntax.
>
> Mostly I asked the question because the type system would be so much
> easier in a green field setting, and I wanted to be lazy.  So hearing
> the common sense request (Please fix C.) Is helpful to get that lazy
> part of myself in gear.
>
> The end game for me is a type system that doesn't permit memory errors,
> reference counting errors, locking errors, or deadlocks, while at the
> same time allowing pretty much the code and data structures that we use
> in the kernel today.
>
> Eric
> _______________________________________________
> Ksummit-discuss mailing list
> Ksummit-discuss at lists.linuxfoundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/ksummit-discuss
>


More information about the Ksummit-discuss mailing list