[Ksummit-discuss] [TECH TOPIC] Fix devm_kzalloc, its users, or both

Russell King - ARM Linux linux at arm.linux.org.uk
Tue Aug 4 11:18:05 UTC 2015


On Tue, Aug 04, 2015 at 12:40:00PM +0200, Daniel Vetter wrote:
> I'd love to see generic support to revoke/block anything coming in
> through a file_ops.

Yes, I've been looking at one of my drivers which is definitely broken
in this regard, and that's the conclusion I'm coming to - it's not a
problem of lifetime control of the private driver data structures, but
it's a much bigger issue than that.

If a driver's file_operations methods touch hardware, then we need a
way to prevent those file_operations methods from touching the hardware
once the device driver's ->remove callback has been called.

Yes, we can stick a flag in the driver private data structure to
indicate that it's gone, but do we want to keep on reviewing this in
driver code, or should we make it easier for driver authors, subsystem
authors and core reviewers to get this right by providing a "revoke"
which can be called in the ->remove callback?

However, providing a revoke is not easy: if we look at the tty code,
which has to deal with tty hangups, then we find that it does this:

        spin_lock(&tty_files_lock);
        /* This breaks for file handles being sent over AF_UNIX sockets ? */
        list_for_each_entry(priv, &tty->tty_files, list) {
                filp = priv->file;
                if (filp->f_op->write == redirected_tty_write)
                        cons_filp = filp;
                if (filp->f_op->write != tty_write)
                        continue;
                closecount++;
                __tty_fasync(-1, filp, 0);      /* can't block */
                filp->f_op = &hung_up_tty_fops;
        }
        spin_unlock(&tty_files_lock);

IOW, it replaces the f_op on all files with hung_up_tty_fops.

Now, the filp is protected from going away by being on the list: a
f_op->release will block in tty_del_file() for the above lock to be
released.  That much is safe.

However, what if another thread/cpu is in one of the f_op methods when
filp->f_op is replaced with the "revoked" operations?  What I'm saying
is that there's no guarantee that one of the file's operations won't
be executing on another CPU at the point that we're trying to revoke it,
and we don't want any threads to be there, because they could access
data we're about to free.

That's the difficult part of the problem, and it's one which things like
stop_machine() doesn't solve (since we may have scheduled away from
a method to run the stop_machine() thread.)  We can't wait for the file
to close, because that introduces soft-deadlocks - consider a thread
which holds a reference on the filp, but is trying to unbind the driver
via sysfs... just like the sysfs issues we've had in the past.

Unfortunately, this is where things are much more complex than the module
case - the module has has the advantage that:

rmmod chardev-driver < /dev/that-chardev-driver-node

can return saying "module busy".  We don't have that luxury when it comes
to removing 'struct device' - the driver ->remove callback has no way to
fail (it's return code is always ignored.)

A solution to that would be to drop something like a read-write lock into
almost all f_op methods, which sounds expensive to me in the general case.

-- 
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.


More information about the Ksummit-discuss mailing list