[Ksummit-discuss] [TECH TOPIC] Addressing complex dependencies and semantics (v2)

Tue Aug 2 00:01:56 UTC 2016

On Monday, August 01, 2016 09:03:09 PM Luis R. Rodriguez wrote:
> On Fri, Jul 29, 2016 at 12:13:03PM +0100, Mark Brown wrote:
> > On Fri, Jul 29, 2016 at 09:45:55AM +0200, Hans Verkuil wrote:
> > 
> > > My main problem is not so much with deferred probe (esp. for cyclic dependencies
> > > it is a simple method of solving this, and simple is good). My main problem is
> > > that you can't tell the system that driver A needs to be probed after drivers B,
> > > C and D are probed first.
> > 
> > > That would allow us to get rid of v4l2-async.c which is a horrible hack.
> 
> I'd like to understand the requirement for this a bit better, so someone explaining
> this would be good if this moves forward as a tech session.

One case I'm familiar with is when a device has two (or more) device IDs, where
one is more specific than the other.  The idea being that if the OS has a driver
for that particular device, it will use the more specific device ID (say A) to
look for it, but otherwise it will use the other "generic" ID (say B) to match
against a "generic" driver.

Of course, that only works if the driver for A is probed before the driver for B.

> > > 
> > > That code allows a bridge driver to wait until all dependent drivers are probed.
> > > This really should be core functionality.
> > 
> > > Do other subsystems do something similar like drivers/media/v4l2-core/v4l2-async.c?
> > > Does anyone know?
> > 
> > ASoC does, it has an explicit card driver to join things together and
> > that just defers probe until everything it needs is present.  This was
> > originally open coded in ASoC but once deferred probe was implemented we
> > converted to that.
> 
> OK that's 2.
> 
> Another piece of code that deals with complex dependencies I've recently
> ventured into was the x86 IOMMU stuff. The complexities here are both at
> the built-in init level and also later at probe.
> 
> For the built-in code we have a code run time sort where based on a simple set
> of semantics used to declare dependencies final code that was compiled in is
> sorted out so that the code that needs to initialize first triggers first. hpa
> had suggested long ago that generalizing this was desirable, so I've taken that
> task and have some basic building blocks which are now being proposed in their
> RFC v3 series [0]. If it seems that the run time sort mechanism is left out,
> its correct, upon review with Andy on RFC v2 we don't *yet* need a run time
> sort, even though I expanded on the existing one for IOMMU and strengthened the
> semantics there, its still available on the userspace mockup solution for
> linker tables though [1]. I should note that I did at least determine that
> generalizing a sort for dependency maps which shuffles code at run time
> did not make sense due to the fact that you'd either need to generalize the
> building blocks used for defining a dependency map. If you move one item used
> for init from the end to the front, your iteration function must use the same
> structure in the same way. Run time sorts for code sections then are left up
> to each subsystem to implement. On x86, I evaluated sorting out dependencies
> further, for instance on setup_arch() -- however it just seemed not needed
> at this point. For other subsystems this may make more sense.
> 
> At the probe level IOMMUs face another issue when dealing with GPU drivers.
> The kernel has 7 levels of initialization possible (pure_initcall() == 0,
> core_initcall() == 1, ..., fs_initcall() == 5, device_initcall() == 6,
> late_initcall() == 7), module_init() maps to device_initcall(), but modules may
> also use late_initcall() if they know a driver built-in needs to definitely run
> very late. Naturally if you're not a piece of code running very early on, and
> you want the option to be a module as well (device_initcall == 6) then you
> really only have at your disposal two levels available before you get to an end
> device driver. It turns out that for GPU drivers this is not enough to map out
> enough dependencies at the higher level, so the only next best solution
> available is to implicitly rely on linker order. This is the case for ordering
> logistics between AMD IOMMU v1, AMD IOMMUv2 (depends on AMD IOMMU v1), and
> AMD KFD driver, and the AMD radeon driver:
> 
> 0. AMD IOMMUv1:    arch/x86/kernel/pci-dma.c
> 1. AMD IOMMUv2:    drivers/iommu/amd_iommu_v2.c
> 2. AMD KFD:        drivers/gpu/drm/amd/amdkfd/kfd_module.c
> 3. AMD Radeon:     drivers/gpu/drm/radeon/radeon_drv.c
> 
> For details refer to a recent discussion on this [2].
> 
> Using linker tables for initialization should enable us to scale initialization
> levels well beyond 7 levels, in fact this could be however long an ASCII string
> is acceptable in C as the order level is stored as a string (part of the section
> name) and also enable each subsystem to have its own set of levels / heuristics
> for initialization as well. Ordering would actually be done at link time, so
> the code is already ordered *iff* all dependency heuristics are available at
> compilation time. If dependency information can only be available at run time,
> a run time sort is optional but using the existing x86 IOMMU code as an example
> and precedent (and considering my expanded stricter semantics) -- in the future
> this can be an option for any subsystem as well as a supplement.
> 
> My interest in the media controller and the SAT solver is I consider the run
> time sort nothing but a sloppy hack (see my userspace tree for tons of
> semantics fixes), and formalizing this to something more concrete that can be
> generic should be more useful to other parts of the kernel. This is why I
> wanted to discuss these things. Is there a generic common goal here and
> can we share code ? If so what would that look like ?
> 
> My impression at first is that linker-time dependencies should suffice to cover
> a lot cases, but that should still mean then that binutils ld SORT() could
> potentially be enhanced further in the future, specially if a simple dependency
> map could be expressed. For instance can we make it run faster, or would
> having it interpret certain other criteria enable us to use a SAT solver
> to optimize dependency resolution / ordering. Then there is run time sorting
> and optimization -- but the same questions apply here.
> 
> I think a fruitful session would be to:
> 
>   o review / explain each of the complex ordering problems from different
>     kernel subsystems
>   o review / explain existing solutions to these problems
>   o finally try to figure out if a common solution is possible, both
>     short term and long term
> 
> [0] https://lkml.kernel.org/r/1469222687-1600-1-git-send-email-mcgrof@kernel.org
> [1] https://git.kernel.org/cgit/linux/kernel/git/mcgrof/linker-tables.git/
> [2] https://lkml.kernel.org/r/1464311916-10065-1-git-send-email-mcgrof@kernel.org

I'm still wondering why probe ordering is special and suspend-resume and runtime
PM ordering isn't (and shutdown ordering too, for that matter).

In the majority of cases when probe ordering matters, suspend/resume and runtime
PM ordering matters too, so considering one of them alone doesn't seem very
useful to me.

Now, even if you can address probe ordering issues by using some link-time
methods and similar, runtime PM is async by nature and suspend/resume is
async for many devices too (for efficiency reasons) and the only approach
that works here is to have the dependencies represented as data somehow
and use those when you need to carry out the operation.

Thanks,
Rafael