[Ksummit-2013-discuss] NUMA locality for storage

Tue Jul 30 16:37:46 UTC 2013

On Tue, 30 Jul 2013, Chris Mason wrote:

> Another variation is an extension to the NUMA api that takes an fd and
> returns a node mask of local CPUs for reads and local CPUs for writes.

Could we guarantee that processes that wake up do so on the processor
socket that has the device attached, similar to the network stack? That
way processes naturally migrate to the node with the best performance.

> It would take a fair amount of plumbing from the fd down to the bdev but
> it lets the application make its own decisions about which nodes to bind
> to, and could allow us to query which CPU is best for a given network
> socket as well.

This gets a bit complicated given multi-device devices and also the various
kernel threads that are involved in processing the data. The whole shebang
needs to be orchestrated from the OS side to run on the right NUMA node
and the right processors. What user space can do is limited.

Treat the different processor sockets with separate PCI complexes as
different I/O subsystems? That would avoid serialization across two NUMA
domains.
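
For illustration only, here is a rough user-space approximation of the
fd-to-local-CPUs query discussed above, using sysfs rather than any new
NUMA API call. It is a sketch under assumptions: the function names
(parse_cpulist, numa_node_for_dev, cpus_for_node, local_cpus_for_fd) are
hypothetical, it assumes the file's st_dev maps to a single block device
whose parent (typically PCI) device exposes a numa_node attribute, and it
does not distinguish reads from writes. For multi-device or virtual block
devices it simply returns None, which is the limitation mentioned above.

```python
import os
from pathlib import Path


def parse_cpulist(text):
    """Expand the kernel's cpulist format, e.g. '0-3,8', into sorted CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # an empty cpulist means the node has no CPUs
        lo, _, hi = part.partition("-")
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return sorted(cpus)


def numa_node_for_dev(major, minor, sysfs_root="/sys"):
    """Return the NUMA node of the device behind major:minor, or None."""
    link = Path(sysfs_root) / "dev" / "block" / f"{major}:{minor}"
    if not link.exists():
        return None  # e.g. an anonymous st_dev with no single backing bdev
    path = link.resolve()
    # Walk up from the block device towards its parent (PCI) device and
    # take the first numa_node attribute found on the way.
    for d in (path, *path.parents):
        attr = d / "numa_node"
        if attr.is_file():
            node = int(attr.read_text())
            return node if node >= 0 else None  # -1: no affinity reported
    return None


def cpus_for_node(node, sysfs_root="/sys"):
    """Return the CPU ids belonging to a NUMA node, or None if unknown."""
    cpulist = (Path(sysfs_root) / "devices" / "system" / "node"
               / f"node{node}" / "cpulist")
    return parse_cpulist(cpulist.read_text()) if cpulist.is_file() else None


def local_cpus_for_fd(fd, sysfs_root="/sys"):
    """Best-effort list of CPUs local to the device holding fd's file."""
    dev = os.fstat(fd).st_dev
    node = numa_node_for_dev(os.major(dev), os.minor(dev), sysfs_root)
    return None if node is None else cpus_for_node(node, sysfs_root)
```

An application could pass the result to os.sched_setaffinity() to bind
itself near the device, but that only moves the application; the kernel
threads involved in processing the data are not affected by it.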