[Desktop_architects] Re: Linux desktop, the fun thread
Linus Torvalds
torvalds at osdl.org
Tue Dec 13 22:38:53 PST 2005
On Tue, 13 Dec 2005, Nat Friedman wrote:
>
> I'll happily wear the "feature slut" t-shirts with pride now. And
> Linus, once my mouse button stops jerking when I copy big files, I'll
> add a GUI for you. ;-)
Ok, I can tell you why I think it jerks around, you can make suggestions..
Sadly, I suspect most of them have nothing at all to do with the kernel.
You may remember that a few years ago the X mouse cursor was really jerky
under _any_ kind of X load. If the X server was busy doing something, it
would simply not update the mouse pointer, so if you had anything that
caused the X server to do big slow things (unaccelerated scrolling,
ellipses, big repaints, whatever), you'd have a mouse that would only
update occasionally.
The solution to that was for the X server to update the cursor position in
the signal handler (SIGIO on the mouse device), and it suddenly became a
_lot_ smoother. Smooth enough that now you have to do something slightly
special to get back the good (bad) old jerky mouse cursor.
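
For readers who have not used SIGIO, here is a minimal sketch of the mechanism, not the X server's actual code. It assumes Linux-style O_ASYNC support on the descriptor. The names on_sigio, arm_sigio and pointer_updates are made up for illustration, and the counter stands in for the real cursor repositioning.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for O_ASYNC on glibc */
#endif
#include <fcntl.h>
#include <signal.h>
#include <unistd.h>

/* Hypothetical stand-in for "move the cursor on screen". */
static volatile sig_atomic_t pointer_updates;

static void on_sigio(int signo)
{
	(void)signo;
	pointer_updates++;	/* only async-signal-safe work belongs here */
}

/* Ask the kernel to send SIGIO to this process when fd becomes readable.
 * Returns 0 on success, -1 on failure. */
static int arm_sigio(int fd)
{
	struct sigaction sa;
	int flags;

	sa.sa_handler = on_sigio;
	sigemptyset(&sa.sa_mask);
	sa.sa_flags = SA_RESTART;
	if (sigaction(SIGIO, &sa, NULL) < 0)
		return -1;

	/* Route the signal to us, then switch on asynchronous notification. */
	if (fcntl(fd, F_SETOWN, getpid()) < 0)
		return -1;
	flags = fcntl(fd, F_GETFL);
	if (flags < 0)
		return -1;
	return fcntl(fd, F_SETFL, flags | O_ASYNC | O_NONBLOCK) < 0 ? -1 : 0;
}
```

The handler still runs in whichever thread the kernel picks to take the signal, which in a single-threaded server is the main thread.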
Now, I think you may already see where this is going.
Think of a _totally_ idle X server on a system that is busy copying files.
What does it do when it gets that mouse report?
The first thing it does is to get the SIGIO, and in the signal handler it
will update the mouse location itself. That's fine: it's all in RAM,
there's nothing at all to worry about, and it does that immediately. The
mouse location update is usually just a couple of IO accesses. It then
returns from the signal handler, and will return to the main select loop.
So far so good.
Now it's not idle any more. The "select()" will have exited, but X now has
an agenda: it needs to inform all the clients that the mouse has moved.
That basically ends up looping over the internal X client list, and for
each client that shows interest (which tends to be most of them ;), X
does:
- a couple of malloc() calls to create the packet to be written for that
client
- a writev() call that actually sends the packet out to the client.
It then goes back to the main loop. (If the X server is busy with other
things, this all ends up being more complex, but it doesn't change the
basic premise - it just makes it even easier to hit the behaviour I'm
about to explain.)
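
The per-client work described above can be sketched roughly as follows. This is an illustration only: the packet layout (event_header, motion_body), the EVENT_MOTION code and both function names are assumptions, not the X protocol or X server internals. What it shows is the shape of the work: two allocations and one writev() per interested client.

```c
#include <stddef.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/uio.h>

enum { EVENT_MOTION = 1 };	/* hypothetical event code */

struct event_header { unsigned int type; unsigned int length; };
struct motion_body  { int x; int y; };

/* Build one motion packet and send it. Returns bytes written, or -1. */
static ssize_t notify_client(int fd, int x, int y)
{
	struct event_header *hdr = malloc(sizeof *hdr);
	struct motion_body *body = malloc(sizeof *body);
	ssize_t sent = -1;

	if (hdr && body) {
		struct iovec iov[2];

		hdr->type = EVENT_MOTION;
		hdr->length = sizeof *body;
		body->x = x;
		body->y = y;
		iov[0].iov_base = hdr;
		iov[0].iov_len = sizeof *hdr;
		iov[1].iov_base = body;
		iov[1].iov_len = sizeof *body;
		sent = writev(fd, iov, 2);	/* header and body in one call */
	}
	free(hdr);
	free(body);
	return sent;
}

/* Walk the client list and notify every client that asked for motion
 * events. Returns how many clients were written to successfully. */
static size_t notify_interested(const int *fds, const int *interested,
				size_t n, int x, int y)
{
	size_t notified = 0;

	for (size_t i = 0; i < n; i++)
		if (interested[i] && notify_client(fds[i], x, y) > 0)
			notified++;
	return notified;
}
```

Both the malloc() calls and the writev() can end up needing memory from the kernel, which is where the stalls come from.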
Now, the kernel should actually schedule all of this perfectly well.
That's not the problem. The problem is subtler than that. The problem is
the memory allocations inherent in both the mallocs and internally in the
kernel in the writev() itself. They will occasionally block, because the
big file copy has dirtied a lot of buffers that we need to write out to
make room for more.
They don't block for a long time, but when it happens, what do you think
goes on? A mouse event comes in, and the kernel immediately sends a SIGIO.
But the process it sent the SIGIO to (X) is _blocked_ on the memory
allocations it does. So even though the kernel sent the signal as soon as
the mouse moved, it won't be _acted_ upon, because X is busy doing
something else.
See? It's _exactly_ the same situation as before, except using signals
means that the "busy doing something else" is now no longer any of the
normal user space loops, it's system calls that can't be interrupted (the
writev() blocking on interruptible IO would be interrupted, but memory
allocations aren't interruptible).
So the mouse is still jumpy for all the same reasons, but now you need
quite a bit more than just drawing activity to see it.
And notice that the mallocs had _nothing_ to do with actually moving the
mouse pointer. We could have done the mouse pointer move with no trouble
at all. We were blocked on them for other reasons (sending the events to
the clients is obviously a _result_ of the mouse moving, but it's totally
independent of actually updating the screen with a new mouse pointer). So
we were single-threaded for no good reason, except that X itself is pretty
much single-threaded.
Quite frankly, the simplest solution by far would be that X used a real
thread for mouse handling (and possibly other things, but mouse screen
updates really do tend to be special. I don't think X uses signals for
anything else than mouse updates and timers, for example).
That's really what it is all about. It uses SIGIO to approximate having a
real thread, but we all know that there's "real threading" and then
there's "fake threads".
And SIGIO is very much a fake thread, and because it's fake, it ends up
having these silly cases where the "signal thread" isn't executable
because the "main thread" is doing something else.
See? I'd love to help, but the problem really isn't that the kernel cannot
do it, it's that X really doesn't handle the mouse events truly
asynchronously. The kernel would certainly happily _allow_ X doing so,
but..
Now, I don't blame the X guys either - threaded programming really is
pretty _nasty_. But I really do believe that in this case it would be the
right thing, and would solve the problem you see.
Btw, I have no proof that this is what is going on. There may be other
interactions, and I haven't worked with the X server enough to know all it
does internally. The only thing I know about X I learnt having to deal
with what it does from a kernel standpoint ;)
And I warned you - often the kernel people end up blaming others. If you
know any X developers (KeithP? Are you listening?) I'd certainly love to
try to help them out some way here, but I _suspect_ the kernel actually
already does implement everything that X would need to have a smooth
mouse. It's just that the smooth mouse requires more care than X currently
gives it.
And hey, I may be full of cr*p, and the bad mouse behaviour you see could
be due to bad scheduling in the kernel. I really do suspect the above
scenario, though.
Now, I know there's been at least experimental multi-threaded X servers
around for something like two decades by now. I do not believe that it has
ever been production quality, though (I think it was at some point planned
to be a standard feature of X11r6, that certainly did _not_ happen).
Keith may be able to tell us more.
Btw, I don't think mouse handling in any way implies that the X server in
general would need to be multi-threaded, so there's no need to resurrect
the (very complex) threading stuff. I suspect that you could do mouse
handling most simply by keeping the "real core" of the X server
single-threaded, and just having a very specialized thread that does
_nothing_ but mouse events.
In fact, you could try to add this to the X server startup code:
	/* Do nothing but take signals */
	static void *dummy_thread(void *arg)
	{
		for (;;)
			pause();
		return NULL;	/* not reached */
	}
	...
	pthread_create(.. dummy_thread, NULL);
	...
and it might even just work. The only thing it does is to take the signals
(_possibly_) in another thread.
Of course, it's actually much more likely that it just makes X very very
flaky. For example, if X uses siglongjmp() or something like that in a
signal handler (which it may well do), taking a signal in the "idle
thread" and then longjumping into somewhere else would do some seriously
horribly bad things.
So I just throw out the above suggestion as a total hack that likely
doesn't do what you'd want it to do, but hopefully explains one way to
move towards it.
Linus