[Desktop_architects] Re: Linux desktop, the fun thread

Tue Dec 13 22:38:53 PST 2005

On Tue, 13 Dec 2005, Nat Friedman wrote:
>         
> I'll happily wear the "feature slut" t-shirts with pride now.  And
> Linus, once my mouse button stops jerking when I copy big files, I'll
> add a GUI for you.  ;-)

Ok, I can tell you why I think it jerks around, you can make suggestions.. 
Sadly, I suspect most of them have nothing at all to do with the kernel.

You may remember that a few years ago the X mouse cursor was really jerky 
under _any_ kind of X load. If the X server was busy doing something, it 
would simply not update the mouse pointer, so if you had anything that 
caused the X server to do big slow things (unaccelerated scrolling, 
ellipses, big repaints, whatever), you'd have a mouse that would only 
update occasionally.

The solution to that was for the X server to update the cursor position in 
the signal handler (SIGIO on the mouse device), and it suddenly became a 
_lot_ smoother. Smooth enough that now you have to do something slightly 
special to get back the good (bad) old jerky mouse cursor.

Now, I think you may already see where this is going.

Think of a _totally_ idle X server on a system that is busy copying files. 
What does it do when it gets that mouse report?

The first thing it does is to get the SIGIO, and in the signal handler it 
will update the mouse location itself. That's fine: it's all in RAM, 
there's nothing at all to worry about, and it does that immediately. The 
mouse location update is usually just a couple of IO accesses. It then 
returns from the signal handler, and will return to the main select loop.

So far so good.

Now it's not idle any more. The "select()" will have exited, but X now has 
an agenda: it needs to inform all the clients that the mouse has moved. 
That basically ends up looping over the internal X client list, and for 
each client that shows interest (which tends to be most of them ;), X 
does:

 - a couple of malloc() calls to create the packet to be written for that 
   client
 - a writev() call that actually sends the packet out to the client.

It then goes back to the main loop (if the X server is busy with other 
things, this all ends up being more complex, but it doesn't change the 
basic premise - it just makes it even easier to hit the behaviour I'm 
about to explain):

Now, the kernel should actually schedule all of this perfectly well. 
That's not the problem. The problem is subtler than that. The problem is 
the memory allocations inherent in both the mallocs and internally in the 
kernel in the writev() itself. They will occasionally block, because the 
big file copy has dirtied a lot of buffers that we need to write out to 
make room for more.
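
The per-client notification described above can be sketched roughly as
follows. This is an illustration only, not the X server's real code:
struct client, struct motion_event, notify_clients() and the 4-byte
"MOTN" tag are invented names and formats.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical client record and wire format, for illustration only. */
struct client {
	int fd;			/* connection to the client */
	int wants_motion;	/* nonzero if it asked for motion events */
};

struct motion_event {
	short x, y;
};

/*
 * Tell every interested client that the pointer moved.
 * Returns the number of clients written to, or -1 if malloc() fails.
 */
static int notify_clients(const struct client *clients, size_t n,
			  short x, short y)
{
	static const char header[4] = { 'M', 'O', 'T', 'N' };	/* made-up tag */
	int sent = 0;

	assert(clients != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		struct motion_event *ev;
		struct iovec iov[2];

		if (!clients[i].wants_motion)
			continue;

		/* User-space allocation: can stall if the kernel must find pages. */
		ev = malloc(sizeof(*ev));
		if (!ev)
			return -1;
		ev->x = x;
		ev->y = y;

		iov[0].iov_base = (void *)header;
		iov[0].iov_len = sizeof(header);
		iov[1].iov_base = ev;
		iov[1].iov_len = sizeof(*ev);

		/* The kernel allocates buffer space here as well, and may block. */
		if (writev(clients[i].fd, iov, 2) ==
		    (ssize_t)(sizeof(header) + sizeof(*ev)))
			sent++;
		free(ev);
	}
	return sent;
}
```

Nothing in this loop touches the cursor itself, yet every malloc() and
writev() in it is a point where the whole process can be put to sleep.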

They don't block for a long time, but when it happens, what do you think 
goes on? A mouse event comes in, and the kernel immediately sends a SIGIO. 
But the process it sent the SIGIO to (X) is _blocked_ on the memory 
allocations it does. So even though the kernel sent the signal as soon as 
the mouse moved, it won't be _acted_ upon, because X is busy doing 
something else.

See? It's _exactly_ the same situation as before, except using signals 
means that the "busy doing something else" is now no longer any of the 
normal user space loops, it's system calls that can't be interrupted (the 
writev() blocking on interruptible IO would be interrupted, but memory 
allocations aren't interruptible).

So the mouse is still jumpy for all the same reasons, but now you need 
quite a bit more than just drawing activity to see it.

And notice that the mallocs had _nothing_ to do with actually moving the 
mouse pointer. We could have done the mouse pointer move with no trouble 
at all. We were blocked on them for other reasons (sending the events to 
the clients is obviously a _result_ of the mouse moving, but it's totally 
independent of actually updating the screen with a new mouse pointer). So 
we were single-threaded for no good reason, except that X itself is pretty 
much single-threaded.

Quite frankly, the simplest solution by far would be that X used a real 
thread for mouse handling (and possibly other things, but mouse screen 
updates really do tend to be special. I don't think X uses signals for 
anything else than mouse updates and timers, for example).

That's really what it is all about. It uses SIGIO to approximate having a 
real thread, but we all know that there's "real threading" and then 
there's "fake threads".

And SIGIO is very much a fake thread, and because it's fake, it ends up 
having these silly cases where the "signal thread" isn't executable 
because the "main thread" is doing something else.

See? I'd love to help, but the problem really isn't that the kernel cannot 
do it, it's that X really doesn't handle the mouse events truly 
asynchronously. The kernel would certainly happily _allow_ X doing so, 
but..

Now, I don't blame the X guys either - threaded programming really is 
pretty _nasty_. But I really do believe that in this case it would be the 
right thing, and would solve the problem you see.

Btw, I have no proof that this is what is going on. There may be other 
interactions, and I haven't worked with the X server enough to know all it 
does internally. The only thing I know about X I learnt having to deal 
with what it does from a kernel standpoint ;)

And I warned you - often the kernel people end up blaming others. If you 
know any X developers (KeithP? Are you listening?) I'd certainly love to 
try to help them out some way here, but I _suspect_ the kernel actually 
already does implement everything that X would need to have a smooth 
mouse. It's just that the smooth mouse requires more care than X currently 
gives it.

And hey, I may be full of cr*p, and the bad mouse behaviour you see could 
be due to bad scheduling in the kernel. I really do suspect the above 
scenario, though.

Now, I know there's been at least experimental multi-threaded X servers 
around for something like two decades by now. I do not believe that it has 
ever been production quality, though (I think it was at some point planned 
to be a standard feature of X11r6, that certainly did _not_ happen).

Keith may be able to tell us more.

Btw, I don't think mouse handling in any way implies that the X server in 
general would need to be multi-threaded, so there's no need to resurrect 
the (very complex) threading stuff. I suspect that you could do mouse 
handling most simply by keeping the "real core" of the X server 
single-threaded, and just having a very specialized thread that does 
_nothing_ but mouse events.
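
A minimal sketch of such a specialized thread, assuming the mouse device
can simply be read with a blocking read(). The names struct cursor,
mouse_thread() and start_mouse_thread(), and the 2-byte (dx, dy) report
format, are hypothetical and not taken from any X server source.

```c
#include <assert.h>
#include <pthread.h>
#include <unistd.h>

/* Hypothetical shared cursor state, protected by a mutex. */
struct cursor {
	pthread_mutex_t lock;
	int x, y;
	int fd;			/* stand-in for the mouse device */
};

/* Mouse thread: sleep in read(), apply each report, and do nothing else. */
static void *mouse_thread(void *arg)
{
	struct cursor *c = arg;
	signed char delta[2];	/* assumed report format: dx, dy */

	while (read(c->fd, delta, sizeof(delta)) == (ssize_t)sizeof(delta)) {
		pthread_mutex_lock(&c->lock);
		c->x += delta[0];
		c->y += delta[1];
		/* A real server would reposition the on-screen cursor here. */
		pthread_mutex_unlock(&c->lock);
	}
	return NULL;		/* EOF or an error on the device ends the thread */
}

/* Initialize the cursor state and start the mouse thread on fd. */
static int start_mouse_thread(struct cursor *c, int fd, pthread_t *tid)
{
	assert(c != NULL && tid != NULL);
	c->x = 0;
	c->y = 0;
	c->fd = fd;
	if (pthread_mutex_init(&c->lock, NULL) != 0)
		return -1;
	return pthread_create(tid, NULL, mouse_thread, c) == 0 ? 0 : -1;
}
```

Because this thread never allocates memory and never talks to clients, it
stays runnable even while the main thread is stuck in malloc() or writev().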

In fact, you could try to add this to the X server startup code:

	/* Do nothing but take signals */
	static void *dummy_thread(void *arg)
	{
		/* pthread start routines must return void * */
		for (;;)
			pause();
		return NULL;	/* not reached */
	}

	...
	pthread_create(.. dummy_thread, NULL);
	...

and it might even just work. The only thing it does is to take the signals 
(_possibly_) in another thread.

Of course, it's actually much more likely that it just makes X very very 
flaky. For example, if X uses siglongjmp() or something like that in a 
signal handler (which it may well do), taking a signal in the "idle 
thread" and then longjumping into somewhere else would do some seriously 
horribly bad things.

So I just throw out the above suggestion as a total hack that likely 
doesn't do what you'd want it to do, but hopefully explains one way to 
move towards it.

		Linus