[Desktop_architects] Re: Linux desktop, the fun thread

Thu Dec 15 20:07:41 PST 2005

Linus,

I think there may be other things going on here.

X behavior is, at the moment, in its current incarnation, completely,
totally braindead.  It doesn't have to be this way.

X is busywaiting on registers in *user* space, so the kernel thinks it
is a hoggish process and dumps its priority, and it therefore won't be
scheduled when it should be.

Even the *first* X implementation on UNIX had enough of a kernel udriver
so that rather than busywaiting in user space more than a few tries,
we'd call the kernel with enough information to start the next commands
on their way when a completion interrupt arrived.

Then the X server looks like the interactive, well behaved, non-piggish
process it should naturally be.
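
As a rough sketch of the "spin a few tries, then sleep until the
completion interrupt" pattern (this is not code from any real X server
or driver): engine_busy stands in for a hardware status register, irq_fd
for a driver file descriptor that becomes readable when the interrupt
fires, and SPIN_TRIES, wait_engine_idle() and fake_completion_irq() are
names invented for the example. A pipe plays the role of the driver here.
The sketch covers only the giving-the-CPU-back half; queuing the next
commands in the kernel is not shown.

```c
#include <assert.h>
#include <poll.h>
#include <stdatomic.h>
#include <unistd.h>

#define SPIN_TRIES 4	/* hypothetical value for "a few tries" */

/* Hypothetical stand-in for a memory-mapped "engine busy" register. */
static atomic_int engine_busy;

/* How many times we gave the CPU back instead of spinning. */
static int sleeps;

/*
 * Wait until the engine is idle.  irq_fd stands in for a driver file
 * descriptor that becomes readable when the completion interrupt fires.
 * Returns 0 when idle, -1 on error.
 */
static int wait_engine_idle(int irq_fd)
{
	struct pollfd p = { .fd = irq_fd, .events = POLLIN };
	char token;
	int i;

	assert(irq_fd >= 0);

	/* Cheap case: the hardware finishes within a few polls. */
	for (i = 0; i < SPIN_TRIES; i++)
		if (!atomic_load(&engine_busy))
			return 0;

	/* Still busy: block in the kernel rather than burn CPU. */
	sleeps++;
	if (poll(&p, 1, -1) < 0)
		return -1;

	/* Consume the completion notification; it means the engine is done. */
	return read(irq_fd, &token, 1) == 1 ? 0 : -1;
}

/* Simulated completion interrupt: mark idle and wake any sleeper. */
static int fake_completion_irq(int wake_fd)
{
	atomic_store(&engine_busy, 0);
	return write(wake_fd, "!", 1) == 1 ? 0 : -1;
}
```

While the process sleeps in poll() it uses no CPU time, so the scheduler
has no reason to treat it as a hog.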

Due to XFree86's brain-damaged ideas about how to do "OS independent X
drivers", this incredibly insane situation has endured.  So we need to
get these hooks into Linux.

What's needed is a bit of help from a kernel driver, so that when the
graphics hardware is busy, we can be kind enough to give you the CPU back
(along with what to do when the hardware is next free, so we can keep the
hardware busy rather than waiting for the next time the X server can get
scheduled), as on all the "old fashioned" X implementations on UNIX.

				- Jim

On Tue, 2005-12-13 at 22:38 -0800, Linus Torvalds wrote:
> 
> On Tue, 13 Dec 2005, Nat Friedman wrote:
> >         
> > I'll happily wear the "feature slut" t-shirts with pride now.  And
> > Linus, once my mouse button stops jerking when I copy big files, I'll
> > add a GUI for you.  ;-)
> 
> Ok, I can tell you why I think it jerks around, you can make suggestions.. 
> Sadly, I suspect most of them have nothing at all to do with the kernel.
> 
> You may remember that a few years ago the X mouse cursor was really jerky 
> under _any_ kind of X load. If the X server was busy doing something, it 
> would simply not update the mouse pointer, so if you had anything that 
> caused the X server to do big slow things (unaccelerated scrolling, 
> ellipses, big repaints, whatever), you'd have a mouse that would only 
> update occasionally.
> 
> The solution to that was for the X server to update the cursor position in 
> the signal handler (SIGIO on the mouse device), and it suddenly became a 
> _lot_ smoother. Smooth enough that now you have to do something slightly 
> special to get back the good (bad) old jerky mouse cursor.
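
For readers who have not seen the SIGIO trick, here is a minimal sketch
of it. It is not the X server's actual code: cursor_x, cursor_y,
on_mouse_sigio() and mouse_sigio_enable() are made-up names, and the
two-byte (dx, dy) report format is an assumption for illustration. The
fcntl() calls (F_SETOWN, O_ASYNC) are the standard Linux way to ask for
SIGIO when a descriptor becomes readable.

```c
#define _GNU_SOURCE		/* for O_ASYNC on glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical cursor state; sig_atomic_t so the handler may touch it. */
static volatile sig_atomic_t cursor_x, cursor_y;
static int mouse_fd = -1;

/* SIGIO handler: drain pending (dx, dy) byte pairs and move the cursor. */
static void on_mouse_sigio(int signo)
{
	signed char d[2];
	int saved_errno = errno;	/* read() may clobber errno */

	(void)signo;
	while (read(mouse_fd, d, sizeof d) == (ssize_t)sizeof d) {
		cursor_x += d[0];
		cursor_y += d[1];
	}
	errno = saved_errno;
}

/* Ask the kernel to send us SIGIO whenever fd becomes readable. */
static int mouse_sigio_enable(int fd)
{
	struct sigaction sa;
	int flags;

	assert(fd >= 0);
	mouse_fd = fd;

	memset(&sa, 0, sizeof sa);
	sa.sa_handler = on_mouse_sigio;
	sa.sa_flags = SA_RESTART;	/* restart interrupted syscalls */
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGIO, &sa, NULL) < 0)
		return -1;

	if (fcntl(fd, F_SETOWN, getpid()) < 0)
		return -1;
	flags = fcntl(fd, F_GETFL);
	if (flags < 0)
		return -1;
	/* O_NONBLOCK so the drain loop in the handler never blocks. */
	return fcntl(fd, F_SETFL, flags | O_ASYNC | O_NONBLOCK) < 0 ? -1 : 0;
}
```

The handler only runs when the process is able to take the signal; if
the process is stuck inside an uninterruptible kernel operation, the
cursor update waits along with it.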
> 
> Now, I think you may already see where this is going.
> 
> Think of a _totally_ idle X server on a system that is busy copying files. 
> What does it do when it gets that mouse report?
> 
> The first thing it does is to get the SIGIO, and in the signal handler it 
> will update the mouse location itself. That's fine: it's all in RAM, 
> there's nothing at all to worry about, and it does that immediately. The 
> mouse location update is usually just a couple of IO accesses. It then 
> returns from the signal handler, and will return to the main select loop.
> 
> So far so good.
> 
> Now it's not idle any more. The "select()" will have exited, but X now has 
> an agenda: it needs to inform all the clients that the mouse has moved. 
> That basically ends up looping over the internal X client list, and for 
> each client that shows interest (which tends to be most of them ;), X 
> does:
> 
>  - a couple of malloc() calls to create the packet to be written for that 
>    client
>  - a writev() call that actually sends the packet out to the client.
> 
> it then goes back to the main loop (if the X server is busy with other 
> things, this all ends up being more complex, but it doesn't change the 
> basic premise - it just makes it even easier to hit the behaviour I'm 
> about to explain):
> 
> Now, the kernel should actually schedule all of this perfectly well. 
> That's not the problem. The problem is subtler than that. The problem is 
> the memory allocations inherent in both the mallocs and internally in the 
> kernel in the writev() itself. They will occasionally block, because the 
> big file copy has dirtied a lot of buffers that we need to write out to 
> make room for more.
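
The per-client delivery step described above (a couple of malloc()
calls, then one writev()) might look roughly like the sketch below. The
structs, the EV_MOTION value and the function names are invented for
illustration and are not the real X11 wire format or server internals.
Both malloc() and the kernel side of writev() need memory, and those are
the places that can stall while dirty buffers are being written out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical packet layout, NOT the real X11 protocol encoding. */
struct event_hdr   { uint8_t type; uint8_t pad; uint16_t seq; };
struct motion_body { int16_t x, y; };
#define EV_MOTION 6

/* Build one motion packet and send it to one client.  0 on success. */
static int send_motion(int client_fd, uint16_t seq, int16_t x, int16_t y)
{
	struct event_hdr *hdr = malloc(sizeof *hdr);	/* may block */
	struct motion_body *body = malloc(sizeof *body);	/* may block */
	struct iovec iov[2];
	int rc = -1;

	if (hdr && body) {
		memset(hdr, 0, sizeof *hdr);
		hdr->type = EV_MOTION;
		hdr->seq = seq;
		body->x = x;
		body->y = y;

		iov[0].iov_base = hdr;
		iov[0].iov_len = sizeof *hdr;
		iov[1].iov_base = body;
		iov[1].iov_len = sizeof *body;

		/* Header and body go out in a single system call. */
		if (writev(client_fd, iov, 2) ==
		    (ssize_t)(sizeof *hdr + sizeof *body))
			rc = 0;
	}
	free(hdr);
	free(body);
	return rc;
}

/* Loop over all interested clients; returns how many were notified. */
static int broadcast_motion(const int *client_fds, size_t nclients,
			    uint16_t seq, int16_t x, int16_t y)
{
	size_t i;
	int sent = 0;

	assert(client_fds != NULL || nclients == 0);
	for (i = 0; i < nclients; i++)
		if (send_motion(client_fds[i], seq, x, y) == 0)
			sent++;
	return sent;
}
```

Nothing in this loop touches the cursor image, which is why a stall here
has no logical reason to hold up the pointer.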
> 
> They don't block for a long time, but when it happens, what do you think 
> goes on? A mouse event comes in, and the kernel immediately sends a SIGIO. 
> But the process it sent the SIGIO to (X) is _blocked_ on the memory 
> allocations it does. So even though the kernel sent the signal as soon as 
> the mouse moved, it won't be _acted_ upon, because X is busy doing 
> something else.
> 
> See? It's _exactly_ the same situation as before, except using signals 
> means that the "busy doing something else" is now no longer any of the 
> normal user space loops, it's system calls that can't be interrupted (the 
> writev() blocking on interruptible IO would be interrupted, but memory 
> allocations aren't interruptible).
> 
> So the mouse is still jumpy for all the same reasons, but now you need 
> quite a bit more than just drawing activity to see it.
> 
> And notice that the mallocs had _nothing_ to do with actually moving the 
> mouse pointer. We could have done the mouse pointer move with no trouble 
> at all. We were blocked on them for other reasons (sending the events to 
> the clients is obviously a _result_ of the mouse moving, but it's totally 
> independent of actually updating the screen with a new mouse pointer). So 
> we were single-threaded for no good reason, except that X itself is pretty 
> much single-threaded.
> 
> Quite frankly, the simplest solution by far would be that X used a real 
> thread for mouse handling (and possibly other things, but mouse screen 
> updates really do tend to be special. I don't think X uses signals for 
> anything else than mouse updates and timers, for example).
> 
> That's really what it is all about. It uses SIGIO to approximate having a 
> real thread, but we all know that there's "real threading" and then 
> there's "fake threads".
> 
> And SIGIO is very much a fake thread, and because it's fake, it ends up 
> having these silly cases where the "signal thread" isn't executable 
> because the "main thread" is doing something else.
> 
> See? I'd love to help, but the problem really isn't that the kernel cannot 
> do it, it's that X really doesn't handle the mouse events truly 
> asynchronously. The kernel would certainly happily _allow_ X doing so, 
> but..
> 
> Now, I don't blame the X guys either - threaded programming really is 
> pretty _nasty_. But I really do believe that in this case it would be the 
> right thing, and would solve the problem you see.
> 
> Btw, I have no proof that this is what is going on. There may be other 
> interactions, and I haven't worked with the X server enough to know all it 
> does internally. The only thing I know about X I learnt having to deal 
> with what it does from a kernel standpoint ;)
> 
> And I warned you - often the kernel people end up blaming others. If you 
> know any X developers (KeithP? Are you listening?) I'd certainly love to 
> try to help them out some way here, but I _suspect_ the kernel actually 
> already does implement everything that X would need to have a smooth 
> mouse. It's just that the smooth mouse requires more care than X currently 
> gives it.
> 
> And hey, I may be full of cr*p, and the bad mouse behaviour you see could 
> be due to bad scheduling in the kernel. I really do suspect the above 
> scenario, though.
> 
> Now, I know there's been at least experimental multi-threaded X servers 
> around for something like two decades by now. I do not believe that it has 
> ever been production quality, though (I think it was at some point planned 
> to be a standard feature of X11r6, that certainly did _not_ happen).
> 
> Keith may be able to tell us more.
> 
> Btw, I don't think mouse handling in any way implies that the X server in 
> general would need to be multi-threaded, so there's no need to resurrect 
> the (very complex) threading stuff. I suspect that you could do mouse 
> handling most simply by keeping the "real core" of the X server 
> single-threaded, and just having a very specialized thread that does 
> _nothing_ but mouse events.
> 
> In fact, you could try to add this to the X server startup code:
> 
> 	/* Do nothing but take signals */
> 	static void *dummy_thread(void *arg)
> 	{
> 		(void)arg;
> 		for (;;)
> 			pause();
> 		return NULL;	/* not reached; pthreads wants void * */
> 	}
> 
> 	...
> 	pthread_create(.. dummy_thread, NULL);
> 	...
> 
> and it might even just work. The only thing it does is to take the signals 
> (_possibly_) in another thread.
> 
> Of course, it's actually much more likely that it just makes X very very 
> flaky. For example, if X uses siglongjmp() or something like that in a 
> signal handler (which it may well do), taking a signal in the "idle 
> thread" and then longjumping into somewhere else would do some seriously 
> horribly bad things.
> 
> So I just throw out the above suggestion as a total hack that likely 
> doesn't do what you'd want it to do, but hopefully explains one way to 
> move towards it.
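
A slightly less hacky version of the "very specialized thread that does
nothing but mouse events" idea could look like the sketch below. It is
only an illustration under assumptions: ptr_x, ptr_y, mouse_thread() and
start_mouse_thread() are invented names, the (dx, dy) byte-pair format
is made up, and a real server would also have to repaint the cursor and
coordinate with the main loop, none of which is shown.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <unistd.h>

/* Hypothetical pointer position, shared with the main thread. */
static atomic_int ptr_x, ptr_y;

/*
 * Dedicated mouse thread: blocks in read() on the mouse descriptor and
 * applies each (dx, dy) pair as it arrives.  It never waits for the
 * main loop, so a stalled main thread cannot delay it.
 */
static void *mouse_thread(void *arg)
{
	int fd = *(int *)arg;
	signed char d[2];

	while (read(fd, d, sizeof d) == (ssize_t)sizeof d) {
		atomic_fetch_add(&ptr_x, d[0]);
		atomic_fetch_add(&ptr_y, d[1]);
		/* A real server would move the hardware cursor here. */
	}
	return NULL;	/* EOF or error: device went away */
}

/* Start the thread; fd_ptr must stay valid while the thread runs. */
static int start_mouse_thread(pthread_t *tid, int *fd_ptr)
{
	assert(tid != NULL && fd_ptr != NULL);
	return pthread_create(tid, NULL, mouse_thread, fd_ptr);
}
```

Because the thread takes no signals and never calls longjmp, it avoids
the siglongjmp() hazard mentioned above, at the cost of needing real
synchronization for anything it shares with the rest of the server.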
> 
> 		Linus