[cgl_discussion] Re: [Cgl_proof_subgroup] FYI: timer scalability whitepaper url

george anzinger george at mvista.com
Thu Sep 5 18:27:38 PDT 2002

Andy Pfiffer wrote:
> On Thu, 2002-09-05 at 15:42, george anzinger wrote:
> > Whoo, wait a nano second.  Andy's paper analyzed what I call
> > the "cascade timers".  High-res-timers require a new timer
> > list structure which I described somewhere (see below).
> Yes, I saw the nice graphs posted earlier.
> > At
> > this time you can still "CONFIG" the cascade timer list, but
> > I consider that a warm fuzzy that will go away once we feel
> > "comfortable" with the new structure.  The high-res option
> > REQUIRES the new list structure, so to enable the
> > "cascade timer" list you must turn off the high-res option.
> Well, for what it's worth, the head of the CGLE repository won't build
> with high-res configured out (yes, there is a bug filed).

Uh, have a number?  Do we now have a new Bugzilla?  Awk!!
> From a scalability standpoint, I'm not too concerned with fundamental
> changes to the data structure, provided it doesn't make things worse.
> Currently, the algorithm for insertion, modification, and removal is of
> O(1) complexity.  SMP lock contention appears to be correlated to both
> the length of a "cascade timer" list as well as the average wall-clock
> execution time of invoked callbacks (1 long-running callback could cause
> as much contention as 20,000 simultaneously firing very short
> callbacks). 

Actually, I have been told that if you take the cascade
into account it is not really O(1).  There was some discussion
of this on the high-res-timers discussion list.
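To make that amortized-cost point concrete, here is a toy sketch (illustrative Python, not the kernel's actual timer code; the real wheel has more levels and different sizes): each insert is a constant-time bucket append, but when the wheel advances past a coarse bucket, every timer in it must be re-filed at once.

```python
# Toy two-level timer wheel: O(1) insert, but a periodic O(k) "cascade"
# when k timers in a coarse bucket must be re-filed into the fine wheel.
# (Illustrative only -- the real kernel wheel has more levels.)
WHEEL_SIZE = 8  # toy value

class ToyWheel:
    def __init__(self):
        self.now = 0
        # fine level: one bucket per tick, covers the next WHEEL_SIZE ticks
        self.near = [[] for _ in range(WHEEL_SIZE)]
        # coarse level: one bucket per WHEEL_SIZE ticks
        self.far = [[] for _ in range(WHEEL_SIZE)]

    def insert(self, expires, callback):            # O(1): index, append
        if expires - self.now < WHEEL_SIZE:
            self.near[expires % WHEEL_SIZE].append((expires, callback))
        else:
            self.far[(expires // WHEEL_SIZE) % WHEEL_SIZE].append(
                (expires, callback))

    def tick(self):
        self.now += 1
        if self.now % WHEEL_SIZE == 0:
            # cascade: re-file everything in the next coarse bucket -- O(k)
            idx = (self.now // WHEEL_SIZE) % WHEEL_SIZE
            bucket, self.far[idx] = self.far[idx], []
            for expires, cb in bucket:
                self.insert(expires, cb)
        slot = self.near[self.now % WHEEL_SIZE]
        fired = [cb for exp, cb in slot if exp == self.now]
        self.near[self.now % WHEEL_SIZE] = [t for t in slot if t[0] != self.now]
        return fired
```

So the common-case operations are O(1), but every WHEEL_SIZE ticks the cost of one cascade is proportional to however many timers happen to sit in that coarse bucket.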

Yes, it is these callbacks that give Real Time OS vendors
nightmares.  There is no real control...

> The cost for maintaining the "sorted" property of the data
> structure is spread out over time (roughly 1 workload-related burst of
> processing every N clock interrupts, and so on).  If the distribution of
> expiration times is evenly distributed into the future, the "cascade
> timer" algorithm really isn't all that bad.
> On the other hand, if the system has been asked (directly or indirectly)
> to maintain, say, 30,000 timers with identical expiration times set for
> 10 weeks into the future, then 30,000 timers need to be carried forward
> until they fire.  In this scenario, the current "cascade timer"
> algorithm as configured will cause no more than 4 (maybe fewer) cascade
> operations over a span of 10 weeks.

I think the real issue with such a load would be the expiry
time, when all the callbacks need to be made.  But this is a
problem no matter what list structure is used.
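For what it's worth, the "no more than 4" figure checks out arithmetically. Assuming the stock wheel geometry (one 256-slot level plus four 64-slot levels) and HZ=100 (both assumptions on my part), a 10-week timer initially lands in the outermost level and is cascaded once per level on its way down:

```python
# Rough check of the "no more than 4 cascades" estimate, assuming
# HZ=100 and a 256 + 4x64 wheel geometry (my assumptions).
HZ = 100
expiry = 10 * 7 * 24 * 3600 * HZ       # 10 weeks in jiffies: 604,800,000

spans = [256]                          # jiffy range covered by each level
for _ in range(4):
    spans.append(spans[-1] * 64)       # 256, 16K, ~1M, ~67M, ~4.3G

# The timer starts in the first level whose span covers it, and is
# cascaded once per level on the way down to level 0.
level = next(i for i, span in enumerate(spans) if expiry < span)
print(level)                           # 4 -> four cascade operations
```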
> I understand that from a real-time point of view, that "blurp" while the
> system relinks 30,000 data structures might be a Very Bad Thing.  I
> couldn't find a good description of a real-life carrier-grade workload
> and what awful things it might do to the timer algorithm, so I'll just
> toss up my hands at this point and say I don't have much evidence of
> it being a Very Bad Thing. ;^)

For what it's worth, I did not move away from the cascade
list just because it cascades.  It was more related to
needing to keep an ordered list in any case (high-res has
entries between jiffies).  It is also easier to throw more
memory at the new list than at the cascade list.
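A minimal sketch of the ordered-list idea (hypothetical Python, not the actual high-res-timers patch code; all names are illustrative): keying entries on a (jiffy, sub-jiffy offset) pair keeps the list sorted at insert time, so sub-jiffy expiries slot in naturally and there is never a cascade, at the cost of O(log n) per insert.

```python
# Sketch of an ordered expiry list keyed on (jiffy, sub-jiffy offset).
# High-res timers need entries *between* jiffies, so the sort key has
# two components.  Names are made up, not the patch's data structures.
import heapq

class OrderedTimerList:
    def __init__(self):
        self._heap = []

    def add(self, jiffy, sub_jiffy, callback):
        # O(log n) insert keeps the list ordered at all times -- no cascade.
        heapq.heappush(self._heap, (jiffy, sub_jiffy, callback))

    def expire_through(self, jiffy, sub_jiffy):
        # Pop every entry due at or before the given (jiffy, sub-jiffy).
        fired = []
        while self._heap and self._heap[0][:2] <= (jiffy, sub_jiffy):
            fired.append(heapq.heappop(self._heap)[2])
        return fired

timers = OrderedTimerList()
timers.add(100, 0, "on the tick")
timers.add(100, 500, "between ticks")
timers.add(101, 0, "next tick")
print(timers.expire_through(100, 999))  # ['on the tick', 'between ticks']
```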
> Andy

George Anzinger   george at mvista.com
Preemption patch:

More information about the cgl_discussion mailing list