[cgl_discussion] Re: [Cgl_proof_subgroup] FYI: timer scalability whitepaper url

Thu Sep 5 16:38:25 PDT 2002

On Thu, 2002-09-05 at 15:42, george anzinger wrote:
> Whoo, wait a nano second.  Andy's paper analized what I call
> the "cascade timers".  High-res-timers require a new timer
> list structure which I described somewhere (see below).

Yes, I saw the nice graphs posted earlier.

  At
> this time you can still "CONFIG" the cascade timer list, but
> I consider that a warm fuzzy that will go away once we feel
> "comfortable" with the new structure.  The high-res option
> REQUIRES the new list structure so to be able to enable the
> "cascade timer" list you must turn off the high-res option.

Well, for what it's worth, the head of the CGLE repository won't build
with high-res configured out (yes, there is a bug filed).

>From a scalability standpoint, I'm not too concerned with fundamental
changes to the data structure, provided it doesn't make things worse.

Currently, the algorithm for insertion, modification, and removal is of
O(1) complexity.  SMP lock contention appears to be correlated to both
the length of a "cascade timer" list as well as the average wall-clock
execution time of invoked callbacks (1 long-running callback could cause
as much contention as 20,000 simultaneously firing very short
callbacks).  The cost for maintaining the "sorted" property of the data
structure is spread out over time (roughly 1 workload-related burst of
processing every N clock interrupts, and so on).  If the distribution of
expiration times is evenly distributed into the future, the "cascade
timer" algorithm really isn't all that bad.

On the other hand, if the system has been asked (directly or indirectly)
to maintain, say, 30,000 timers with identical expiration times set for
10 weeks into the future, then 30,000 timers need to be carried forward
until they fire.  In this scenario, the current "cascade timer"
algorithm as configured will cause no more than 4 (maybe fewer) cascade
operations over a span of 10 weeks.

I understand that from a real-time point of view, that "blurp" while the
system relinks 30,000 data structures might be a Very Bad Thing.  I
couldn't find a good description of a real-life carrier-grade workload
and what awful things it might do to the timer algorithm, so I'll just
toss up my hands at this point and say I don't have much of evidence of
it being a Very Bad Thing. ;^)

Andy