[cgl_discussion] Re: [cgl_tech_board] Re: [cgl_specs] Use case - Live patching

Timothy D. Witham wookie at osdl.org
Mon Mar 28 10:14:33 PST 2005


On Mon, 2005-03-28 at 10:38 -0600, Corey Minyard wrote:
> Timothy D. Witham wrote:
> 
> >On Mon, 2005-03-28 at 09:05 -0600, Corey Minyard wrote:
> >  
> >
> >>Why do you think it is evil?  It is standard practice in most large 
> >>telecom systems as it improves availability.
> >>
> >>    
> >>
> >    Maybe it would be better to word it as "it can improve
> >availability". 
> >  
> >
> Point taken, but that is true for practically any technique to improve 
> availability.
> 
> >But it really is a holdover from the single large expensive CPU design
> >days.
> >  
> >
> Not exactly.  It is useful for any system where you don't want to have 
> to bring it down (or partially down) to fix a bug or you want to be able 
> to back out the changes later.  It helps in the following ways:
> 
   Right you are. 

>    1. Reduces time to fix, as applying a patch is generally a lot faster
>       than upgrading a system.
>    2. Avoids having to go simplex in a 1+1 system.
>    3. Allows fixes with undesirable side effects to be easily removed. 
>       The patch systems I have used allow patches to be removed, and
>       fixes with undesirable side effects happened often enough to make
>       that very useful.
>    4. Allows debugging changes to be installed and removed in the
>       field.  I have seen patches used to debug problems in the field.
>    5. Avoids the "big bang" effect of installing an update for a fix and
>       getting a whole bunch of other changes that may have undesirable
>       side effects.

    All good things, but without the tools to ensure that the live patch
and the disk image are the same, all of the above benefits can be
lost the first time the system has to go to storage for the image. (I'm
using "disk" for anything that functions as the boot image.)

     That includes using the same source for the build that produces
both the live patch and the updated boot image.

     I guess I feel that live patching is like a loaded double-barreled
shotgun with no trigger guard or safety.  Yeah, there are things
that you can do quickly with it, but one of them involves your
foot.

    I guess I would be happy with something that made sure that
the live patch and the boot patch were the same set of code.
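
    As a sketch only, such a check could hash each patch that is live in
memory and compare it against the patches staged for the next boot. The
code below is Python; the patch names, the staging directory, and the
idea that the patch tool can report a digest for each live patch are
all assumptions made for illustration, not features of any existing
tool.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of a patch's bytes."""
    return hashlib.sha256(data).hexdigest()


def boot_patch_digests(patch_dir: Path) -> Dict[str, str]:
    """Hash every patch file staged in patch_dir (hypothetical boot-time directory)."""
    return {
        p.name: digest(p.read_bytes())
        for p in sorted(patch_dir.iterdir())
        if p.is_file()
    }


def compare_patch_sets(live: Dict[str, str], boot: Dict[str, str]) -> Dict[str, List[str]]:
    """Report patches that are live only, staged only, or differ in content.

    `live` maps patch name -> digest as reported for the running process
    (assumed to be available from the patch tool).
    """
    common = set(live) & set(boot)
    return {
        "live_only": sorted(set(live) - set(boot)),
        "boot_only": sorted(set(boot) - set(live)),
        "mismatch": sorted(n for n in common if live[n] != boot[n]),
    }


def in_sync(report: Dict[str, List[str]]) -> bool:
    """True only when memory and boot image carry the same set of code."""
    return not any(report.values())
```

    A non-empty "live_only" list is the dangerous case: those fixes
disappear on the next reboot.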

Tim

> 
> >    If you don't exercise absolute top-down control you get to a
> >situation where there isn't a correlation between what is on the disk
> >for a reboot and what is in memory being executed.   While a phone
> >company might be able to control their switch with rather infrequent
> >updates, in general usage this can cause real issues.
> >
> > In fact, from my days supporting phone companies, I remember a
> >couple of incidents where switches were bounced because of a
> >major environmental issue and when they came back up they
> >were missing features and patches.  They were in such a sorry
> >state that they had to be reloaded in order to function correctly.
> >  
> >
> Yes, this is probably the biggest problem with runtime patching.  In the 
> systems I used, we generally took images of the system with the patches 
> or we kept applied patches in a directory and applied them all when the 
> system restarted.
> 
> The other big problem is that a very good patching system tends to get 
> "abused" and used to install major updates, major features, etc.  Things 
> it is really not intended for.
> 
> I don't think this is intended for general systems.  I wouldn't want to 
> see it on workstations.  But it is useful in very controlled environments.
> 
> -Corey
> 
> > 
> >    Tim
> >
> >  
> >
> >>-Corey
> >>
> >>Ralf Flaxa wrote:
> >>
> >>    
> >>
> >>>Speaking for SUSE/Novell I can at least say that live patching is evil
> >>>and would never be considered supported. How shall you guarantee support
> >>>or certification with such a mechanism in place?
> >>>
> >>>	Ralf
> >>>
> >>>On Mon, Mar 28, 2005 at 11:14:50AM +0900, Takashi Ikebe wrote:
> >>> 
> >>>
> >>>      
> >>>
> >>>>The following is a use case for live patching.  It addresses
> >>>>AVL10.0 Live patching in CGL Specification 3.0.
> >>>>Please feel free to comment or make suggestions.
> >>>>
> >>>>Takashi.
> >>>>-----------------------------------------------------------------------------------------------
> >>>>Description
> >>>>OSDL CGL specifies that carrier grade Linux shall provide the mechanism
> >>>>for dynamically replacing the symbols of a running process without
> >>>>restarting. Dynamic replacement of symbols allows a process to access
> >>>>patched functions or values without restarting and can improve process
> >>>>availability.
> >>>>
> >>>>Desired Outcome
> >>>>Mainline kernel acceptance or distro acceptance
> >>>>
> >>>>Participants/Roles
> >>>>System administrators set up the requirement at installation. During
> >>>>operation, system administrators apply patches using the requirement.
> >>>>
> >>>>Scenarios
> >>>>During operation, system administrators apply a patch using the
> >>>>requirement in the following scenario:
> >>>>1. System administrators make a patch file from a diff file or the
> >>>>new version's source code.
> >>>>2. System administrators load the patch into the process with the
> >>>>provided live patch tool.
> >>>>3. System administrators activate the patch in the process with the
> >>>>provided live patch tool.
> >>>>4. System administrators confirm whether the patch was applied
> >>>>correctly.
> >>>>
> >>>>Implementation Notes
> >>>>The requirement needs to provide the following functions:
> >>>>- A function that loads the patch file into the target process's
> >>>>memory area.
> >>>>- A function that overwrites the entry point of each function to be
> >>>>fixed in the target process with a branch instruction to the patch.
> >>>>- A function that restores the entry-point code overwritten by the
> >>>>branch.
> >>>>- A function that unloads the patch files.
> >>>>Through the above functions, the requirement realizes on-line
> >>>>patching of the target process.
> >>>>The requirement needs to provide on-line patching even if the process
> >>>>is multi-threaded or the environment is SMP, and the stop time of the
> >>>>target process should not exceed 100 milliseconds.
> >>>>
> >>>>References
> >>>>Pannus project: http://pannus.sourceforge.net/
> >>>>Live patching implementation:
> >>>>http://prdownloads.sourceforge.net/pannus/pannus_en.pdf
> >>>>
> >>>>
> >>>>-- 
> >>>>Takashi Ikebe
> >>>>NTT Network Service Systems Laboratories
> >>>>9-11, Midori-Cho 3-Chome Musashino-Shi,
> >>>>Tokyo 180-8585 Japan
> >>>>Tel : +81 422 59 4246, Fax : +81 422 60 4012
> >>>>e-mail : ikebe.takashi at lab.ntt.co.jp
> >>>>   
> >>>>
> >>>>        
> >>>>
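
For reference, the load / activate / restore / unload sequence in the
implementation notes quoted above can be modeled as a small state
machine. This is a toy Python sketch under stated assumptions: a
dictionary entry stands in for a function's entry point, swapping that
entry stands in for overwriting it with a branch instruction, and the
class and method names are hypothetical. It does not stop threads or
touch process memory, so it says nothing about the SMP or
100-millisecond requirements.

```python
class LivePatcher:
    """Toy model of live patching on a name -> callable dispatch table."""

    def __init__(self, table):
        self.table = table   # stands in for the process's function entry points
        self.loaded = {}     # patch id -> {function name: replacement}
        self.saved = {}      # patch id -> {function name: original}, while active

    def load(self, patch_id, replacements):
        """Bring the patch into 'memory' without changing behavior yet."""
        self.loaded[patch_id] = dict(replacements)

    def activate(self, patch_id):
        """Redirect each patched function to its replacement, saving the original."""
        if patch_id in self.saved:
            raise RuntimeError("patch already active")
        replacements = self.loaded[patch_id]
        self.saved[patch_id] = {name: self.table[name] for name in replacements}
        self.table.update(replacements)

    def restore(self, patch_id):
        """Put back the saved originals, backing the patch out."""
        self.table.update(self.saved.pop(patch_id))

    def unload(self, patch_id):
        """Drop the patch; refuse while it is still active."""
        if patch_id in self.saved:
            raise RuntimeError("restore the patch before unloading it")
        del self.loaded[patch_id]
```

Keeping load separate from activate means the slow step (getting the
patch into memory) can happen while the process keeps running, and only
the entry swap needs the process to be paused.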
-- 
Timothy D. Witham - Chief Technology Officer - wookie at osdl.org
Open Source Development Lab Inc - A non-profit corporation
12725 SW Millikan Way - Suite 400 - Beaverton OR, 97005
(503)-906-1911  (office)    (503)-702-2871   (cell)
(503)-626-2436  (fax)



