[cgl_discussion] Re: [cgl_specs] Use case - Live patching

Mon Mar 28 08:38:33 PST 2005

Timothy D. Witham wrote:

>On Mon, 2005-03-28 at 09:05 -0600, Corey Minyard wrote:
>  
>
>>Why do you think it is evil?  It is standard practice in most large 
>>telecom systems as it improves availability.
>>
>    Maybe it would be better to word it as "it can improve
>availability". 
>  
>
Point taken, but that is true for practically any technique to improve 
availability.

>But it really is a holdover from the single large expensive CPU design
>days.
>  
>
Not exactly.  It is useful for any system where you don't want to have 
to bring it down (or partially down) to fix a bug or you want to be able 
to back out the changes later.  It helps in the following ways:

   1. Reduces time to fix, as applying a patch is generally a lot faster
      than upgrading a system.
   2. Avoids having to go simplex in a 1+1 system.
   3. Allows fixes with undesirable side effects to be easily removed.
      The patch systems I have used allow patches to be removed, and
      fixes with undesirable side effects happened often enough to make
      that very useful.
   4. Allows debugging changes to be installed and removed in the
      field.  I have seen patches used to debug problems in the field.
   5. Avoids the "big bang" effect of installing an update for a fix and
      getting a whole bunch of other changes that may have undesirable
      side effects.
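
To illustrate items 3 and 4, here is a minimal sketch of a reversible
patch, assuming a plain lookup table stands in for the entry points that
a real patch tool would overwrite in process memory.  PatchTable,
checksum and the patch id "fix-001" are hypothetical names made up for
this example; none of them come from an actual patch system.

```python
# Hypothetical model: a table of entry points stands in for the branch
# instructions a real live-patch tool would overwrite in process memory.

class PatchTable:
    def __init__(self, functions):
        self.active = dict(functions)   # name -> function currently called
        self.saved = {}                 # patch id -> {name: replaced function}

    def apply(self, patch_id, replacements):
        """Redirect each named entry point, remembering what it replaced."""
        if patch_id in self.saved:
            raise ValueError(f"patch {patch_id!r} already applied")
        self.saved[patch_id] = {name: self.active[name] for name in replacements}
        self.active.update(replacements)

    def remove(self, patch_id):
        """Back out one patch by restoring the saved entry points."""
        self.active.update(self.saved.pop(patch_id))

    def call(self, name, *args):
        return self.active[name](*args)


def checksum(data):
    return sum(data) % 256          # original behaviour

def checksum_fixed(data):
    return sum(data) % 65536        # the "fix" delivered as a patch

table = PatchTable({"checksum": checksum})
table.apply("fix-001", {"checksum": checksum_fixed})
patched = table.call("checksum", [200, 100])     # uses the patched function
table.remove("fix-001")
restored = table.call("checksum", [200, 100])    # back to the original
```

This toy version only restores correctly if patches that touch the same
function are removed in the reverse order they were applied.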

>    If you don't exercise absolute top-down control you get to a
>situation where there isn't a correlation between what is on the disk
>for a reboot and what is in memory being executed.  While a phone
>company might be able to control their switch with rather infrequent
>updates, in general usage this can cause real issues.
>
> In fact, from my days supporting phone companies I remember a
>couple of issues where switches were bounced because of a
>major environmental issue and when they came back up they
>were missing features and patches.  They were in such a sorry
>state that they had to be reloaded in order to function correctly.
>  
>
Yes, this is probably the biggest problem with runtime patching.  In the 
systems I used, we generally took images of the system with the patches 
or we kept applied patches in a directory and applied them all when the 
system restarted.
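
A minimal sketch of the second approach, assuming patches are plain
files in one directory and that filename order is the order in which
they should be applied.  reapply_patches and the apply_patch callback
are hypothetical names; the callback stands in for whatever the real
patch tool provides.

```python
import os
import tempfile

def reapply_patches(patch_dir, apply_patch):
    """Apply every patch file in patch_dir, in sorted filename order."""
    applied = []
    for name in sorted(os.listdir(patch_dir)):
        path = os.path.join(patch_dir, name)
        if os.path.isfile(path):
            apply_patch(path)      # hypothetical hook into the patch tool
            applied.append(name)
    return applied

# Demonstration with a throwaway directory and a stub patch tool.
with tempfile.TemporaryDirectory() as patch_dir:
    for name in ("0002-timer-fix.patch", "0001-route-fix.patch"):
        with open(os.path.join(patch_dir, name), "w") as f:
            f.write("placeholder\n")
    log = []
    order = reapply_patches(patch_dir, log.append)
```

Numbering the files keeps the restart order the same as the order in
which the patches were first applied, which matters when one patch
depends on another.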

The other big problem is that a very good patching system tends to get 
"abused" and used to install major updates, major features, etc.  Things 
it is really not intended for.

I don't think this is intended for general systems.  I wouldn't want to 
see it on workstations.  But it is useful in very controlled environments.

-Corey

> 
>    Tim
>
>  
>
>>-Corey
>>
>>Ralf Flaxa wrote:
>>
>>    
>>
>>>Speaking for SUSE/Novell I can at least say that live patching is evil
>>>and would never be considered supported. How shall you guarantee support
>>>or certification with such a mechanism in place?
>>>
>>>	Ralf
>>>
>>>On Mon, Mar 28, 2005 at 11:14:50AM +0900, Takashi Ikebe wrote:
>>> 
>>>>The following is a use case for live patching.  It addresses
>>>>AVL10.0 Live patching in CGL Specification 3.0.
>>>>Please feel free to send comments or suggestions.
>>>>
>>>>Takashi.
>>>>-----------------------------------------------------------------------------------------------
>>>>Description
>>>>OSDL CGL specifies that carrier grade Linux shall provide a mechanism
>>>>for dynamically replacing the symbols of a running process without
>>>>restarting it. Dynamic replacement of symbols allows a process to
>>>>access patched functions or values without restarting and can improve
>>>>process availability.
>>>>
>>>>Desired Outcome
>>>>Mainline kernel acceptance or distro acceptance
>>>>
>>>>Participants/Roles
>>>>System administrators set up the requirement at installation. During
>>>>operation, system administrators apply patches using the requirement.
>>>>
>>>>Scenarios
>>>>During operation, system administrators apply a patch using the
>>>>requirement as in the following scenario:
>>>>1. System administrators make a patch file from a diff file or from
>>>>the new version's source code.
>>>>2. System administrators load the patch into the process with the
>>>>provided live patch tool.
>>>>3. System administrators activate the patch in the process with the
>>>>provided live patch tool.
>>>>4. System administrators confirm whether the patch was applied
>>>>correctly.
>>>>
>>>>Implementation Notes
>>>>The requirement needs to provide the following functions:
>>>>- A function that loads the patch file into the target process's
>>>>memory area.
>>>>- A function that overwrites the entry point of each function to be
>>>>fixed in the target process with a branch operation code to the
>>>>patch.
>>>>- A function that restores the overwritten code at those entry points.
>>>>- A function that unloads the patch files.
>>>>Through the above functions, the requirement realizes on-line
>>>>patching of the target process.
>>>>The requirement needs to provide on-line patching even if the process
>>>>is multi-threaded or the environment is SMP, and the stop time of the
>>>>target process should not exceed 100 milliseconds.
>>>>
>>>>References
>>>>Pannus project: http://pannus.sourceforge.net/
>>>>Live patching implementation:
>>>>http://prdownloads.sourceforge.net/pannus/pannus_en.pdf
>>>>
>>>>
>>>>-- 
>>>>Takashi Ikebe
>>>>NTT Network Service Systems Laboratories
>>>>9-11, Midori-Cho 3-Chome Musashino-Shi,
>>>>Tokyo 180-8585 Japan
>>>>Tel : +81 422 59 4246, Fax : +81 422 60 4012
>>>>e-mail : ikebe.takashi at lab.ntt.co.jp