[cgl_discussion] [Fwd: [Dcl_tech_board] Linux Kernel Crash Dump (LKCD) evaluation]

Chen, Terence terence.chen at intel.com
Mon Apr 14 08:28:23 PDT 2003


As for netdump over LKCD: Mohamed Abbas implemented it as part of
CGL 1.1. He integrated it back into the LKCD project, and it has been
tested with the CGL test suite as part of the MV CGLE release.

John, you might want to talk to Steve and see whether that work is done
or can be reused.

-Terence

> -----Original Message-----
> From: John Cherry [mailto:cherry at osdl.org]
> Sent: Friday, April 11, 2003 5:25 PM
> To: cgl_discussion at osdl.org
> Subject: [cgl_discussion] [Fwd: [Dcl_tech_board] Linux Kernel Crash Dump (LKCD) evaluation]
> 
> 
> Steve Hemminger (of DCL fame) wrote up a nice synopsis of the Linux
> Kernel Crash Dump (LKCD) project and where it is going.  There are a
> number of reasons that LKCD is not gaining mainline acceptance
> (dependency on kexec, arch specific, too invasive, bloat, etc.).  Linus
> has stated that he would consider a network-only crash dump, but that is
> not where LKCD is heading.
> 
> Steve is proposing a mini crash dump project that has a chance of
> mainline acceptance.  This would be beneficial to DCL, CGL, and the
> community at large.
> 
> Please give Steve your feedback on this proposal.  I know there has been
> some work going on with network dumps, so if you feel that an existing
> project might be a good baseline for the mini crash dump project, please
> sync up with Steve.
> 
> Thanks,
> John
> 
> -----Forwarded Message-----
> 
> > From: Stephen Hemminger <shemminger at osdl.org>
> > To: dcl_tech_board at osdl.org, dcl_steering at osdl.org
> > Subject: [Dcl_tech_board] Linux Kernel Crash Dump (LKCD) evaluation
> > Date: 11 Apr 2003 16:34:52 -0700
> > 
> > 	Linux Kernel Crash Dump Evaluation
> > 
> > Crash dump is an important diagnostic tool for production systems. Commercial
> > customers rely on binary distributions from vendors; these vendors need tools
> > like crash dumps to provide timely support.
> > 
> > A version of crash dump was ported by SGI and it became the Linux Kernel
> > Crash Dump (LKCD) project.  This code has failed to gain community acceptance.
> > 
> > 
> > Kernel acceptance
> > 
> > The full text of the discussion from Oct '02 is in the Addendum.  The last
> > attempt to submit to the kernel failed, and Linus expressed the opinion that
> > LKCD is a "vendor-driven" thing [Linus1]. He seemed willing to accept network
> > dump since net drivers fail less [Linus2]. Red Hat supplies network
> > dump, but has requirements for disk as well [Dave1].
> > 
> > OSDL can act to help facilitate a useful crash dump solution, which
> > aligns with the "vendor-driven" perspective.  The problem is that the
> > existing LKCD project may not be the right mechanism.
> > 
> > LKCD team
> > 
> > The LKCD project is active with development for OSDL DCL member
> > companies. Individuals from IBM, Intel, and OSDL regularly contribute
> > to the current CVS tree. The distro vendors (the real customers) don't
> > seem to be involved.
> > 
> > Issues with LKCD
> > 
> > * Not heading towards closure
> > 
> > LKCD has grown in size since Oct and become more complex.  Support
> > for kexec (save to memory) has been added as well as other changes.
> > It seems like the project has given up on getting it into 2.6.
> > 
> > Current additions to LKCD are all sound, but are not heading the
> > project towards integration in the standard kernel.  The most recent
> > is saving the crash dump to reserved memory and saving it on reboot.
> > This makes sense on some machines with lots of memory but isn't something
> > that will end up getting used in average binary distribution vendor
> > support.
> > 
> > The de-facto strategy is to try and target a smaller solution based on
> > the memory dump.  This is not a bad concept, but means that acceptance
> > is now dependent on Linus (and distros) accepting the kexec patch.
> > 
> > Response from the mailing list to the suggestion of a pared-down crash
> > dump was positive from the active developers, but no action has taken
> > place in that regard.
> > 
> > * Integration touches too many places
> > 
> > Far too many files in the main kernel need to be patched.  None of
> > these are big patches, but they hit "sensitive places":
> >  1. Scheduler needs change to main loop to allow other CPUs to dump.
> >  2. VM needs additional flag to keep track of kernel memory usage.
> >  3. Makefile changes to get type information.
> >  4. SMP IPI additions to capture other processor state.
> >  5. Extensions for reserving memory at boot.
> > 
> > * Current disk dump is unreliable
> >  
> > To dump to disk, LKCD goes through the normal block device layer,
> > which means it must re-enable interrupts and reschedule. Also, since
> > disk drivers are often a source of failures, it risks double faulting
> > by using the same code path.
> > 
> > * Non IA32 platform support missing on 2.5
> > 
> > Since it is a side project, no one has updated LKCD to work on
> > non-i386 kernels.  Also, since so many places get touched in the main
> > kernel, it is a non-trivial port.
> > 
> > * LKCD interface
> > 
> > Interface is through /dev/dump. Linus doesn't like pseudo-devices and
> > prefers /proc and eventually sysfs for such things. There is a /proc
> > interface to LKCD but the utilities use ioctls on /dev/dump.
> > 
> > * Bloat
> > 
> > LKCD supports a plethora of options about dump devices, compression types,
> > how much memory to dump, ... This leads to LKCD being tagged as bloat.
> > If LKCD is to work on customer installed systems, it has to have a simple setup.
> > 
> > Suggested alternative
> > 
> > Start a project to create a mini-crash dump that has a chance of
> > acceptance.
> > 
> >  * Address the basic requirements of a binary enterprise
> > 	system distribution.
> > 
> >  * Use existing code if possible
> > 	- Network only crash dump 
> > 	- Rusty's IDE mini-oopser
> > 
> >  * Use existing dump format to save rewriting analysis tools 
> > 
> > 
> > Addendum
> > ============================================================
> > 
> > [Linus1]
> > 
> > From: Linus Torvalds <torvalds at transmeta.com>
> > To: "Matt D. Robinson" <yakker at aparity.com>
> > cc: Rusty Russell <rusty at rustcorp.com.au>, <linux-kernel at vger.kernel.org>,
> >     <lkcd-general at lists.sourceforge.net>, <lkcd-devel at lists.sourceforge.net>
> > Subject: [lkcd-general] Re: What's left over.
> > Date: Thu, 31 Oct 2002 07:46:08 -0800 (PST)
> > Sender: lkcd-general-admin at lists.sourceforge.net
> > 
> > 
> > On Wed, 30 Oct 2002, Matt D. Robinson wrote:
> > 
> > > Linus Torvalds wrote:
> > > > > Crash Dumping (LKCD)
> > > > 
> > > > > This is definitely a vendor-driven thing. I don't believe it has any
> > > > > relevance unless vendors actively support it.
> > > > 
> > > > There are people within IBM in Germany, India and England, as well as
> > > > a number of companies (Intel, NEC, Hitachi, Fujitsu), as well as SGI
> > > > that are PAID to support this.
> > 
> > > That's fine. And since they are paid to support it, they can apply the
> > > patches.
> > > 
> > > What I'm saying by "vendor driven" is that it has no relevance for the
> > > standard kernel, and since it has no relevance to that, then I have no
> > > incentives to merge it. The crash dump is only useful with people who
> > > actively look at the dumps, and I don't know _anybody_ outside of the
> > > specialized vendors you mention who actually do that.
> > > 
> > > I will merge it when there are real users who want it - usually as a
> > > result of having gotten used to it through a vendor who supports it. (And
> > > by "support" I do not mean "maintain the patches", but "actively uses it"
> > > to work out the users problems or whatever).
> > 
> > Horse before the cart and all that thing.
> > 
> > > People have to realize that my kernel is not for random new features. The
> > > stuff I consider important are things that people use on their own, or
> > > stuff that is the base for other work. Quite often I want vendors to merge
> > > patches _they_ care about long long before I will merge them (examples of
> > > this are quite common, things like reiserfs and ext3 etc).
> > > 
> > > THAT is what I mean by vendor-driven. If vendors decide they really want
> > > the patches, and I actually start seeing noises on linux-kernel or getting
> > > requests for it being merged from _users_ rather than developers, then
> > > that means that the vendor is on to something.
> > 
> > 		Linus
> > -----------------------------------
> > [Linus2]
> > 
> > From: Linus Torvalds <torvalds at transmeta.com>
> > To: "Matt D. Robinson" <yakker at aparity.com>
> > > cc: Rusty Russell <rusty at rustcorp.com.au>, <linux-kernel at vger.kernel.org>,
> > >     <lkcd-general at lists.sourceforge.net>, <lkcd-devel at lists.sourceforge.net>
> > Subject: [lkcd-general] Re: What's left over.
> > Date: Thu, 31 Oct 2002 09:25:21 -0800 (PST)
> > Sender: lkcd-general-admin at lists.sourceforge.net
> > 
> > 
> > > [ Ok, this is a really serious email. If you don't get it, don't bother
> > >   emailing me. Instead, think about it for an hour, and if you still don't
> > >   get it, ask somebody you know to explain it to you. ]
> > 
> > On Thu, 31 Oct 2002, Matt D. Robinson wrote:
> > > 
> > > Sure, but why should they have to?  What technical reason is there
> > > for not including it, Linus?
> > 
> > There are many:
> > 
> >  - bloat kills:
> > 
> > 	My job is saying "NO!"
> > 
> > 	In other words: the question is never EVER "Why shouldn't it be
> > 	accepted?", but it is always "Why do we really not want to live 
> > 	without this?"
> > 
> >  - included features kill off (potentially better) projects.
> > 
> > 	There's a big "inertia" to features. It's often better to keep 
> > 	features _off_ the standard kernel if they may end up being
> > 	further developed in totally new directions.
> > 
> > 	In particular when it comes to this project, I'm told about
> > 	"netdump", which doesn't try to dump to a disk, but 
> over the net.
> > 	And quite frankly, my immediate reaction is to say "Hell, I
> > 	_never_ want the dump touching my disk, but over the network
> > 	sounds like a great idea".
> > 
> > > To me this says "LKCD is stupid". Which means that I'm not going to apply
> > > it, and I'm going to need some real reason to do so - ie being proven
> > > wrong in the field.
> > > 
> > > (And don't get me wrong - I don't mind getting proven wrong. I change my
> > > opinions the way some people change underwear. And I think that's ok).
> > 
> > > I completely don't understand your reasoning here.
> > 
> > Tough. That's YOUR problem.
> > 
> > 		Linus
> > -----------------------------------
> > [Dave1]
> > From: Dave Anderson <anderson at redhat.com>
> > To: Linus Torvalds <torvalds at transmeta.com>
> > > CC: "Matt D. Robinson" <yakker at aparity.com>, Rusty Russell <rusty at rustcorp.com.au>,
> > >     linux-kernel at vger.kernel.org, lkcd-general at lists.sourceforge.net,
> > >     lkcd-devel at lists.sourceforge.net
> > Subject: [lkcd-general] Re: What's left over.
> > Date: Thu, 31 Oct 2002 15:59:34 -0500
> > Sender: lkcd-general-admin at lists.sourceforge.net
> > > X-Mailer: Mozilla 4.78 [en] (X11; U; Linux 2.4.9-e.3.genterprise i686)
> > 
> > 
> > On Thu, 31 Oct 2002, Linus Torvalds wrote:
> > 
> > >  - included features kill off (potentially better) projects.
> > >
> > >         There's a big "inertia" to features. It's often 
> better to keep
> > >         features _off_ the standard kernel if they may 
> end up being
> > >         further developed in totally new directions.
> > >
> > >         In particular when it comes to this project, I'm 
> told about
> > >         "netdump", which doesn't try to dump to a disk, 
> but over the net.
> > >         And quite frankly, my immediate reaction is to 
> say "Hell, I
> > >         _never_ want the dump touching my disk, but over 
> the network
> > >         sounds like a great idea".
> > >
> > > To me this says "LKCD is stupid". Which means that I'm 
> not going to apply
> > > it, and I'm going to need some real reason to do so - ie 
> being proven
> > > wrong in the field.
> > >
> > > (And don't get me wrong - I don't mind getting proven 
> wrong. I change my
> > > opinions the way some people change underwear. And I 
> think that's ok).
> > 
> > > It would be most unfortunate if the existence of netdump is used as a
> > > reason to deny LKCD's inclusion, or to simply dismiss LKCD as stupid.
> > 
> > On Thu, 31 Oct 2002, Matt D. Robinson wrote:
> > 
> > > We want to see this in the kernel, frankly, because it's a pain
> > > in the butt keeping up with your kernel revisions and everything
> > > > else that goes in that changes.  And I'm sure SuSE, UnitedLinux and
> > > (hopefully) Red Hat don't want to spend their time having to roll
> > > this stuff in each and every time you roll a new kernel.
> > 
> > While Red Hat advocates Ingo's netdump option, we have customer
> > > requests that are requiring us to look at LKCD disk-based dumps as an
> > > alternative, co-existing dump mechanism.  Since the two methods are not mutually
> > > exclusive, LKCD will never kill off netdump -- nor certainly vice-versa.  We're
> > all just looking for a better means to be able to
> > provide support to our customers, not to mention its value as a
> > development aid.
> > 
> > Dave Anderson
> > Red Hat, Inc.
> > 
> > 
> > 
> > _______________________________________________
> > Dcl_tech_board mailing list
> > Dcl_tech_board at lists.osdl.org
> > http://lists.osdl.org/mailman/listinfo/dcl_tech_board
> 
> _______________________________________________
> cgl_discussion mailing list
> cgl_discussion at lists.osdl.org
> http://lists.osdl.org/mailman/listinfo/cgl_discussion
> 
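
A minimal sketch of the network-only dump idea listed under "Use existing
code" in Steve's proposal, in Python for readability. It is not the netdump
or LKCD wire format: the header layout, the 1024-byte chunk size, and the
names make_packets, reassemble, and send_dump are all assumptions made up
for illustration. The point it shows is that each datagram carries its own
offset, so the collecting host can rebuild the image from chunks that
arrive out of order and skip damaged ones.

```python
import socket
import struct
import zlib

# Hypothetical header (NOT the netdump or LKCD format), network byte order:
# sequence number, byte offset into the image, payload length, payload CRC32.
HEADER = struct.Struct("!IQHI")
CHUNK_SIZE = 1024  # assumed size; keeps a datagram under a 1500-byte MTU


def make_packets(image: bytes, chunk_size: int = CHUNK_SIZE):
    """Yield one self-describing datagram per chunk of the memory image."""
    for seq, offset in enumerate(range(0, len(image), chunk_size)):
        payload = image[offset:offset + chunk_size]
        header = HEADER.pack(seq, offset, len(payload), zlib.crc32(payload))
        yield header + payload


def reassemble(packets, total_size: int) -> bytes:
    """Rebuild the image on the collecting host; order does not matter."""
    image = bytearray(total_size)
    for packet in packets:
        _seq, offset, length, crc = HEADER.unpack_from(packet)
        payload = packet[HEADER.size:HEADER.size + length]
        if len(payload) != length or zlib.crc32(payload) != crc:
            continue  # skip damaged chunks; that region stays zero-filled
        image[offset:offset + length] = payload
    return bytes(image)


def send_dump(image: bytes, host: str, port: int) -> None:
    """Push every chunk to the collecting host over UDP."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for packet in make_packets(image):
            sock.sendto(packet, (host, port))
```

This only models the framing. How a crashed kernel would actually drive the
network card, and how lost chunks would be re-requested, are outside the
sketch.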


