[cgl_discussion] FW: PoC project of Software ECC [cgl_specs] Two questions about SWECC (fwd)

Eric.Chacron at alcatel.fr Eric.Chacron at alcatel.fr
Wed Jun 18 02:18:07 PDT 2003


>The intent is the latter, in that the OS needs to be able to know ECC
>errors have occurred.  This is to allow handlers to be written to react
>to it.  We don't specify the actions to be taken, merely the capability
>to get the errors.


Who's going to write the handlers?
I mean, who's going to write an NMI handler that will perform a panic
whenever a multiple-bit error is detected and is uncorrectable?
I think this is still not covered today and must be added.
So we need:
1) to detect and log, or notify, ECC errors (in general, corrected ones);
2) to react when an uncorrectable error is detected.
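A minimal sketch of the two-part policy above, written in Python for readability rather than as kernel code: corrected errors are logged (point 1), uncorrectable ones trigger a panic (point 2). The `EccEvent` record, the `Severity` values and `KernelPanic` (a stand-in for the kernel's panic()) are hypothetical names chosen for this illustration, not an existing API; in a real kernel this decision would sit in the NMI or machine-check path.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    CORRECTED = "corrected"          # e.g. single-bit error, fixed by hardware
    UNCORRECTABLE = "uncorrectable"  # e.g. multi-bit error, data is lost


@dataclass
class EccEvent:
    # Hypothetical event record: where the error hit and how bad it was.
    address: int
    severity: Severity


class KernelPanic(Exception):
    """Hypothetical stand-in for the kernel's panic()."""


def handle_ecc_event(event: EccEvent, log: list) -> str:
    """Point 1: log corrected errors. Point 2: panic on uncorrectable ones."""
    if event.severity is Severity.CORRECTED:
        log.append(f"ECC: corrected error at {event.address:#x}")
        return "logged"
    # Continuing after an uncorrectable error risks consuming corrupted data.
    raise KernelPanic(f"ECC: uncorrectable error at {event.address:#x}")
```

The `raise` could be replaced by whatever action a given site chooses, since the spec deliberately leaves the action unspecified and only requires that the errors be reported.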


Eric






"Fei, Fei" <fei.fei at intel.com>@lists.osdl.org on 06/18/2003 05:12:22 AM

Please respond to "Fei, Fei" <fei.fei at intel.com>

Sent by:    cgl_discussion-admin at lists.osdl.org


To:    Lynch Rusty <rusty.lynch at intel.com>, Julie N Fleischer
       <julie.n.fleischer at intel.com>
cc:    cgl_discussion <cgl_discussion at osdl.org>
Subject:    [cgl_discussion] FW: PoC project of Software ECC [cgl_specs]
       Two questions about SWECC (fwd)


Rusty/Julie,

Judging from the forwarded mail, the PoC project should not be RSCode.
Based on my evaluation of this item, the Linux ECC project is a more
reasonable choice.

Project URL: http://www.anime.net/~goemon/linux-ecc/
The mailing list is hosted on Yahoo! Groups:
http://groups.yahoo.com/group/ecc/



---------------------------------
I speak for myself, not for Intel
---------------------------------

-- Fei, Fei --
Intel China Software Lab

---------- Forwarded message ----------
Date: Sat, 14 Jun 2003 03:57:23 +0800
From: Peter Badovinatz <tabmowzo at us.ibm.com>
To: "Fei, Fei" <fei.fei at intel.com>
Cc: cgl_specs at osdl.org
Subject: Re: [cgl_specs] Two questions about SWECC


"Fei, Fei" wrote:
>
> Hi,
>
> I am reading SWECC (item 3.ecc) in the spec and I have some questions:
>
> 1) What is the purpose of this item? I am not clear on the scenario
> for this SWECC feature. From the spec, I guessed the following options:
>         a) It is a memory check tool/daemon that can check for errors in
> memory which has no ECC feature.
>         b) It is a library. Applications, such as real-time multimedia
> streaming ones, can call it if they want to.
>         c) It is an ECC error handling facility. Memory ECC errors are
> reported to the event log system so that SWECC can detect them and take
> action.
> So which one is right? Or is it a different story altogether?

The intent is the latter, in that the OS needs to be able to know ECC
errors have occurred.  This is to allow handlers to be written to react
to it.  We don't specify the actions to be taken, merely the capability
to get the errors.
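To illustrate why the spec treats the single-bit case (correctable) differently from the multi-bit case (only detectable), here is a toy single-error-correct, double-error-detect (SECDED) extended Hamming code over 4 data bits. This is an illustration under stated assumptions: real ECC memory does this in hardware over wider words (typically 64 data bits with 8 check bits), and `encode`/`decode` are hypothetical names, not part of any ECC driver.

```python
# Codeword layout (hypothetical toy): index 0 is an overall parity bit,
# indices 1..7 are a Hamming(7,4) code with parity bits at 1, 2 and 4.
DATA_POSITIONS = (3, 5, 6, 7)


def encode(nibble: int) -> list:
    """Encode 4 data bits into an 8-bit SECDED codeword (list of bits)."""
    code = [0] * 8
    for i, pos in enumerate(DATA_POSITIONS):
        code[pos] = (nibble >> i) & 1
    for p in (1, 2, 4):
        # Parity bit p covers every position whose index has bit p set.
        code[p] = sum(code[pos] for pos in range(1, 8) if pos & p) % 2
    code[0] = sum(code[1:]) % 2  # overall parity over positions 1..7
    return code


def decode(code: list):
    """Return (nibble, status); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos  # XOR of the indices of all set bits
    overall = sum(code) % 2
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd number of flipped bits: treat as a single-bit error and fix it.
        # A zero syndrome means the overall parity bit itself was flipped.
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Nonzero syndrome with even overall parity: two bits flipped.
        return None, "uncorrectable"
    nibble = 0
    for i, pos in enumerate(DATA_POSITIONS):
        nibble |= code[pos] << i
    return nibble, status
```

Flipping any one bit of a codeword yields `"corrected"` with the original data; flipping any two yields `"uncorrectable"`, which is the case Eric's handler would have to act on.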
>
> 2) Is the Reed-Solomon ECC algorithm the only algorithm, or just a
> preferred choice? I mean, if several algorithms are to be included,
> an ECC framework would be needed.

The discussion of the Reed-Solomon algorithm is being removed; it
will not appear in the next draft of the document.  We express no
specific choice.
>
> Fei, Fei
> I speak for myself, not for Intel.
>
> Here is SWECC spec part:
> --------------------------------
> 3.ecc Software ECC Support
> Production Availability
> Support error correction codes to allow hardware error correction for
> detecting and/or recovering from memory errors. Detect and report a
> single-bit ECC error from the memory subsystem. Detect in the operating
> system whether a multi-bit ECC error occurred in the memory subsystem.
>
> One example of such algorithms is Reed-Solomon codes, which provide
> convenient 'byte-sized' block coding, well suited to adding protection
> to data stored as eight-bit bytes (i.e., most common computer data).
>
> The Reed-Solomon code is the same one used for encoding data on audio
> CDs and CD-ROM discs, as well as in many magnetic and optical disk
> controllers. You basically want to use Reed-Solomon coding in any
> situation where "forward error correction" is needed, i.e., where the
> decoder will not have the option of requesting retransmission of bad
> blocks.
>
> There are many ways you might use error correction coding, such as a
> high-reliability layer on top of a real-time streaming audio protocol
> which is implemented atop an unreliable protocol such as UDP.
>
> 3.ecc POC Referrals:
>
> Reference projects:
> A Reed-Solomon error-correcting encoder/decoder library:
> http://rscode.sourceforge.net/.
> The Error Correcting Codes (ECC) Site: http://www.eccpage.com/.
>
> From Randy and Eric:
> Regarding ECC support, the draft refers to the ECC Forum. I googled
> "+ecc +forum" and didn't find anything related to Linux.
> However, searching for "+ecc +driver" finds these:
> Linux ECC: http://www.anime.net/~goemon/linux-ecc/
> http://h18000.www1.hp.com/products/servers/linux/linuxhealth.html
> I'm guessing that the first one is the "known" one for Linux. However,
> I'm still looking...  ecc at yahoogroups.com is the mailing list address.
> The list is still running, but the main page is not accessible (I have
> tried to contact the manager without success).
>
> Thanks,
> -Fei
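The Reed-Solomon passage quoted from the draft spec can be made concrete with a short sketch: arithmetic in GF(2^8), so that each code symbol is one byte, plus systematic encoding that appends `nsym` parity bytes to a message. Assumptions: the field polynomial 0x11D and the generator roots alpha^0..alpha^(nsym-1) are one common convention, not something the spec prescribes, and the function names are illustrative. Only error detection via syndromes is shown; a full decoder (Berlekamp-Massey plus Chien search and Forney's algorithm) would not fit a short example.

```python
PRIM = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1, a common primitive polynomial

# Exponent/log tables for GF(2^8) with generator alpha = 2.
EXP = [0] * 512
LOG = [0] * 256
_value = 1
for _i in range(255):
    EXP[_i] = _value
    LOG[_value] = _i
    _value <<= 1
    if _value & 0x100:
        _value ^= PRIM
for _i in range(255, 512):
    EXP[_i] = EXP[_i - 255]  # duplicate so gf_mul never needs a modulo


def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes as elements of GF(2^8)."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def poly_mul(p: list, q: list) -> list:
    """Multiply polynomials (coefficient lists, highest degree first)."""
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] ^= gf_mul(a, b)  # addition in GF(2^8) is XOR
    return r


def generator(nsym: int) -> list:
    """g(x) = (x - alpha^0)(x - alpha^1)...(x - alpha^(nsym-1))."""
    g = [1]
    for i in range(nsym):
        g = poly_mul(g, [1, EXP[i]])  # minus equals plus in GF(2^8)
    return g


def rs_encode(msg: list, nsym: int) -> list:
    """Systematic encoding: return msg followed by nsym parity bytes."""
    gen = generator(nsym)
    buf = list(msg) + [0] * nsym
    # Long division of msg * x^nsym by the monic generator polynomial.
    for i in range(len(msg)):
        coef = buf[i]
        if coef != 0:
            for j in range(1, len(gen)):
                buf[i + j] ^= gf_mul(gen[j], coef)
    return list(msg) + buf[len(msg):]  # the remainder is the parity


def syndromes(codeword: list, nsym: int) -> list:
    """Evaluate the codeword at alpha^0..alpha^(nsym-1); all zero = no error seen."""
    out = []
    for i in range(nsym):
        s = 0
        for c in codeword:
            s = gf_mul(s, EXP[i]) ^ c  # Horner's rule
        out.append(s)
    return out
```

With `nsym` parity bytes, a complete decoder could correct up to `nsym // 2` corrupted bytes; this sketch stops at detecting that the syndromes are no longer all zero.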

Peter
--
Peter R. Badovinatz aka 'Wombat' -- IBM Linux Technology Center
preferred: tabmowzo at us.ibm.com / alternate: wombat at us.ibm.com
These are my opinions and absolutely not official opinions of IBM, Corp.



_______________________________________________
cgl_discussion mailing list
cgl_discussion at lists.osdl.org
 http://lists.osdl.org/mailman/listinfo/cgl_discussion







