[cgl_discussion] [Fwd: Summary of the Multi-Path BOF at OLS and future directions]

Steven Dake sdake at mvista.com
Tue Aug 5 10:23:17 PDT 2003

For those of you not monitoring the linux-scsi mailing list, here is a
summary posted there of the multi-path BOF at OLS.

-----Forwarded Message-----
> From: James Bottomley <James.Bottomley at steeleye.com>
> To: SCSI Mailing List <linux-scsi at vger.kernel.org>
> Subject: Summary of the Multi-Path BOF at OLS and future directions
> Date: Mon, 04 Aug 2003 20:54:55 -0700
>
> Hi All,
> For those of you who couldn't attend OLS, I thought a short summary of
> what went on might be useful.
>
> Multi-path was a hot topic throughout both the Kernel Summit and OLS.
> Things began with a requirements-input panel of vendors identifying
> multi-path as one of their primary problems, followed by an invited
> discussion with Lars Marowsky-Brée and Mike Anderson on multi-path. At
> OLS, there was a paper presentation by Mike and Patrick Mansfield on the
> IBM SCSI layer multi-pathing solution, and finally there was the BOF
> session, which tried to pick a way forward for us in 2.6/2.7.
>
> What I'd like to summarise is what I think are the conclusions we
> reached:
> 1. Multi-path is relevant to more layers of the I/O stack than just
> SCSI. Thus, it makes sense to do it at the layer just above bio.  This
> would either be md/multipath or the Device Mapper multi-path module.
> 2. Doing multi-path at that level is not easy without fast failure
> indications.
> 2a. On discussion of this, it was decided that on each bio/request, the
> upper layers would like to indicate which failures they wish to be fast
> and which they wish not to know about.  The two principal ones were
> transport errors (relevant to multi-path) and medium errors (relevant to
> software raid).
> 2b. Upwards, on fast failure, we would send back the raw sense data
> (probably encoded in the sense request) plus a translated indication of
> what the problem was.  The translations would probably be a combination
> of (fatal|retryable) and (driver error (card out of
> resources/failure)|transport error|medium error).
> 3.  It was noted that symmetric active multi-path in this scheme is not
> possible without the ability to place a proper elevator above the
> multi-pathing driver (and have a simple, queue-only noop elevator
> below).  This should help alleviate the current fragmentation issues
> where symmetric active multi-path produces I/O in decidedly non-optimal
> page sized chunks.
> 4. Configuration of this solution would be extremely important.  The
> idea here is to rely on the udev solution currently making its way into
> the kernel and essentially have a vendor specific multi-path
> configuration as a udev plug-in.
> 5. Vendor value add for specific devices could be encoded both as
> configuration (udev) pieces and plug-ins to the upper layer multi-path
> driver to activate any proprietary vendor specific configuration options
> that may be needed for specific solutions.
> 6. Ownership.  This wasn't exactly discussed, but in light of the
> problems with even SCSI-3 reservations, it is becoming clear that
> storage ownership in a multi-path configuration is getting impossible to
> maintain from user level.  Therefore, I at least will be giving thought
> to an ownership API that could be used to manage storage ownership from
> the kernel in the face of path fail overs.
>
> As far as the beginnings of implementation go, we already have
> md/multi-path.  Joe Thornber of Sistina will shortly be releasing the
> code to do multi-path over the device mapper interface, and our trusty
> block layer maintainer, Jens Axboe, has done the skeleton of a fast fail
> infrastructure for us (in 2.6.0-test2).  The attached patch should add
> the fast fail capability to SCSI (although without the upwards/downwards
> failure indications) and we should be able to build the rest of the
> infrastructure on this framework.
>
> As far as errors and omissions go, I found KS/OLS to go rather fast and
> be a bit blurry, so hopefully those who were also present can chime in
> on this thread to amplify/correct the points I actually managed to grasp
> and summarise the ones I missed.
>
> Thanks,
> James
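
As an illustration of points 2a and 2b above, here is a minimal sketch in
C of how a per-request fast-fail mask and a translated failure indication
might be encoded. Every name in it (ff_class, ff_severity, ff_request,
ff_result, ff_should_fail_fast, mp_decide) is hypothetical; none of this
comes from the 2.6.0-test2 block layer or from the patch mentioned in the
summary, and the policy in mp_decide is only one plausible reading of
"transport errors are relevant to multi-path".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical failure classes from point 2b, usable as mask bits. */
enum ff_class {
    FF_DRIVER    = 1u << 0,  /* card out of resources, or card failure */
    FF_TRANSPORT = 1u << 1,  /* path problem: relevant to multi-path */
    FF_MEDIUM    = 1u << 2   /* media problem: relevant to software RAID */
};

/* Hypothetical severity half of the translated indication. */
enum ff_severity { FF_RETRYABLE = 0, FF_FATAL = 1 };

/* Translated indication sent upwards next to the raw sense data. */
struct ff_result {
    enum ff_class cls;
    enum ff_severity severity;
};

/* Point 2a: the upper layer marks, per request, which failure
 * classes it wants reported quickly instead of retried below. */
struct ff_request {
    uint32_t fast_fail_mask;
};

/* True if the lower layer should report this class of failure
 * immediately rather than running its own retries. */
static bool ff_should_fail_fast(const struct ff_request *rq,
                                enum ff_class cls)
{
    return (rq->fast_fail_mask & (uint32_t)cls) != 0;
}

/* Assumed policy for a multi-path driver receiving an indication. */
enum mp_action { MP_RETRY_SAME_PATH, MP_SWITCH_PATH, MP_PASS_UP };

static enum mp_action mp_decide(const struct ff_result *res)
{
    if (res->cls == FF_TRANSPORT)
        return MP_SWITCH_PATH;       /* try another path */
    if (res->severity == FF_RETRYABLE)
        return MP_RETRY_SAME_PATH;   /* e.g. card briefly out of resources */
    return MP_PASS_UP;               /* fatal driver or medium error */
}
```

Under these assumptions a multi-path driver would set only FF_TRANSPORT in
fast_fail_mask, while software RAID would set FF_MEDIUM, so each layer sees
quickly only the failures it can act on.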
