[cgl_discussion] security review of Clustering spec
joseph.cihula at intel.com
Fri Oct 8 10:48:21 PDT 2004
> I've looked at the Clustering spec from a security point of view and
> have the following comments/questions (some of which are not security
> related, however):
> Sec 5.1 Service Availability Forum (SA Forum) APIs:
> The SAF APIs have no security. That is, any application on any system
> with an AIS service provider and connectivity to a node in the cluster
> can use the APIs to manipulate the cluster and the cluster-aware
> applications running in it. At a finer granularity, any cluster-aware
> application that is compromised (e.g. via a buffer overflow) can not
> only compromise all other cluster instances of itself but can also
> compromise all other cluster applications.
> I realize that there is nothing (directly) that CGL can do about this
> and so I don't propose any changes to the requirements. However, it
> is worth understanding this vulnerability. And if any customers or
> spec authors have contacts within SAF, they might mention that it
> would be desirable for SAF to add security to a future version of
> their APIs.
> CFH.2.1 Cluster Node Failure Detection:
> This (and subsequent requirements) doesn't define the term "failure".
> There could be many types of failure, from application to network
> stack to hardware. I don't expect this to have security implications,
> as there is probably not much that could be done to prevent malware
> from either making a node seem to have failed or making a compromised
> node seem to still be available.
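To make the definitional gap concrete, here is a minimal sketch (illustrative only, not from the spec) of a timeout-based detector. Note that "failure" here means nothing more than missed heartbeats; the cause, from application hang to hardware fault to malware suppressing or forging beats, is invisible to the detector:

```python
class HeartbeatMonitor:
    """Illustrative timeout-based liveness check. A node is 'failed'
    only in the sense that it has missed its heartbeat window; the
    underlying cause cannot be distinguished at this layer."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_beat = {}

    def beat(self, node, now):
        # Record the most recent heartbeat time for this node.
        self.last_beat[node] = now

    def failed_nodes(self, now):
        # Any node whose last beat is older than the timeout is
        # declared failed, whatever actually happened to it.
        return [n for n, t in self.last_beat.items()
                if now - t > self.timeout]

monitor = HeartbeatMonitor(timeout=3.0)
monitor.beat("node-a", now=0.0)
monitor.beat("node-b", now=0.0)
monitor.beat("node-a", now=2.0)
print(monitor.failed_nodes(now=4.0))  # ['node-b']
```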
> CFH.3.1 Prevent Failed Node From Corrupting Shared Resources:
> [not a security comment] Given the broad nature of node failure, it's
> not clear to me that there is any way to guarantee that a failing node
> won't corrupt shared resources before it is isolated. Perhaps this
> requirement is really trying to specify that a failed node cannot deny
> access or service to shared resources? That would be more doable.
> CFH.5.1 Application Fail-over Enabling:
> [not a security comment] This requirement doesn't mention whether
> failover includes application state (checkpoint in SAF AIS terms) and,
> if it does, what the freshness of the data must be.
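A small sketch of why freshness matters (the `Checkpoint` class here is hypothetical; SAF AIS defines a real Checkpoint Service, but the point is independent of its API):

```python
class Checkpoint:
    """Hypothetical checkpoint record: a sequence number plus a
    snapshot of application state at that point."""

    def __init__(self):
        self.seq = 0
        self.state = {}

    def save(self, seq, state):
        self.seq, self.state = seq, dict(state)

# Primary handles requests 1..5 but last checkpointed after request 3.
ckpt = Checkpoint()
state = {}
for seq in range(1, 6):
    state[seq] = "result-%d" % seq
    if seq <= 3:
        ckpt.save(seq, state)

# On failover the standby resumes from the checkpoint; without a
# freshness requirement, updates 4 and 5 are silently lost.
lost = 5 - ckpt.seq
print("standby resumes at seq", ckpt.seq, "- updates lost:", lost)
```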
> CSM.6.1 Cluster Synchronized Device Hotswap:
> A requirement should be added that specifies that any security
> policies and/or parameters (e.g. access control lists, etc.) that
> apply to the class of device being hot-added must be applied in all
> OSes that will have access, and to the device itself, before that
> device is made available for use.
> There should also be a requirement that any sensitive state (that has
> not been explicitly persisted) in a device should be cleared before
> the device can be removed. This might be an authorization value that
> has been cached, etc. I would expect loss of power to clear most
> things, but there may be cases where some state is not lost with power
> removal and there may be cases where the power is not lost and the
> device is simply deleted and then re-added.
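The two proposed requirements amount to an ordering discipline, sketched below. All names here are illustrative; nothing in this sketch comes from a CGL or SAF API:

```python
class HotswapDevice:
    """Sketch of the proposed ordering: security policy is applied
    before the device becomes available, and non-persisted sensitive
    state is scrubbed before removal completes."""

    def __init__(self):
        self.acl = None
        self.cached_secret = None
        self.available = False

    def add(self, acl):
        self.acl = acl          # policy applied first...
        self.available = True   # ...only then is the device usable

    def use(self, secret):
        assert self.available and self.acl is not None
        self.cached_secret = secret  # e.g. a cached authorization value

    def remove(self):
        self.cached_secret = None    # scrub state that power loss
        self.available = False       # might not clear

dev = HotswapDevice()
dev.add(acl={"admin": "rw"})
dev.use(secret="cached-auth-token")
dev.remove()
print(dev.cached_secret)  # None: secret cleared before removal
```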
> CCM.2 Cluster Communication Service:
> I don't think that there should be a requirement around endpoint
> authentication or identification, but it would be something to think
> about and maybe roadmap. That is, the ability to authoritatively
> identify the originator and destination of a message as a specific
> cluster member (machine, app, etc.), perhaps including the ability to
> create access control lists or other security mechanisms on top of
> this. Such a building block would facilitate securing the SAF APIs.
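One possible shape for such a building block is per-member keyed message authentication, sketched below. Key distribution is assumed solved out of band, and nothing here is an SAF-defined mechanism:

```python
import hashlib
import hmac

# Hypothetical per-member keys, provisioned out of band.
member_keys = {"node-a": b"key-a", "node-b": b"key-b"}

def send(sender, payload):
    # Each message carries an HMAC under the sender's key, so the
    # receiver can verify which cluster member originated it.
    tag = hmac.new(member_keys[sender], payload, hashlib.sha256).digest()
    return sender, payload, tag

def verify(claimed_sender, payload, tag):
    expected = hmac.new(member_keys[claimed_sender], payload,
                        hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)

msg = send("node-a", b"join-request")
print(verify(*msg))                      # True: genuine origin
print(verify("node-b", msg[1], msg[2]))  # False: forged origin claim
```

Access control lists could then key off the verified member identity rather than a spoofable network address.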
> This would also have to satisfy the transparency requirements of
> CAF.2.2 / CAF.2.3.
> CAF.2.2 / CAF.2.3 IP Takeover / TCP Session Takeover:
> I would expect that this mechanism would not support failover of
> secure connections such as IPSec or TLS/SSL? If it is the intention
> to support this type of failover then it needs to be recognized that
> there is data state (i.e. session key, etc.) that must be transferred
> to the redundant node in order for it to successfully take over the
> connection. In the case of SSL/TLS, this might be considered
> application state, in that the SSL/TLS code is part of the application
> binary/shared lib. It would be useful for this requirement (or its
> parent requirement) to state one way or the other whether stateful
> connection failover is supported.
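A toy illustration of the state-transfer point (HMAC stands in for TLS record protection; real TLS state also includes sequence numbers, IVs, pending data, and so on):

```python
import hashlib
import hmac

# Session key negotiated on the primary during the TLS handshake.
session_key = b"negotiated-on-primary"

def protect(key, record):
    # Stand-in for TLS record protection: key-dependent integrity tag.
    return record, hmac.new(key, record, hashlib.sha256).digest()

record, tag = protect(session_key, b"application-data")

# A standby that received the replicated key can continue the session.
with_state = hmac.compare_digest(
    hmac.new(session_key, record, hashlib.sha256).digest(), tag)

# A standby without the replicated key cannot produce or verify
# valid records, so the taken-over connection is dead.
without_state = hmac.compare_digest(
    hmac.new(b"no-replicated-key", record, hashlib.sha256).digest(), tag)

print(with_state, without_state)  # True False
```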
> CCS.2.1 Data Replication Performance:
> [not a security comment] Is it really the case that the checkpoint
> write time can be independent of the number of replicas, given that
> the write throughput is defined to be synchronous to all replica
> updates? It also seems like the read and write throughput would
> depend on the number of total requests being made to a given replica
> or original. So is the "500 API executions" meant to mean 500
> executions across the entire cluster, on a single node, or by a single
> application instance?
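The back-of-envelope model behind the question, with illustrative numbers not taken from the spec:

```python
# If a checkpoint write completes only when every replica has
# acknowledged, per-write latency is bounded below by the slowest
# replica, so write time is hard to keep independent of replica count.
def sync_write_latency_ms(replica_latencies_ms):
    return max(replica_latencies_ms)  # must wait for all acks

one = sync_write_latency_ms([2.0])
three = sync_write_latency_ms([2.0, 3.5, 5.0])

# Achievable synchronous API executions per second for a single
# caller issuing writes back to back:
print(round(1000 / one), round(1000 / three))  # 500 200
```

Under this model a "500 executions" figure holds for a single caller only at one fast replica, which is why the scope of the number matters.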
> Section 3.3 Cluster Management:
> It would be very desirable, from a security perspective, to add
> requirement(s) that all remote management must be secure
> (authenticated, authorized, and audited). That said, given the
> underlying remote management technologies I'm not sure that it would
> be realistic for CGL 3.0. So I think that maybe such a requirement
> could be made a P2 or roadmap item so that it is kept around for a
> future release.
> CMON.1.1 Cluster Node HW Status Monitoring:
> I am confused by the fact that SAF HPI is not a remote management
> protocol and yet this requirement is stating that CGL should be able
> to use HPI to manage remote (i.e. cluster members) nodes. Is this
> implicitly assuming an implementation that extends HPI to support
> remote management (e.g. OpenHPI)?
> The SAF HPI (B) specification does not support any security (the
> security parameter to the open session call must be NULL). As HPI is
> by default a local management specification, this is comparable to the
> reliance on physical security in local IPMI and local management in
> general. However, the
> OpenHPI implementation supports remote management via plugins for
> remotable management protocols such as IPMI and SNMP. If
> such an implementation is expected to be used, it should be required
> that any security support in that protocol be exposed to management
> clients and used in a secure fashion. For instance, any
> authentication information must be entered by a user (as opposed to
> stored in a file or hardcoded).
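The "entered by a user" rule could look like the following sketch. The function name and its parameters are hypothetical, not OpenHPI API; only `getpass` is real:

```python
import getpass

def get_management_secret(interactive=True, prompt=getpass.getpass):
    """Sketch of the proposed rule: credentials for a remote-capable
    HPI transport (e.g. OpenHPI's IPMI or SNMP plugins) come from the
    operator at session setup, never from a file or a hardcoded
    value. The prompt is injectable only so the sketch is testable."""
    if not interactive:
        raise ValueError("stored/hardcoded credentials are disallowed")
    return prompt("management password: ")

# An operator-supplied secret is accepted; a non-interactive source
# is refused outright.
secret = get_management_secret(prompt=lambda msg: "entered-by-operator")
print(secret)
```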
> CDIAG.2 Cluster-wide Diagnostic Info:
> CDIAG.2.1 Cluster-wide Identified Core Dump:
> CDIAG.2.2 Cluster-wide Crash Dump Management:
> The ability to retrieve core and crash dumps remotely should not have
> any weaker security than that to retrieve them locally (security spec
> TBD?). This may introduce an additional requirement if/when core and
> crash dump security is specified.
> CDIAG.2.3 Cluster-wide Log Collection:
> An additional requirement may be needed when the security spec
> specifies the security for logging, as cluster logs should be equally
> protected.