Considerations for performing cluster validation on an existing cluster
When you validate an already configured cluster, you might not run every test. The considerations differ depending on whether the set of tests you run includes storage tests. This section outlines the main considerations for each case:
Considerations when including storage tests: When you validate an already configured cluster and select the default tests (which include storage tests), the wizard tests only disk resources that are in an Offline state or are not assigned to a clustered service or application. This is a built-in safety mechanism: the cluster validation wizard warns you when storage tests have been selected but will not run on storage in an Online state, that is, storage used by clustered services or applications. This behavior is by design, to avoid disrupting highly available services or applications that depend on these disk resources being online.
One scenario in which Microsoft CSS might ask you to run validation tests on a production cluster is a cluster storage failure that could be caused by an underlying storage configuration change or failure. Because the wizard, by default, does not run storage tests on storage that is online, you can still run validation tests (including storage tests) in this situation by creating or choosing a new logical unit number (LUN) on the same shared storage device and presenting it to all nodes. By testing this LUN, you test the underlying storage subsystem without disrupting the clustered services and applications that are already online in the cluster.
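The selection rule described above can be sketched in a few lines of Python. This is only an illustrative model, not how the wizard is implemented: the `DiskResource` type, its fields, and the disk names are hypothetical and do not come from any cluster API.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class DiskResource:
    """Hypothetical record for a cluster disk; not a real cluster API type."""
    name: str
    state: str                  # "Online" or "Offline"
    assigned_to: Optional[str]  # clustered service or application, or None


def is_testable(disk: DiskResource) -> bool:
    """Rule from the text: the disk is Offline, or it is not assigned."""
    return disk.state == "Offline" or disk.assigned_to is None


def split_for_storage_tests(
    disks: List[DiskResource],
) -> Tuple[List[str], List[str]]:
    """Return (names that would be tested, names that would be skipped)."""
    tested = [d.name for d in disks if is_testable(d)]
    skipped = [d.name for d in disks if not is_testable(d)]
    return tested, skipped


# Made-up example inventory, including a spare LUN presented to all nodes.
disks = [
    DiskResource("Cluster Disk 1", "Online", "SQL Server"),
    DiskResource("Cluster Disk 2", "Offline", "File Server"),
    DiskResource("Spare LUN", "Online", None),
]
tested, skipped = split_for_storage_tests(disks)
print("tested:", tested)
print("skipped (wizard would warn):", skipped)
```

In this model, the spare LUN is tested because it is not assigned to any clustered service or application, while the online disk that is in use is left alone.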
If a failover cluster has passed the full set of validation tests and undergoes no subsequent hardware or software changes, it remains a supported configuration. However, when you perform routine updates to software components such as drivers and firmware, it may be necessary to rerun the validation wizard to ensure that the current configuration of the failover cluster is supported. The following guidelines can help in this process:
All components of the storage stack should be identical across all nodes in the cluster. The multipath I/O (MPIO) software and Device Specific Module (DSM) software components must be identical. It is recommended that the mass-storage device controllers attached to cluster storage (that is, the host bus adapter (HBA), HBA drivers, and HBA firmware) also be identical. If you use dissimilar HBAs, verify with the storage vendor that you are following their supported or recommended configurations.
To minimize the impact on highly available applications and services, a best practice is to keep a small LUN available so that the validation wizard can run tests on available storage without negatively affecting clustered services and applications. This way, if Microsoft CSS asks you to run a full set of cluster validation tests, the wizard will follow the default behavior and run storage tests on the available storage only (that is, the small LUN).
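The guideline about identical storage-stack components can be checked mechanically once you have a per-node inventory. The sketch below assumes you have already collected that inventory by some other means; the field names (`mpio_version`, `dsm_version`, `hba_model`, and so on), node names, and version strings are illustrative assumptions, not the output of any real tool. It treats MPIO and DSM mismatches as errors (required to be identical) and HBA mismatches as warnings (recommended to be identical).

```python
from typing import Dict, Tuple

# Field names are illustrative assumptions, not real inventory output.
REQUIRED_IDENTICAL = ("mpio_version", "dsm_version")
RECOMMENDED_IDENTICAL = ("hba_model", "hba_driver", "hba_firmware")

# node name -> component name -> version string
Inventory = Dict[str, Dict[str, str]]


def find_mismatches(
    nodes: Inventory, fields: Tuple[str, ...]
) -> Dict[str, Dict[str, str]]:
    """Return {field: {node: value}} for fields that differ across nodes."""
    mismatches: Dict[str, Dict[str, str]] = {}
    for field in fields:
        values = {
            node: components.get(field, "<missing>")
            for node, components in nodes.items()
        }
        if len(set(values.values())) > 1:
            mismatches[field] = values
    return mismatches


# Made-up two-node inventory that differs only in HBA firmware.
nodes = {
    "NODE1": {"mpio_version": "1.0", "dsm_version": "2.1",
              "hba_model": "HBA-A", "hba_driver": "5.0",
              "hba_firmware": "3.2"},
    "NODE2": {"mpio_version": "1.0", "dsm_version": "2.1",
              "hba_model": "HBA-A", "hba_driver": "5.0",
              "hba_firmware": "3.3"},
}
errors = find_mismatches(nodes, REQUIRED_IDENTICAL)
warnings = find_mismatches(nodes, RECOMMENDED_IDENTICAL)
print("must fix:", errors)
print("verify with storage vendor:", warnings)
```

A non-empty `warnings` result does not by itself mean the configuration is unsupported; as stated above, dissimilar HBAs call for confirmation from the storage vendor.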
Considerations when not including storage tests: System configuration tests, inventory tests, and network tests have very low overhead, and can be performed without significant effect on servers in a cluster.
Microsoft CSS might ask you to run cluster validation on a production cluster as part of normal troubleshooting procedures (not focused on storage). In this scenario, you use the wizard to inventory hardware and software, perform network testing, and validate system configuration. In some scenarios, only a subset of the full test set is needed. For example, when troubleshooting a networking problem on a production cluster, Microsoft CSS might ask you to run only the hardware and software inventory and the network tests.




