Why IBM SVC and EMC VPLEX are not the same.
Posted on August 5, 2012 by TheStorageChap in VPLEX
Whilst EMC VPLEX and IBM SAN Volume Controller (SVC) may, at a cursory glance, appear to be similar technologies with similar use cases, the picture is not so black and white: these solutions were conceived with very different aims in mind and with very different architectures.
IBM SVC
IBM SVC was designed as a single-site storage virtualisation solution that enabled less capable storage arrays to be pooled behind the virtualisation architecture and to gain additional write caching and non-disruptive data mobility features. More recently, IBM has added features such as auto-tiering and thin provisioning to SVC, transforming it into both a storage-controller-type solution with embedded features (e.g. V7000) and a standalone solution that can enable storage commoditisation use cases across heterogeneous storage arrays.
The architecture is based on I/O Groups of two SVC Nodes in an active/passive state, with an SVC Cluster consisting of up to four I/O Groups and eight SVC Nodes. An SVC Cluster can be configured in a ‘split node’ architecture, where the two Nodes of an I/O Group are split across physically separate sites using Fibre Channel and ISLs.

EDIT 25.10.2012: In fairness to a comment posted by IBM, I have added the picture below, specifically depicting an IBM split-node SVC solution.

EMC VPLEX
EMC’s heritage and its position in the marketplace have been based on the development of storage platforms and the software features inherent in them, such as thin provisioning, snaps, clones and FAST. EMC VPLEX was conceived as a storage virtualisation and multi-site federation solution that could further augment those existing technologies and enable greater levels of data and application availability and mobility.
An EMC VPLEX Cluster consists of up to eight VPLEX Directors participating in a single I/O group. Each Director has four front-end and four back-end Fibre Channel ports, with completely separate connectivity for inter-Director, inter-Engine and inter-Site communication. A completely separate, resilient VPLEX Cluster can be created within a second datacentre and connected to the VPLEX Cluster in the first datacentre over either Fibre Channel or Ethernet, with no requirement for merged fabrics. Once connected, VPLEX enables the creation of local and distributed virtual volumes; in the case of the latter, these volumes are concurrently read/write accessible across both datacentres in a true active/active architecture in which resources from both datacentres are used for production I/O.
Whilst VPLEX Local does not necessarily have all of the storage commoditisation features associated with IBM SVC, it does enable the key virtualisation use cases of storage abstraction, storage pooling, data mobility and storage mirroring. For storage commoditisation use cases, EMC proposes Federated Tiered Storage to enable the use of Tier 1 EMC storage software features across third-party arrays, which can then also, if required, benefit from VPLEX.

| EMC VPLEX Local | IBM SVC |
| --- | --- |
| All storage array features and functionality are available behind VPLEX. | SVC eliminates the ability to use back-end array features and functionality, due to write caching. |
| VPLEX is an active/active architecture where a virtual volume can be active across multiple VPLEX Directors. | SVC is an active/passive two-Node I/O Group where a volume can only be active from one SVC Node. |
| VPLEX maintains back-end active/active array functionality. | SVC forces an active/active back-end array into an active/passive solution, due to active/passive SVC Nodes. |
| VPLEX Directors can be added non-disruptively to an existing VPLEX Cluster and the new resources (e.g. read cache, bandwidth, IOPS) used across the Cluster. | I/O Groups are discrete silos, and free resources in one I/O Group cannot be leveraged in another. With SVC 6.4, IBM claims non-disruptive mobility between I/O Groups. This is not 100% accurate: the feature is limited to Windows and Linux, does not support clustered servers and does not support VMware. Even in the case of Windows, a reboot is required. |
| VPLEX Directors have four front-end and four back-end ports that are dedicated to front-end and back-end I/O. | An SVC Node has only four HBAs, and two of them are required for cache mirroring, leaving only one HBA per fabric for production. Losing one of the HBAs is equivalent to losing one complete fabric. |
| VPLEX can use RecoverPoint CDP to provide corruption protection for virtualised volumes onto non-virtualised disks, which can be used for recovery even in the event of a total loss of the virtualisation technology. Alternatively, snap and clone functionality at the array level can be fully utilised, as VPLEX does not do write caching. | Snaps and clones can be taken using SVC and used to recover in case of an outage, but how do you recover if the virtualisation layer itself is gone? All your snaps and clones are gone too, so the data has to be restored from tape. |
| Non-disruptive data mobility across heterogeneous storage arrays can be paused, stopped and backed out without any risk of data loss and with little performance impact. | Data mobility can have performance impacts during the actual mobility phase, and the migration method can lead to data loss during the migration in the event of a storage or site failure. |

| VPLEX Metro | Split-Node SVC Cluster |
| --- | --- |
| Independent, resilient VPLEX Clusters are federated across datacentres, using fibre or IP with no requirement for merged fabrics. | Two-Node I/O Groups are physically ‘split’ between datacentres, reducing resiliency within each datacentre. Fabrics must be merged between datacentres, and multiple ISLs must be used to dual-path all hosts across both SVC Nodes of the I/O Group, adding fabric complexity and cost. |
| When using VPLEX with the minimum requirement of two Directors within a cluster, the loss of one Director does not require I/O to be redirected to the remote site, as there is local redundancy. | In a split-node configuration, SVC forces a multi-controller array with multiple redundancies into a two-Node solution with no redundancy per site. Losing one Node is similar to losing a site: I/Os have to cross the ISLs to the remote site. |
| A write to a distributed virtual volume from either site is written once to the remote site, irrespective of which location the production application runs at. | For each write I/O, each appliance has to do one write to the local back end, one write to the remote SVC Node and one write to the remote array; this is at least one round trip more than VPLEX. If you run production on the non-preferred site, you have to write to the preferred Node, mirror the cache, write to the local array and write to the remote array, making it two round trips more than VPLEX. |
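The recovery argument in the table turns on where an acknowledged write lives when the virtualisation layer disappears. The sketch below is a simplified Python model, assuming a dictionary stands in for a back-end array and ignoring the cache mirroring between the two nodes of an SVC I/O group (which protects against the loss of a single node). It contrasts a write-through cache, which acknowledges only after the back end holds the data, with a write-back cache, which acknowledges from cache and destages later; the class and function names are hypothetical.

```python
from typing import Dict, Optional

Backend = Dict[int, bytes]  # block number -> data; stands in for a back-end array


class WriteThroughCache:
    """Acknowledges a write only after the back end holds it; the cache only speeds up reads."""

    def __init__(self, backend: Backend) -> None:
        self.backend = backend
        self.cache: Dict[int, bytes] = {}

    def write(self, block: int, data: bytes) -> None:
        self.backend[block] = data  # back end updated before the host is acknowledged
        self.cache[block] = data    # keep a copy to serve later reads

    def read(self, block: int) -> Optional[bytes]:
        if block not in self.cache and block in self.backend:
            self.cache[block] = self.backend[block]
        return self.cache.get(block)


class WriteBackCache:
    """Acknowledges a write once it is in cache; the back end is updated later by destage()."""

    def __init__(self, backend: Backend) -> None:
        self.backend = backend
        self.dirty: Dict[int, bytes] = {}

    def write(self, block: int, data: bytes) -> None:
        self.dirty[block] = data  # host acknowledged here; back end not yet updated

    def read(self, block: int) -> Optional[bytes]:
        return self.dirty.get(block, self.backend.get(block))

    def destage(self) -> None:
        self.backend.update(self.dirty)
        self.dirty.clear()


def lost_on_cache_failure(backend: Backend, acknowledged: Dict[int, bytes]) -> Dict[int, bytes]:
    """Acknowledged writes the back end does not hold, i.e. lost if every cache copy disappears."""
    return {b: d for b, d in acknowledged.items() if backend.get(b) != d}


if __name__ == "__main__":
    wt_backend: Backend = {}
    wb_backend: Backend = {}
    WriteThroughCache(wt_backend).write(7, b"payload")
    WriteBackCache(wb_backend).write(7, b"payload")
    print(lost_on_cache_failure(wt_backend, {7: b"payload"}))  # {}
    print(lost_on_cache_failure(wb_backend, {7: b"payload"}))  # {7: b'payload'}
```

In this model the write-through back end is always current, so array-level snaps and clones taken behind it see every acknowledged write; the write-back back end only catches up after `destage()`.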
Henrik Gronberg says:
October 21, 2012 at 18:26
You need to brush up on your SVC knowledge… you are incorrect in most of your statements…
TheStorageChap says:
October 21, 2012 at 18:32
Hi Henrik, thanks for your comment. Perhaps it would be useful if you could elaborate? I am interested to hear your view on specifically what you think is incorrect.
Barry Whyte says:
October 23, 2012 at 08:13
I have just been pointed to this blog post and it is almost complete garbage.
I will reply in full on my blog in the next day or so – but key points:
1. There is no SPOF at the SVC or storage side as you state in the diagrams. Either can fail, and the other can continue.
2. SVC is ACTIVE-ACTIVE with a preference (if used). The preference is there so we can try to get better cache hits by sending reads to the same node in preference, but we most certainly DO NOT turn an active-active controller into an active-passive one. Maybe you need to read IBM’s great Redbooks and get yourself on an IBM SVC course!
TheStorageChap says:
October 23, 2012 at 09:59
Hi Barry, thanks for the professional courtesy! My article was based on first-hand feedback from customers with IBM SVC that have moved or are moving to EMC VPLEX, and also on the IBM Redbooks, which I agree are a great resource and a credit to IBM.
Page 929 of the SVC 6.3 Redbook states:
“Like a standard volume, each mirrored volume is owned by one I/O Group with a preferred node. Thus, the mirrored volume goes offline if the whole I/O Group goes offline. The preferred node performs all I/O operations, which means reads as well as writes. The preferred node can be set manually.”
This sounds very much like an active-passive solution, when compared with a true active-active solution.
Claudiu Schmidt says:
October 25, 2012 at 11:08
Hi Barry,
I look forward to seeing your blog on the above. I really hope to get the deep technical details that substantiate your above comments. I have some additional points where I would appreciate your statement either here or in your blog:
Comment 1: There is no SPOF in SVC
This is an interesting statement. Would IBM sell an array with a single controller? Would you then take this single-controller array and have it replicate to another single-controller array? I guess not, as all IBM arrays have, as a minimum, dual controllers. So why would you go this way with SVC? IBM SVC was designed as a local virtualization solution with dual-controller redundancy. How did you get to a two-site virtualization solution? IBM took the easy approach and stretched the two-controller solution across sites, leaving you with a single controller per site. How can you claim “no SPOF”? The SVC controller has several SPOFs: CPU, cache… and the most obvious one is the HBA: you only have a single 4-port HBA in this appliance. You can argue that the solution as a whole has no SPOF; this is correct, but you forget to mention the following: if you hit one of those SPOFs, all hosts with a preferred path to that appliance have to wait for the host I/O timeout until they all fail over to the remote site. All production traffic then goes over the ISL, with the additional latency and bandwidth impact. How long can it take to replace a failed SVC appliance? Several hours at best; it could also be a day or more. During this time you are at risk of data unavailability (DU) for your complete production environment, as everything is now running on one appliance with multiple single points of failure. With this solution you are downgrading the availability of your back-end arrays and hosts to the availability of one IBM SVC appliance, or even worse, to the availability of one HBA. You can have multiple redundancies on the back-end arrays (multiple controllers, multiple HBAs and ports), but would losing one SVC appliance not mean that you lose all the local arrays connected to this SVC appliance?
The point of the original post was that we do not have these same limitations with VPLEX: we have multiple directors working together as one entity; we have no SPOF, not even with the minimum configuration of one engine; and we have several HBAs and multiple ports in each engine. In case of a hardware failure we don’t fail over to the remote site, and we still have a complete operational cluster at the remote site with multiple redundancies and no SPOF.
Now, what about the SVC NDU process? My understanding, and please correct me if I am wrong, is that during an NDU one controller is updated and rebooted, followed by the other. This means you have to fail over the complete production environment to the secondary site (again running on a SPOF there), freezing all production for the host I/O timeout value, and once the SVC node comes back you have to perform another failover of the complete production environment, with another freeze. How can this be a non-disruptive upgrade?
Comment 2: SVC is Active-Active
It is interesting that you mention the “great IBM Redbooks”: there is one where you are listed as one of the authors, and it says the following: “In general, the write is sent to the owner node (preferred node). However, SVC is Active/Active and accepts I/O requests that arrive at the non-preferred node. This requires additional processing and is avoided if possible.”
This is not an active/active solution; this is an ALUA solution, meaning you accept IOs on the non-owner controller but redirect them to the owner controller for processing.
Additionally, each volume is preferred on only a single SVC node, meaning, for example, that multiple VMs on the same volume should run on the site with the preferred node to avoid the performance impact described above. Again, there is no need for this type of restriction with VPLEX: every volume is read/write accessible from both VPLEX Clusters at the same time and is always processed by the local VPLEX cluster.
In the same Redbook it mentions the following: “The data is discarded on the non-owner (partner) node, because reads are not expected to occur to non-owner nodes.” If SVC is a true active/active solution why are you not expecting any reads on the non-owner node?
If SVC is an active/active solution, and you say the preference exists just to increase read cache hits, how come only the preferred appliance performs the back-end mirroring? You send the mirrored IO to the remote appliance, but you then send the same IO from the preferred appliance to both the local and the remote array. Again, if SVC were truly an active/active cluster, there would be no need to send the writes over the ISL a second time: you would have written them from the local appliance. Why would you need this extra bandwidth and complexity? At a minimum you would need double the bandwidth with SVC compared to VPLEX. This is because VPLEX is a true active/active solution: we don’t have any owner/non-owner notion for reads and writes, and we always process every front-end and back-end IO from the local VPLEX cluster.
I look forward to your response.
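The write-path counting used in the table and in Claudiu’s comment above can be made concrete. The sketch below is a hypothetical Python model, not vendor data: it encodes the transfer counts as claimed in this thread (one inter-site transfer per write for VPLEX Metro; two for split-node SVC on the preferred site and three on the non-preferred site), counts each transfer as one round trip as the post does, and assumes a rule-of-thumb fibre propagation delay of about 5 µs per km one way, ignoring switch, DWDM and protocol overheads. The premises are disputed later in the thread, so treat the output as an illustration of the argument rather than a measurement.

```python
from dataclasses import dataclass

# Assumption: rule-of-thumb propagation delay for light in optical fibre,
# about 5 microseconds per km one way. Switch, DWDM and protocol overheads are ignored.
FIBRE_US_PER_KM_ONE_WAY = 5.0


def propagation_rtt_ms(distance_km: float) -> float:
    """Round-trip propagation delay between two sites, in milliseconds."""
    return 2 * distance_km * FIBRE_US_PER_KM_ONE_WAY / 1000.0


@dataclass(frozen=True)
class WritePath:
    """Inter-site transfers caused by one host write, as claimed in this thread (not vendor data)."""
    name: str
    inter_site_transfers: int  # each transfer carries the write payload across the ISL once

    def added_latency_ms(self, distance_km: float) -> float:
        # The post counts each inter-site transfer as one round trip.
        return self.inter_site_transfers * propagation_rtt_ms(distance_km)

    def isl_bytes(self, write_size_bytes: int) -> int:
        return self.inter_site_transfers * write_size_bytes


def vplex_metro_write() -> WritePath:
    # Distributed virtual volume: written once to the remote cluster, from either site.
    return WritePath("VPLEX Metro", 1)


def split_node_svc_write(on_preferred_site: bool) -> WritePath:
    # Preferred node mirrors its cache to the remote node and writes to the remote array.
    transfers = 2
    if not on_preferred_site:
        transfers += 1  # the I/O first has to cross the ISL to reach the preferred node
    suffix = "" if on_preferred_site else " (non-preferred site)"
    return WritePath("Split-node SVC" + suffix, transfers)


if __name__ == "__main__":
    distance_km = 100  # hypothetical site separation
    for path in (vplex_metro_write(), split_node_svc_write(True), split_node_svc_write(False)):
        print(f"{path.name}: +{path.added_latency_ms(distance_km):.1f} ms per write, "
              f"{path.isl_bytes(8192)} bytes over the ISL per 8 KiB write")
```

At a hypothetical 100 km, this model gives +1.0 ms per write for VPLEX Metro and +2.0 ms or +3.0 ms for split-node SVC, with ISL bytes scaling the same way; that proportionality is the “double the bandwidth” point argued above.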
Todd says:
November 17, 2012 at 07:48
@Claudiu,
There are a number of statements that you have made regarding the IBM SVC that are just incorrect.
Your first point: I agree with your point about not having a SPOF. That is very much correct, but the technical aspects of what you are referring to are incorrect:
a. SVC 2145-CG8 – 4x 8 Gbps FC ports per SVC controller
b. Hit the SVC appliance and the hosts have to wait: no, not at all. The SVC appliance has multiple paths to the target. In addition, IBM gives you SDD (its multipath I/O driver) for free; EMC PowerPath costs extra, and while you can use the native multipath I/O driver instead, performance may be impacted as a result – http://powerwindows.wordpress.com/2011/10/07/take-the-hipowerpath-vs-mpio-take-the-high-roadgh-road-and-get-more-peformance/ or http://www.emc.com/collateral/software/white-papers/h8180-powerpath-load-balancing-failover-wp.pdf
c. Disconnecting one appliance means you lose all of your connections to the back-end disk arrays: absolutely not. You can do rolling firmware upgrades and the system works fine; nothing is lost and data processing continues.
d. Operational clusters: yes, both EMC Vplex and IBM SVC continue to function.
e. ALUA: asymmetric logical unit access occurs when the access characteristics of one port may differ from those of another port. SCSI target devices with target ports implemented in separate physical units may need to designate differing levels of access for the target ports associated with each logical unit. The port groups help to differentiate the characteristics of a given port when a failure occurs, which provides the failover intelligence when accessing a LUN. This does not say that access to the LUN is not allowed; it says that when the characteristics of a port change, the host uses that intelligence to switch to another port to allow continued access to the LUN (i.e. multipathing).
f. I am reading the Redbook and it does not say anything about a non-owner node, but I am reading version 6.4.
g. Send I/O to the preferred node and the remote array: that is not correct. The disks or LUNs are presented to the SVC, which identifies them as MDisks (managed disks) and creates a pool from the MDisks; the pool is carved up into dynamic VDisks (virtual LUNs) that are presented to the external servers. The physical aspect depends on an in-band or out-of-band SAN design.
So I am not sure what you mean when you say the SVC is not active/active. The information you have provided may be from another firmware version; I am not sure, but it does not reflect the product’s capability.
However, I have read and have worked with the Vplex virtualization appliance from EMC. The metro-cluster configuration has a number of potential issues.
1. Split brain: the technology they use to mitigate this is called Vplex Witness, which is a VMware appliance used on a management network to communicate with the various Vplex controllers. At this point they are using an out-of-band solution to resolve an issue that they cannot resolve using the Vplex appliances themselves.
2. Preferred Site failure causes full data unavailability
3. If the preferred cluster fails with an application running on it, the cluster fails and, in a metro-cluster configuration, has to be restarted manually.
4. If the back end (BE) is out of date or incomplete, or there is a problem with the rebuild process, I/O will be suspended regardless of preference, p.68 – http://www.emc.com/collateral/hardware/technical-documentation/h7113-vplex-architecture-deployment.pdf (VPLEX must act and suspend access to the distributed volume on one of the clusters).
5. The preferred cluster rule can lead to a non-zero RTO.
6. Also, Vplex uses a write-through method, so it has to get an acknowledgement from the actual disk array. If the disk array is experiencing intermittent problems, the acknowledgement can take some time to reach the Vplex controllers, which could increase the time it takes to write to disk.
7. In addition, the lookup process involves looking first in the local cache, second in the global cache and third in the physical disk array. This lookup could take a large amount of time; an alternative would be an index, built algorithmically and replicated across the various Vplex clusters, like the forwarding table used in Ethernet switches. This lack of planning could add processing time.
8. The EMC Vplex cluster costs about $80K to $100K more, for a solution that does not have the following features of the IBM SVC:
a. Easy Tier (HSM)
b. No Management server needed
c. No Internal fabric switches needed
d. No additional UPS needed
e. SVC comes with 10GbE connections (iSCSI only, no SSDs)
f. SVC: 300 km replication distance; Vplex: 200 km
g. Deduplication that is part of the appliance
h. SDD (free Multipath software)
i. SVC virtual disks = 8192, 32PB total storage
j. HDS Thunder – Vplex does not support
k. Lightning – Vplex does not support
m. TagmaStore – Vplex does not support
n. Bull StoreWay – Vplex does not support
o. NEC iStorage – Vplex does not support
p. Pillar Axiom – Vplex does not support
q. Texas Memory – Vplex does not support
r. Xiotech Emprise 5000 – Vplex does not support
s. Nexsan SATABeast – Vplex does not support
t. Compellent Fluid Data – Vplex does not support
u. Violin Memory – Vplex does not support
Vplex Architecture – http://www.emc.com/collateral/hardware/technical-documentation/h7113-vplex-architecture-deployment.pdf
SVC Architecture – http://www-03.ibm.com/systems/storage/software/virtualization/svc/index.html
Todd
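Point 7 of Todd’s comment above describes the VPLEX read lookup order: local cache, then global cache, then the back-end array. The Python sketch below models only that ordering, assuming plain dictionaries stand in for each tier; it is not VPLEX’s cache coherency protocol, and the function name is hypothetical. It reports how many tiers were consulted, since each extra tier probed on a miss is where the additional processing time Todd is concerned about would come from.

```python
from typing import Dict, Optional, Tuple

Tier = Dict[int, bytes]  # block number -> data


def read_block(block: int, local_cache: Tier, global_cache: Tier,
               backend: Tier) -> Tuple[str, int, Optional[bytes]]:
    """Return (tier that served the read, tiers consulted, data) using the three-step lookup order."""
    tiers = (("local cache", local_cache),
             ("global cache", global_cache),
             ("back-end array", backend))
    for consulted, (name, tier) in enumerate(tiers, start=1):
        if block in tier:
            return name, consulted, tier[block]
    return "not found", len(tiers), None


if __name__ == "__main__":
    local = {1: b"a"}
    global_ = {2: b"b"}
    array = {1: b"a", 2: b"b", 3: b"c"}
    for blk in (1, 2, 3):
        print(blk, read_block(blk, local, global_, array))
```

A replicated index, as Todd proposes, would replace the sequential probe with a single lookup but would have to be updated on every block change; that trade-off is what point 7 of the reply below argues about.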
Claudiu Schmidt says:
November 20, 2012 at 21:29
I appreciate that somebody took the time to answer my points above. I was hoping Barry would give us deeper technical arguments on them.
I still stand by the above points, Todd. I will give you not only my technical view, but also official IBM Redbook statements from version 6.4.
a) Here is an extract from the official IBM document Model 2145-CG8 Hardware Installation Guide for version 6.4:
“SAN Volume Controller 2145-CG8 node features
The SAN Volume Controller 2145-CG8 node has the following features:
• A 19-inch rack-mounted enclosure
• One 4-port 8 Gbps Fibre Channel adapter
• One 2-port 10 Gbps Fibre Channel over Ethernet converged network adapter”
As you can see, it still mentions a single 4-port HBA. A single HBA is, in my view, a single point of failure, so my statement is correct.
b) Here is an extract from the Host Attachment User’s Guide, also available at the following link:
http://pic.dhe.ibm.com/infocenter/svc/ic/index.jsp?topic=%2Fcom.ibm.storage.svc.console.doc%2Fsvc_hpmultipath_21jct2.html
“Also, HP-UX version 11.31 September 2007 release and later implements the T10 ALUA support. Implicit ALUA support is integrated to the HP-UX host type of SAN Volume Controller 4.2.1.4 and later releases. To show the asymmetric state of paths to SAN Volume Controller nodes, use the HP-UX version 11.31 command scsimgr. The asymmetric state of paths to preferred node of the LUN is shown as ACTIVE/OPTIMIZED in the output from the scsimgr command. This value of paths to nonpreferred nodes displays as ACTIVE/NON-OPTIMIZED.”
So there is a preferred/non-preferred architecture in SVC, and also ALUA behavior. Because all IOs are handled by the preferred node, if the preferred node fails, all IOs in flight have to time out. In addition, SVC itself first has to notice the outage and then fail over ownership of the affected volumes to the non-preferred node. This also takes time, during which a host cannot perform any IO.
Here is the extract talking about SDD:
“For HP 9000 and HP Integrity servers, the SDD is aware of the preferred paths that are set by the SAN Volume Controller for each volume. SDD is supported on HP-UX 11.0, 11iv1 and 11iv2. During failover processing, SDD tries the first preferred path, then the next known preferred path, and so on, until it has tried all preferred paths. If SDD cannot find an available path using the preferred paths, it tries nonpreferred paths. If all paths are unavailable, the volume goes offline. SDD performs load balancing across the preferred paths where appropriate.”
Here is another doc that talks about the preferred vs non-preferred architecture of SVC, also in the latest release:
SAN Volume Controller and Storwize V7000 Replication Family
“Like a standard volume, each mirrored volume is owned by one I/O Group with a preferred node. Thus, the mirrored volume goes offline if the whole I/O Group goes offline. The preferred node performs all I/O operations, which means reads as well as writes. The preferred node can be set manually.”
c) Disconnecting or losing one appliance definitely means you lose availability, as you now run the complete production environment on one appliance only. This appliance has several SPOFs, and hitting one of them will cause a complete DU of your production environment. If you lose one SVC appliance you can’t perform any firmware upgrades: an upgrade will reboot the SVC server, causing a DU for the complete production environment. Losing one appliance also implies losing the local back-end array; here is a link to the official SVC/VMware MetroCluster certification:
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2000948
SAN Volume Controller Node Failure • Loss of volume mirror
d) Operational cluster: yes, both VPLEX and SVC will continue processing. The big difference is that with SVC you run the complete production environment on a single appliance with multiple SPOFs, and in addition all IOs, both reads and writes, have to go over the ISL to the remote SVC appliance, adding latency and bandwidth. With VPLEX, even if you lose one director, or one engine in multi-engine configurations, you still run all production on the same site, with a full HA cluster still available at the remote site. Reads won’t cross the ISL; they are serviced locally.
e) As mentioned in point b), all reads and writes are processed by the preferred node, so all IOs arriving at the non-preferred node are forwarded to the preferred node for processing.
f) Covered in point b): the 6.4 docs mention preferred/non-preferred access.
g) Same as above: the IO flow is covered in section b).
You didn’t comment on the back-end mirror behavior:
“Data is written by the preferred node to both the local and remote storage. The SCSI write protocol results in two round-trips. This latency is hidden from the application by the written cache.”
This proves again that this is not an active/active design, and not even a real ALUA design. You handle the back-end IO in a true active/passive way.
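The HP-UX SDD behaviour quoted in point b) above (paths to the preferred node reported as ACTIVE/OPTIMIZED and to the non-preferred node as ACTIVE/NON-OPTIMIZED; preferred paths tried first, non-preferred paths only when none remain, and the volume offline when every path is down) can be sketched as follows. This is a hypothetical Python model of the quoted behaviour, not SDD code, and the round-robin used for “load balancing across the preferred paths” is an assumption.

```python
from dataclasses import dataclass
from itertools import count
from typing import List, Optional


@dataclass
class Path:
    name: str
    preferred: bool        # True: ACTIVE/OPTIMIZED (to the preferred node); False: ACTIVE/NON-OPTIMIZED
    available: bool = True


class PathSelector:
    """Hypothetical model of the quoted SDD path-selection order (not SDD source code)."""

    def __init__(self, paths: List[Path]) -> None:
        self.paths = paths
        self._turn = count()  # round-robin counter (assumed load-balancing policy)

    def select(self) -> Optional[Path]:
        preferred = [p for p in self.paths if p.preferred and p.available]
        if preferred:
            # Balance I/O across whichever preferred paths are still available.
            return preferred[next(self._turn) % len(preferred)]
        # No preferred path left: fall back to non-preferred paths in configured order.
        for p in self.paths:
            if not p.preferred and p.available:
                return p
        return None  # every path is down: the volume goes offline


if __name__ == "__main__":
    paths = [Path("node1-fabA", True), Path("node1-fabB", True), Path("node2-fabA", False)]
    selector = PathSelector(paths)
    print([selector.select().name for _ in range(3)])  # alternates between the node1 paths
    paths[0].available = paths[1].available = False      # preferred node lost
    print(selector.select().name)                         # falls back to node2-fabA
```

In this model I/O reaches the non-preferred node only once every path to the preferred node is gone, which is the behaviour the two sides of this thread read differently: as active/active with a preference, or as ALUA with forwarding.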
Let’s now look at your arguments against VPLEX:
1. You are correct: Witness is a VMware appliance that handles all split-brain scenarios. It has to be external to both VPLEX management servers, and it has to communicate with both management servers over IP. It can be placed anywhere in the world, as the latency we support is 1 second. You can install Witness with VMware HA or FT, giving you very high availability in any disaster scenario.
I don’t understand your argument against the fact that we use a Witness to handle this. A two-node solution cannot handle a split brain on its own; all sensible solutions have to rely on an arbitrator. Would you make the same argument against clusters that require a quorum?
How is SVC better here? It requires a quorum disk at a third location, so you not only need an additional array there, you also need a complete SAN in that third location plus FC connectivity to both SVC appliances. What would be the distance for this?
I would say installing a VM somewhere in the world to handle the split brain is a much easier and more reliable solution.
You mention that VPLEX can’t handle split brain natively: this is also not true. With VPLEX you can have fixed rules on your distributed volumes that do not allow IOs from both VPLEX clusters to the same volume in a split-brain situation.
2. Preferred site outage causes data unavailability: this is also an incorrect statement! In case of a preferred site outage, VPLEX Witness ensures IO continues on the non-preferred site by overriding the preference rule.
3. This is also an incorrect statement! If a VPLEX cluster fails (bear in mind that a VPLEX cluster is a highly available cluster with multiple redundancies, so failure of a cluster is equivalent to a site disaster), all applications connected to this cluster will fail over. This is the purpose of running an application cluster in the first place. Granted, this assumes that the application is within some sort of OS or application cluster. Or are you insinuating that a Veritas Cluster, an HP-UX cluster or any other cluster on the market won’t fail over in case of storage loss? This is an interesting statement.
In addition to that, you have the option of connecting a host not only to the local VPLEX cluster but also to the remote VPLEX cluster in a cross-connect manner. Cross-Connecting won’t fail over any application, but in the event of a local VPLEX outage the application will still continue without outage from the remote VPLEX cluster.
4. I can’t follow this argument. The page you mention talks about the fixed rules on VPLEX, where we always need to ensure data consistency and thus will not allow IO on the non-preferred cluster. If you read further in the same doc, you will see the Witness behavior, where we don’t have to stop IO in any case.
I can’t find the argument of BE inconsistency anywhere, and this is also a false statement.
5. This is an interesting statement; I wonder where it comes from, and how SVC would ensure a zero RTO.
From Wikipedia: The recovery time objective (RTO) is the duration of time and a service level within which a business process must be restored after a disaster (or disruption) in order to avoid unacceptable consequences associated with a break in business continuity.
So this is the time you would need to restart your application. Unless you are using Oracle RAC or VMware FT, there is no way a solution can ensure 0 RTO. VPLEX can do a lot, but in this case we have to rely on the application layer.
I think you are mixing something up here: with VPLEX you can choose between preferred/fixed rules and witness. You won’t have both at the same time! So a preferred rule won’t impact Witness behavior.
So what does RTO have to do with the VPLEX preferred rule? Nothing! How would SVC achieve a zero RTO? Personally, I don’t believe it can.
6. Yes, VPLEX uses a write-through methodology to ensure data is consistent on both back-end arrays. Write-through doesn’t mean that we are writing to disk; we are writing to the back-end array cache, which has very low latency. Intermittent back-end problems will impact an IO, but they will impact SVC as well: SVC also has to free up cache by writing the data to the back-end array, so problems with the back-end array will also impact an IO. The write-through architecture of VPLEX also has two major advantages. First, we use all of the cache (up to 288 GB per site) for reads only. Even if we pay some latency for writes (we are talking microseconds), we gain a lot of performance by having all of the cache available for reads. As most applications are read-intensive rather than write-intensive (e.g. 70% read, 30% write), you’ll see an improvement in application response time. By comparison, SVC has only 24 GB, shared between the operating system, reads, writes and any other feature you might run on the cluster: deduplication, Global Mirror, snaps, clones…
The other benefit of write-through is that, if you are using 1:1 mapping, you can easily move in and out of VPLEX just by reconnecting the host directly to the back-end storage array. In case of a complete virtualization outage you won’t have any data loss: everything is saved on both back-end arrays at both locations and you can easily recover. With SVC there is no way to recover from a virtualization outage: the data in cache is gone, and you’ll have to recover from tape, as the native clones and snaps are gone as well.
7. Because VPLEX is an active/active virtualization solution where data is accessible from every director, both local and remote, an indexing table wouldn’t be able to scale and would have a huge performance impact.
It may work on Ethernet switches, where you only need to update the table when a port logs in or out of the network, but updating and replicating the table on all directors in a VPLEX cluster every time you update a block would generate huge latency and a performance impact on production. On VPLEX we deal with millions of transactions and updates, compared to an Ethernet network with very few changes.
The VPLEX cache coherency algorithm is a very mature and reliable architecture, giving you the best performance and, most importantly, reliable data. It is similar to Oracle RAC Cache Fusion, also an enterprise solution that guarantees low latency and the highest availability. Neither VPLEX nor Cache Fusion would be able to achieve the same level of performance, availability and scalability if they used update tables or other similar solutions.
Is SVC better? It can’t scale beyond 2 nodes, and you are not sharing any information between SVC IO groups. You are dedicating 2 of 4 ports to cache mirroring. Is SVC doing cache coherency? Having only the preferred node handle reads and writes makes cache consistency easier to manage, but makes it impossible to scale to multiple nodes or to an active/active architecture.
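The split-brain behaviour argued over in Todd’s points 1 to 3 and the replies above can be summarised as a small decision function. This is a hypothetical Python sketch, not VPLEX logic. It assumes one distributed volume with a preference rule; that a cluster which cannot see its peer suspends I/O unless it is the preferred cluster; and that the Witness lets the non-preferred cluster continue only when the preferred cluster has actually failed. When both clusters are alive but partitioned, the sketch assumes the preferred cluster continues.

```python
from typing import FrozenSet

PREFERRED = "preferred"
NON_PREFERRED = "non-preferred"


def clusters_serving_io(preferred_up: bool, non_preferred_up: bool,
                        inter_cluster_link_up: bool, witness: bool) -> FrozenSet[str]:
    """Which clusters keep servicing I/O for one distributed volume (hypothetical model)."""
    if preferred_up and non_preferred_up and inter_cluster_link_up:
        return frozenset({PREFERRED, NON_PREFERRED})  # normal active/active operation
    # The clusters can no longer see each other. On its own, the non-preferred cluster
    # cannot tell a link failure from a preferred-site failure, so it must suspend to
    # avoid split brain; the preferred cluster carries on if it is alive.
    if preferred_up:
        return frozenset({PREFERRED})
    if witness and non_preferred_up:
        # A third-site Witness can see that the preferred cluster is gone and
        # overrides the preference rule.
        return frozenset({NON_PREFERRED})
    return frozenset()  # no cluster may service I/O: data unavailability


if __name__ == "__main__":
    print(sorted(clusters_serving_io(False, True, False, witness=False)))  # []
    print(sorted(clusters_serving_io(False, True, False, witness=True)))   # ['non-preferred']
```

The first printed case is the preferred-site failure without a Witness that Todd raises as point 2; the second is the Witness override Claudiu describes in reply.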
This is not the place to be discussing specific pricing, other than to say that VPLEX is absolutely price competitive.VPLEX is , architecturally a more mature and a more reliable solution than SVC, it has more redundancy, more scalability and the unique capability of accessing data active-active between clusters.
a. Easy Tier: yes, you can have Easy Tier on SVC, but at what price? It uses the same CPUs and the same HBAs, and this can impact production. With VPLEX you can use any array feature from any array, with no impact on the virtualized traffic.
b. Management servers are only used for managing VPLEX, not in production. Is this a technical complaint or a financial complaint?
c. Yes, you don’t need switches, as you can only connect 2 appliances to each other. If you have a single-engine solution (with already 2 directors at each site), you do not require any switches with VPLEX either.
d. UPSs are also only required for the switches; if you go with one engine, no switches are required.
e. VPLEX also has a 10GbE connection option for the WAN interconnects.
f. The VPLEX Metro limitation is 5 ms of latency; we do not have a specific distance limitation. VPLEX has a proven solution with Ciena switches enabling distances in excess of 500 km. I would really like to see an SVC stretched solution over 300 km, where for each IO you have double to triple the bandwidth and double to triple the latency impact compared to VPLEX. I wonder what the application performance impact would look like.
g. Deduplication: SVC allocates 4 of its 6 cores to deduplication, meaning the moment you enable deduplication there is a massive impact on performance. You can have the feature, but at what price? With VPLEX you can use any native deduplication on the array without impacting the production traffic.
h. You can use any native MPIO software that is also free. PowerPath is not a requirement. You can run PowerPath if you require the additional features PowerPath can provide.
i. 8192 virtual volumes per system is correct. But you forget to mention that you are limited to 2048 per IO group and that you can’t move volumes between them. (IBM claims that SVC 6.4 can, but this is limited to AIX and Linux only, with no cluster support.) You also forget to mention that the limit includes snaps and clones as well, meaning that if you want to have 2000 volumes with 4 snaps each, you have already hit the SVC limit. VPLEX can have 8000 virtual volumes per cluster, not limited by the number of snaps or clones you have on the back-end array. All volumes are shared between all directors in the cluster, giving you all the flexibility you want. You mention a 32 PB per-cluster limit on SVC. That may simply be a paper value. Can you show me a customer who would want to run 8 PB in an IO group, all of it over 2 ports of an HBA in one appliance?
j. VPLEX does support Violin arrays. There is no technical reason for VPLEX not to support the other arrays mentioned; it’s just a qualification effort and it is work in progress. You forget to mention that VPLEX actually supports more than 40 different arrays from different vendors.
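The arithmetic behind point i can be sketched in a few lines of Python. This assumes, as the comment states, that every snapshot or clone counts as one SVC virtual volume, and it uses the 8192-per-system and 2048-per-IO-group limits quoted in this thread; the function names are hypothetical and this is not an IBM sizing tool.

```python
import math

# Limits as quoted in this thread (assumed figures, not taken from an IBM tool).
SVC_VOLUMES_PER_SYSTEM = 8192
SVC_VOLUMES_PER_IO_GROUP = 2048


def svc_volumes_consumed(base_volumes: int, copies_per_volume: int) -> int:
    """Count each base volume plus each of its snapshots/clones as one volume."""
    return base_volumes * (1 + copies_per_volume)


def fits_in_one_system(base_volumes: int, copies_per_volume: int) -> bool:
    """True if the whole set stays within the per-system volume limit."""
    return svc_volumes_consumed(base_volumes, copies_per_volume) <= SVC_VOLUMES_PER_SYSTEM


def io_groups_needed(total_volumes: int) -> int:
    """Minimum number of IO groups needed to hold total_volumes."""
    return math.ceil(total_volumes / SVC_VOLUMES_PER_IO_GROUP)


if __name__ == "__main__":
    total = svc_volumes_consumed(2000, 4)
    print(total, fits_in_one_system(2000, 4), io_groups_needed(total))
```

Under these assumptions, 2000 volumes with 4 snaps each need 10,000 volume slots, which exceeds the 8192 per-system limit and would need 5 IO groups, one more than the four an SVC cluster can have.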
Todd says:
November 23, 2012 at 06:27
1. In a cluster configuration, IBM SVC comprises 2x controllers, so there is no single point of failure; they run in a cluster configuration.
2. Preferred site failure causes full site unavailability …: This comes directly from EMC – http://www.emc.com/collateral/hardware/technical-documentation/h7113-vplex-architecture-deployment.pdf (p.79)
3. Please review the video in which the applications did not respond to ping requests until after he manually re-enabled the connections.
4. Go to page 72 – http://www.emc.com/collateral/hardware/technical-documentation/h7113-vplex-architecture-deployment.pdf
5. Go to page 79 – http://www.emc.com/collateral/hardware/technical-documentation/h7113-vplex-architecture-deployment.pdf (bottom paragraph)
6. In my analysis, I stated that the system goes through local cache, global cache and the local storage array to process information; yes, cache is used to improve processing capability as it relates to I/O. Please read pp. 50-51 (top of page). The point I was making was that if data is stored in your global cache and you have a problem with the network, the write will be delayed, and this in turn will delay your processing because it has not received an acknowledgement.
6a. To your second argument – This comes directly from the IBM redbook – http://www.redbooks.ibm.com/redbooks/SG246423/wwhelp/wwhimpl/js/html/wwhelp.htm
If required, host servers can be mapped to more than one I/O Group of an SVC cluster; therefore, they can access VDisks from separate I/O Groups. You can move VDisks between I/O Groups to redistribute the load between the I/O Groups. With the current release of SVC, I/Os to the VDisk that is being moved have to be quiesced for a short time for the duration of the move.
So to your point, I can geographically move an entire I/O group to load-balance the I/O group. This is because the back-end disk array does not know about the I/O group once it has been established; however, this can be easily resolved with replication. So, to answer your question, I can move items between different I/O groups.
7. Easy Tier – LOL, I think I will take that one. Besides, if I had high-performance disks such as SSDs and lower-performance SATA or SAS drives, I could move that data to the lower-performance disks, saving me time and money.
8. http://www.redbooks.ibm.com/redbooks/SG246423/wwhelp/wwhimpl/js/html/wwhelp.htm – We can configure Windows machines in a Metro cluster spread across a geographic distance of 300 km. You make an argument about distance limitations, but I am only pulling this information from the EMC website and documentation – http://www.emc.com/about/news/press/2012/20121101-02.htm (this is what they are saying; they even talk about it in a press release: 200 km).
8a. – As far as Ciena goes, that can be configured with any storage appliance, but out of the box, we beat EMC VPLEX by 100 km.
8b. As far as price, we are cheaper by $80-100K
8c. As for the multipath statement, if you read the items listed above, I do mention that the user can use the native I/O multipath driver; however, I stated that you would lose performance, and EMC even talks about that – http://www.emc.com/collateral/software/white-papers/h8180-powerpath-load-balancing-failover-wp.pdf
8d. Violin Arrays – I looked on your site and went into the partner site; there was no mention of this array or several others. You may want to have the marketing department update the site if it is not there. I even went to the Powerlink website – https://elabnavigator.emc.com/vault/pdf/EMC_VPLEX.pdf
This is the official disk array support matrix guide (EMC Support Matrix); go to page 3, table 1 and do a search for Violin (nothing).
Also, be sure to go over the best business practices found on IBM’s website – http://www-03.ibm.com/support/techdocs/atsmastr.nsf/Web/WP-ByProduct?OpenDocument&Start=1&Count=1000&Expand=17 for correct information and how to set it up
Just to make sure I wrap up this argument: disconnecting one of the controllers in an SVC cluster does not disable I/O processing, per IBM’s Redbook – http://www.redbooks.ibm.com/redbooks/SG246423/wwhelp/wwhimpl/js/html/wwhelp.htm. Yes, you are right that a preferred node does not signify ownership; it just makes it preferred, a preferred path, not the only path, and that is the distinction. EMC Clariion and Symmetrix both have preferred controllers they use, the concept is the same, but it does not limit the processing if one fails, there is a millisecond process, same with other systems.
Upgrading the SVC, 10 minutes – http://www.redbooks.ibm.com/redbooks/SG246423/wwhelp/wwhimpl/js/html/wwhelp.htm (step-by-step instructions to help you with the process; I am not sure I can do that with the VPLEX Metro Cluster solution).
So if we were to look at the overall argument:
1. Price point lower – $80-100K lower
2. SVC supports more disk arrays than the Vplex, about 10+ more
3. Distance limitations = 300KM (IBM SVC), 200KM (EMC Vplex)
4. Disk size = 32 PB (IBM SVC)
5. Easy Tier (HSM) = Only supported on the IBM SVC, not on the EMC Vplex
6. Internal Split Brain processing = IBM SVC, EMC needs to use a virtual machine, external solution that has to be setup outside of the original configuration, a band-aid solution to a problem.
7. Preferred site outage = EMC Vplex needs to be manually restarted or reviewed if there is an issue, disk I/O stops processing, IBM SVC does not
8. Free Multipath I/O driver = IBM SVC provides this to the user for free, EMC charges you for the driver, you could use another driver but EMC even says your system will degrade in performance (review 8c)
There are others, but I think this gives the user an idea of which product is better, they don’t call them Big Blue for nothing, lol.
Have a great holiday.
Good talking with you buddy.
Take care.
Todd
Todd says:
November 23, 2012 at 06:45
Upgrade software video – http://www.youtube.com/watch?v=GTTbLAswNuc
Sorry about that; the upgrade process takes about 20 minutes for each node.
Each node is done one at a time so there is no loss to disk I/O processing.
I wanted to make sure that was brought to your attention.
Don’t get me wrong, I like EMC VPlex, but I can buy another 2x IBM SVC Clusters for the price I would pay for EMC Vplex Metro Cluster.
Todd
Claudiu Schmidt says:
November 23, 2012 at 15:10
Hi Todd,
Here are my replies to your post.
1. That is correct, you are running a cluster configuration, but you are running a single controller per site, meaning that whenever you hit a SPOF on this controller you will have to fail over to the remote site. To address one of your arguments below, a 300 km stretched SVC configuration with a possible 5 ms of latency: whenever you lose the only HBA on the SVC appliance, the application will have to go to the remote SVC appliance for both reads and writes (unless you stop the application and fail over to the remote node). So by losing only one HBA, the application will immediately feel the 5 ms of latency for both reads and writes. There is nothing comparable on VPLEX. You have full redundancy both locally and remotely, meaning that whatever component fails, reads will continue to be processed locally, without paying the penalty of a 5 ms link latency. Another example would be an HBA port outage where you have to replace the port. In this case too you’ll have to go to the remote appliance and pay the link penalty. So, as you can see, even a port outage on an HBA will have a huge impact on production. Not to mention that if you have an HBA or port outage in the above example, you will be running the complete production on a single appliance, and hitting another SPOF there will cause a complete DU for your production. Again, nothing like this can happen on VPLEX. VPLEX has full redundancy at both sites; losing one or even more components won’t cause a DU.
2. Todd, you are switching between the preferred rule and the Witness to bring up arguments that are not correct. The document mentioned describes both the preferred behavior and the Witness behavior. The preferred rule is a fixed rule that guarantees data consistency in any failure scenario. As described in the document, on losing the preferred site VPLEX has to stop IO on the volumes that were preferred on that site in order to guarantee data consistency, but continues to process IO on the other volumes. This is a limitation of a dual-cluster configuration and it’s very well described in the document. You can add a Cluster Witness configuration to VPLEX that overrides the preferred rule, guaranteeing that IO won’t be stopped in case of any outage. So for 100% availability you’ll have to install Cluster Witness, and this is also described in the same document, page 95.
3. Can you please point me to the video, I would be interested in seeing it.
4. This describes a double disaster, meaning you are losing the backend array and during resync you are losing the link between 2 sites. This is a valid decision to ensure data consistency at any price. Is it more important to continue IO on out of date data or is it better to suspend IO to ensure data consistency? I can go and start discussing double failures in SVC where you can end up with data loss in some cases. No product on the market is capable of covering all double disaster scenarios, especially not SVC.
5. VPLEX has 2 solutions to split brain: one is Cluster Witness, which guarantees 100% availability and no DU, and the second is the preferred rules, which guarantee data consistency in all cases and no DU in all cases except a preferred-site outage. Of the 2 options you are only pushing and arguing against the preferred rule, and you forget to mention that VPLEX has another alternative that guarantees no DU. At least VPLEX has an alternative for customers who don’t want to deploy a third site, even though in VPLEX’s case the third site is a free-of-charge Witness with minimal requirements: a server and IP connectivity somewhere in the world. What alternative does SVC offer customers who can’t afford a third site with FC connectivity? Why don’t you argue about this limitation instead?
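The difference between the preferred rule and the Cluster Witness described in points 2 and 5 can be modelled as a small decision function for a single distributed volume. This is a deliberately simplified sketch based only on the behavior described in this thread (link partition: the preferred cluster wins; cluster loss: the survivor continues only if it is the preferred cluster or a Witness is present); the names are hypothetical and it is not VPLEX code.

```python
from enum import Enum
from typing import Set


class Failure(Enum):
    INTER_CLUSTER_LINK = "link"       # both clusters alive but cannot see each other
    CLUSTER_A_DOWN = "cluster A lost"
    CLUSTER_B_DOWN = "cluster B lost"


def clusters_serving_io(preferred: str, failure: Failure, witness: bool) -> Set[str]:
    """Return the clusters ("A"/"B") that keep serving I/O to one distributed volume."""
    if failure is Failure.INTER_CLUSTER_LINK:
        # Both clusters are alive; the preference rule picks the single winner
        # (in this simplified model the Witness defers to the preference here).
        return {preferred}

    survivor = "B" if failure is Failure.CLUSTER_A_DOWN else "A"
    if survivor == preferred or witness:
        # Either the survivor was already the winner, or the Witness tells it
        # that the peer is really gone, so it may continue.
        return {survivor}
    # Without a Witness, a non-preferred survivor cannot tell a cluster loss from
    # a link loss, so it suspends I/O to protect data consistency.
    return set()


if __name__ == "__main__":
    for has_witness in (False, True):
        for failure in Failure:
            print(has_witness, failure.value, clusters_serving_io("A", failure, has_witness))
```

Because the preference is set per volume, giving some volumes cluster A and others cluster B as their preferred cluster means that, in this model, a link partition still leaves I/O running at both sites, each site serving its own set of volumes.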
6. If you have a problem with the link between SVC nodes, isn’t the write delayed in SVC as well? This is a synchronous solution that requires an ACK from both sites; a problem on the link will always impact a write IO, whatever synchronous solution you deploy.
6a.
OK, so let’s look at your argument: with SVC you can attach a host to 2 IO groups for different VDisks. How is this a unique feature? You can also present a host with one volume from one array and another volume from a different array. That is all this passage says. You can’t present the same volume over 2 IO groups, and this is the big difference from VPLEX, where you can present one volume over all directors at the same time.
Looking at this more in depth, I have actually found another strong argument against SVC, thanks Todd!
Assuming you want to use different IO groups for an application, let’s look at 2 Oracle applications: Oracle1 has its database files on IO group1 and its log files on IO group2; Oracle2 has its database on IO group2 and its log files on IO group1. Now let’s go to the famous 300 km, maybe 5 ms latency, distance and stretch both IO groups. Assuming appliance1 in IO group1 hits the famous HBA failure or any other SPOF, SVC will be forced to fail over and send all IOs, both reads and writes, to the remote site. For Oracle1, the database files will have a 5 ms latency impact on both reads and writes, forcing a performance penalty on the application, and Oracle2 will also have a performance penalty, as the database writer will have to wait for the log writer, whose writes now travel the whole 5 ms to the remote SVC appliance, slowing down the application. So by doing this you will not only impact one of the applications; you will impact all applications that are spread across 2 IO groups…
Have you looked at the limitations of volume move in SVC 6.4? You can’t move a cluster disk, you can’t move ESX volumes, you can only move non-clustered AIX with a short freeze, you can move Windows with a reboot (how non-disruptive is this?) and non-clustered Linux with a freeze as well… so how good is this solution?
You mention that you can geographically move an entire IO group to load balance? I don’t think I get this one. If SVC can’t even move volumes locally between IO groups, how can you move them geographically? If you mean using replication to move IO groups, this is not an online migration; it is offline, using a replication product…
8. What you can do with a product from a marketing point of view is one thing; what is technically feasible is another. EMC has a solution that can go over 500 km, but would a customer want to do that? I don’t think so. You claim you can go 300 km, but would a customer actually do it? In addition, with VPLEX you will have only a single round trip, but with SVC you can have 2 or even 3 round trips, making a 300 km solution a 900 km solution… do you want to run your production on this?
Todd, here is the press release: “EMC VPLEX doubles the distance that customers can non-disruptively move VMware vSphere vMotion workloads between active-active data centers—up-to 200 kilometers (km).”
As you can see, it doesn’t say VPLEX can only do 200km, it says VPLEX enables customers to do vMotion on up to 200km. The 200km is not a VPLEX limitation it’s a VMware limitation. VPLEX is supporting whatever distance you can reach within the 5ms latency.
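The round-trip argument in point 8 is easy to quantify for propagation delay alone. The sketch below assumes light travels through fibre at roughly 200,000 km/s (about 5 µs per km each way) and that the fibre route is as long as the quoted distance; it ignores switch, DWDM, protocol and array latency, so real figures will be higher. The function names are hypothetical.

```python
# Assumed one-way propagation delay in optical fibre (light at roughly 2/3 c):
# about 5 microseconds per kilometre.
FIBRE_US_PER_KM = 5.0


def write_latency_ms(distance_km: float, round_trips: int = 1) -> float:
    """Propagation delay (ms) added to one synchronous write over distance_km.

    Ignores all equipment and protocol overhead.
    """
    one_way_ms = distance_km * FIBRE_US_PER_KM / 1000.0
    return 2.0 * one_way_ms * round_trips


def max_distance_km(budget_ms: float, round_trips: int = 1) -> float:
    """Longest fibre run whose propagation delay alone fits into budget_ms."""
    return budget_ms * 1000.0 / (2.0 * FIBRE_US_PER_KM * round_trips)


if __name__ == "__main__":
    for trips in (1, 2, 3):
        print(f"300 km, {trips} round trip(s): {write_latency_ms(300, trips):.1f} ms")
    print(f"5 ms budget, 1 round trip: {max_distance_km(5.0):.0f} km")
```

Under these assumptions a write needing three round trips over 300 km pays the same propagation delay (9 ms) as a single round trip over 900 km, which is the comparison made in point 8.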
8a. I think we have talked enough about theoretical distances and what each product can do, with what penalties.
8b. Price-wise, I did mention that VPLEX has a very competitive price. You should probably go and ask for a new VPLEX offer and compare. It is also very important to compare like with like: as mentioned before, with 8000 volumes with 4 snaps each you will require a minimum of 4 SVC clusters, something you can easily do with a single VPLEX cluster.
8c. This is a blog comparing VPLEX with SVC, and not comparing PowerPath to native MPIO or IBM sdd.
8d. Violin Arrays – VPLEX has support for Violin Arrays, there are customers running them in production.
The number of arrays supported shouldn’t be the measure of a product’s quality. EMC qualifies each and every array placed on the support matrix to ensure it works properly with VPLEX. As soon as customers have arrays that are not on the Support Matrix, they will be qualified. The fact that SVC has been on the market longer and supports a higher number of arrays doesn’t automatically make it a better product. Several years ago, when VMware came onto the market, it had limited support for arrays and applications compared to Solaris. Where is Solaris now and where is VMware now…?
Disconnecting a controller is documented in the SVC – VMware MetroCluster KB article. The fact that VMware experienced this and IBM approved and confirmed this behavior gives me a high confidence that this is the expected behavior.
“EMC Clariion and Symmetrix both have preferred controllers they use, the concept is the same, but it does not limit the processing if one fails, there is a millisecond process, same with other systems.”
You are very wrong here; Symmetrix has no notion of a preferred controller. All controllers are active and equal in processing IO to a volume.
In my previous post I gave you enough links from documents where IBM is addressing the preferred/non-preferred behavior of SVC, and they do mention several times that only the preferred node is processing both reads and writes.
“Upgrading the SVC, 10 minutes “-
The video talks about 20 minutes per node plus another 30-40 minutes for the cluster. That is more than a 10-minute upgrade. The big difference from VPLEX, and I have mentioned this several times, is that with VPLEX you are not losing a complete site: IOs are still processed at both sites, even during an NDU. With SVC you will have to fail over the complete production and then fail back the complete production. Add the fact that all IOs now go over the link to the remote appliance, adding latency to each IO, and the upgrade becomes a process you don’t want to run during production hours, hardly a non-disruptive process…
So if we were to look at the overall argument:
1. “Price point lower – $80-100K lower”
VPLEX is competitively priced; you should go and get a new offer from both EMC and IBM. What use is a cheaper solution when I am impacting production? The whole point of my post is to show you the more advanced and unique architecture of VPLEX compared to SVC.
2. “SVC supports more disk arrays than the Vplex, about 10+ more.”
As mentioned above, the number of arrays supported doesn’t reflect the quality of a product. VPLEX supports more than 40 different arrays, which is enough for most customers. VPLEX can also virtualize SVC, to make SVC a more highly available and active-active virtualization solution ☺
3. “Distance limitations = 300KM (IBM SVC), 200KM (EMC Vplex)”
As discussed above, VPLEX doesn’t specify a distance limit; it specifies a latency limit.
4. “Disk size = 32 PB (IBM SVC)”
As mentioned in my previous post, what customer is willing to put 8 PB behind one appliance with one HBA…?
5. “Easy Tier (HSM) = Only supported on the IBM SVC, not on the EMC Vplex”
VPLEX supports all the different tiering products on back-end arrays. The benefit of having tiering native on the array is that it doesn’t consume any resources on the virtualization layer. I couldn’t find the answer to this one, so maybe you can point me to the appropriate doc: what happens with Easy Tier when you lose one appliance or during an NDU? The only surviving node will have to disable all write cache and write every IO through to the back end. How performant is Easy Tier in this situation?
6. “Internal Split Brain processing = IBM SVC, EMC needs to use a virtual machine, external solution that has to be setup outside of the original configuration, a band-aid solution to a problem.”
How is SVC split-brain processing internal? You require an external quorum disk that is connected over FC to both appliances. That is more external than a VM running anywhere in the world at zero cost!
7. “Preferred site outage = EMC Vplex needs to be manually restarted or reviewed if there is an issue, disk I/O stops processing, IBM SVC does not”
I think I had an entry in each post for this one. I don’t know why you are not reading the document or my entries correctly.
VPLEX has the Witness to ensure 100% continuation. One additional point on this entry: you say SVC doesn’t require any manual intervention, and this is wrong. If you have a split SVC cluster running VMware HA between the 2 sites and a virtual machine is running on the non-preferred site, what will happen if you lose the links between the 2 clusters? SVC has no APD handling, meaning the VM will hang on the non-preferred site until you manually reboot the ESX server, so that the VM restarts on the preferred site. VPLEX handles APD conditions, meaning the VM will be restarted automatically without manual intervention.
8. “Free Multipath I/O driver = IBM SVC provides this to the user for free, EMC charges you for the driver, you could use another driver but EMC even says your system will degrade in performance (review 8c)”
As mentioned above, you can run native MPIO, free of charge. If you need additional functionality you can buy PowerPath.
“There are others, but I think this gives the user an idea of which product is better, they don’t call them Big Blue for nothing, lol.”
Where I come from it’s called Big Old Blue to better reflect technology LOL
Claudiu
Claudiu Schmidt says:
November 27, 2012 at 18:30
Todd,
Based on your comments about how much more mature SVC quorum is compared to VPLEX Witness, I started looking in more detail at the SVC quorum behavior:
Have you noticed that a split SVC configuration will actually stop IO at one site if all links between the sites go down? And the more interesting thing is that you can’t even predict or guarantee which site will continue running and which site will stop. So in a split SVC configuration with production running at both sites, a link outage will in fact stop production at a complete site, causing a complete DU for that production… With VPLEX, even in the default configuration without any VPLEX Witness, we will continue processing IO at both sites with no DU for production. If we have the Witness, we will not only continue IO at both sites in case of a link outage, we will also guarantee that production continues IO on the surviving site in case of a site outage.
Here is an extract from the split SVC Errata:
ftp://ftp.software.ibm.com/storage/san/sanvc/V6.3.0/SVC_Split_IO_Group_requirements_Errata_V1.pdf
In a split I/O group configuration, the active quorum disk must be located at a third site (the quorum site) as an independent failure domain. If communication is lost between the production sites, the site with access to the active quorum disk continues to process transactions. If communication is lost only to the active quorum disk (without impact on node-to-node communication), an alternative quorum disk at another site can become the active quorum disk.
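The Errata rule quoted above can be modelled as a race for the active quorum disk. How SVC resolves the case where both sites can still reach the third-site quorum disk is not described in the extract, so the `claim_order` parameter below is an assumption standing in for whichever site gets there first; the function name is hypothetical and this is not IBM code.

```python
from typing import Dict, List, Optional


def partition_winner(reaches_quorum: Dict[str, bool], claim_order: List[str]) -> Optional[str]:
    """Return the single site allowed to keep processing I/O after the inter-site links fail.

    Simplified model (assumption): the sites race for the active quorum disk in
    claim_order; the first site that can still reach it wins and every other site stops.
    """
    for site in claim_order:
        if reaches_quorum.get(site, False):
            return site
    # No site can reach the active quorum disk, so none may continue on its own.
    return None


if __name__ == "__main__":
    both = {"site1": True, "site2": True}
    print(partition_winner(both, ["site1", "site2"]))
    print(partition_winner(both, ["site2", "site1"]))
```

Feeding the same reachability with the two possible claim orders yields two different winners, which illustrates the point above that the surviving site cannot be predicted in advance.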
Todd says:
November 29, 2012 at 04:23
Sorry for the delay.
This is the video I was referring to – http://www.youtube.com/watch?v=WJQfy7-udOY
I think I have referred to most of the items in the previous blog.
OK, we are going to address each item one at a time. For some reason you are not getting it, so let me be clear: this information is coming from your documentation, EMC’s, not mine.
1. OK, clearly you agree with me there.
2. Earlier you stated I got the information from Wikipedia; obviously, now you see what I was referring to, from your documentation. However you want to spin it, you lose a level of I/O processing capability; by your own statement we agree there, so leave well enough alone.
3. http://www.youtube.com/watch?v=WJQfy7-udOY
4. Again, this came from your documentation, and by the way, where in your documentation does it say double disaster? I did not see that. I am not saying that it cannot occur, but it does not say that in the IBM Redbooks section.
5. Witness: I did not write this; it is a band-aid solution.
Preferred: if we go back and review EMC’s documentation, they talk about I/O becoming unavailable, period.
6. The quorum disk is a partition that is part of the SVC solution. It is a cluster, much like most clusters, that uses heartbeats to determine if a site is down. Again, since the price is so much lower, we can afford to have a third site if needed, but in this case there is no way to lose the cluster, because it can dynamically create a quorum disk between the clusters and it automatically votes on which one is part of the overall group: intelligence built into the SVC, enough said.
7. Preferred site outage – The EMC VPLEX will have to address this manually, as stated before; SVC handles the problem nicely (read above). If the link goes down, the site will continue to process data: no manual intervention, no third-party virtual machine, no preferred cluster; it does it for you.
8. 10-20 minute installation – obviously you did not read the section right below, where I wrote about the 20-minute installation time frame, lol, but no biggie.
- As far as other disk arrays (Violin included), nothing in the documentation stated that this disk array was supported. OK, if that is the case, then EMC’s documentation needs to be updated; no problem there.
- KB Article – can you locate that for me
- Clariion/Symmetrix – It is interesting that you only said I was wrong about Symmetrix. What about the Clariion, does that not use a preferred controller configuration? I thought so.
9. One thing I forgot to mention – IBM System Storage Productivity Center is supplied by default when you purchase your SVC cluster, another bonus for obtaining the product (http://www-03.ibm.com/systems/storage/software/sspc/), it just keeps getting worse for EMC, lol
Interesting note from IBM, please read carefully:
It is key for all active nodes of a cluster to know that they are members of the cluster. Especially in situations, such as the split brain scenario where single nodes lose contact to other nodes and cannot determine if the other nodes can be reached anymore, it is key to have a solid mechanism to decide which nodes form the active cluster. A worst case scenario is a cluster that splits into two separate clusters.
Within an SVC cluster, the voting set and an optional quorum disk are responsible for the integrity of the cluster. If nodes are added to a cluster, they get added to the voting set; if nodes are removed, they will also quickly be removed from the voting set. Over time, the voting set, and hence the nodes in the cluster, can completely change so that the cluster has migrated onto a completely separate set of nodes from the set on which it started.
Within an SVC cluster, the quorum is defined in one of these ways:
More than half the nodes in the voting set
Exactly half of the nodes in the voting set and the quorum disk from the voting set
When there is no quorum disk in the voting set, exactly half of the nodes in the voting set, if that half includes the node that appears first in the voting set (a node is entered into the voting set in the first available free slot)
These rules guarantee that there is only ever at most one group of nodes able to operate as the cluster, so the cluster never splits into two. The SVC cluster implements a dynamic quorum. Following a loss of nodes, if the cluster can continue operation, the cluster will adjust the quorum requirement, so that further node failure can be tolerated.
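The three quorum rules quoted above can be written as a single predicate. This is a sketch under stated assumptions: the order of the `voting_set` list stands for slot order, the quorum disk is reduced to two booleans, and the dynamic re-adjustment of the quorum requirement after node loss is not modelled; the names are hypothetical and this is not IBM code.

```python
from typing import Sequence, Set


def has_quorum(partition: Set[str], voting_set: Sequence[str],
               quorum_disk_in_voting_set: bool, partition_has_quorum_disk: bool) -> bool:
    """Decide whether a group of nodes may operate as the cluster."""
    if not voting_set:
        return False
    members = partition & set(voting_set)
    doubled, total = 2 * len(members), len(voting_set)

    # Rule 1: more than half of the nodes in the voting set.
    if doubled > total:
        return True
    if doubled == total:
        if quorum_disk_in_voting_set:
            # Rule 2: exactly half of the nodes plus the quorum disk.
            return partition_has_quorum_disk
        # Rule 3: no quorum disk, so exactly half wins only if it holds the
        # node that appears first in the voting set.
        return voting_set[0] in members
    return False


if __name__ == "__main__":
    voting = ["node1", "node2"]  # a split two-node I/O group
    print(has_quorum({"node1"}, voting, True, False))  # half of the nodes, no disk
    print(has_quorum({"node2"}, voting, True, True))   # half of the nodes plus the disk
```

Assuming only one side of a split two-node I/O group can hold the quorum disk at a time, at most one half can satisfy rule 2, and only one half can hold the first-listed node for rule 3, which is the "never splits into two" guarantee described in the quote.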
I have included all of the information as it relates to split brain (be sure to read the quote I list above). I am not taking sides but it seems that EMC has tried to create a band-aid solution to the Split brain problem using a virtual machine as opposed to a more prominent solution like the one mentioned when we refer to dynamic quorum.
In addition, if there is a third site, I can purchase another SVC solution that is lower in price than your VPlex metro solution, lol. I can actually have 4x sites up and running and still be less than your Vplex Metro solution.
As far as in case of a site outage – please look at this video – http://www.youtube.com/watch?v=WJQfy7-udOY. Clearly in the example, the VM acted funny when the connections went down, in the case of the SVC, that does not happen.
MPIO – I am going to leave this alone. As stated before, this was an advantage of going with the SVC solution, because the user does not have to buy additional software, although they can enhance the product; but I think what you are missing, and what EMC is saying, is that you lose performance when you go with a native MPIO solution as opposed to EMC PowerPath, but you expressed that in the conversation.
So again, let’s elaborate again, just one more time so he gets it:
1. IBM SVC has a much lower price point than the EMC VPlex
2. Preferred site and Vplex Witness to address split brain, IBM has an internal dynamic Quorum solution, nothing external to configure
3. Processing stopped in video, proof is in the pudding
4. Nothing in the documentation says double disaster, from EMC that is
5. Easy Tier or HSM – Ok, won that one out right
6. Split Brain – IBM SVC has 3 methods, and IBM states that the cluster will never split in two. EMC has band-aids to combat that problem, whereas IBM uses an intelligent solution (dynamic quorum and voting). I voted for Obama, that is why we are winners, again, much like IBM
7. Free MPIO software – IBM SDD, Free Hardware – Comes with the purchase of the SVC – IBM System Storage Productivity Center (an entire system that can be used to manage your storage from a centralized location), it is part of the SVC purchase (entire server for your use), lol.
I think I have beaten this horse to death.
You can’t win.
T
TheStorageChap says:
November 29, 2012 at 14:06
Todd, Claudiu,
Great informative posts on both accounts and I thank you for taking the time to both put your points across. Todd you are totally correct that much of our public information needs updating and I think that might be why we seem to be going around in circles.
So to hopefully wrap this up.
1. If you break a two-node SVC IO Group between two sites, you have only one node in each site and therefore a single point of failure with its own single points of failure; with VPLEX you will always have a minimum of two Directors within a site, and can have more as required, all working against the same ‘I/O Group’.
2. If you have the VPLEX Cluster Witness, which is free and also EMC best practice, you maintain 100% storage availability.
3. The infamous video was from 2010, with VMware 4.x and VPLEX 4.2, before VPLEX had a Cluster Witness facility and when everyone suffered from APD (All Paths Down) scenarios. This affected everyone and was largely resolved in VMware 5.x and by the VPLEX Witness in later versions of VPLEX.
A much better and more up-to-date example can be found at http://www.youtube.com/watch?v=Pk-1wp91i2Y, and there is also a document at http://www.emc.com/collateral/software/white-papers/h11065-vplex-with-vmware-ft-ha.pdf
4. I still don’t think there is an argument here other than the fact you would want to use the Witness, which for some reason you don’t like. ☺
5 & 6. You can argue for and against both the EMC Witness and IBM quorum models. I certainly wouldn’t call the EMC VPLEX Witness a ‘band-aid’; it is more a proven industry approach. No one here is arguing against the SVC quorum model in a local-site configuration, where it makes sense. It does not make so much sense when you have to split the I/O group between two physical sites and then have a third site with an FC disk for quorum.
“In this configuration, special attention must be given to the quorum disks to ensure a successful clustered system failover. Generally, when the nodes in a system have been split across sites, the SVC system must be configured in the following manner:
Site 1 contains half of the SVC system nodes plus one quorum disk candidate.
Site 2 contains half of the SVC system nodes plus one quorum disk candidate.
Site 3 contains an active quorum disk.
This configuration ensures that a quorum disk is always available, even after a single-site failure.” P39 – http://www.redbooks.ibm.com/abstracts/sg247933.html
In the “interesting note from IBM, please read carefully” section of your last post, you quoted from the Redbook for version 5.1 of SVC (P35, http://www.redbooks.ibm.com/redbooks/pdfs/sg246423.pdf), which indeed stated the three ways. However, in the latest 6.3 version this has been completely removed and rephrased, with more emphasis on the fact that the voting set AND a quorum disk are responsible for the integrity of the system, rather than your implied voting set OR quorum.
7. If you have the VPLEX Cluster Witness, which is free and also best practice, you maintain 100% storage availability.
8. We have a support matrix, and you are correct that Violin and some other arrays are not on that published support matrix. So whilst we support them, I cannot argue with you: we have not stated it publicly. ☺
In regard to the distance conversation, the press release can be found at http://www.emc.com/about/news/press/2012/20121101-02.htm. Again, you are correct that we stated 200 km. This is a case of the tail (Marketing) wagging the dog (Engineering). Technically we support a 5 ms RTT (10 ms RTT with VMware), and we do not care what the physical distance is as long as the RTT is not exceeded. If a customer wants to use link optimisation techniques to extend the physical distance, this is absolutely supported. The reality, of course, is that it is mostly immaterial, as the supported distance will depend on application latency restrictions.
In regard to the CLARiiON/VNX, everyone knows it is an active/passive controller architecture, vs. Symmetrix, which is active/active. But this really has nothing to do with the VPLEX conversation. Nobody is suggesting that a standard local SVC Cluster is any different from an active/passive array architecture. The point was that if you took such an architecture and physically divided the two active/passive components across two sites, it would have limitations, AKA a Split-Node SVC Cluster.
9. Re: IBM System Storage Productivity Center – Great, another Framework for a whole bunch of Element Managers. I guess there is some value for some customers.
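Point 8 treats the limit as a round-trip-time budget rather than a distance. As a rough worked example, the sketch below converts between the two, assuming light in fibre travels about 200 km per millisecond (roughly 5 µs per km one way) and ignoring switch, DWDM and protocol latency unless an `overhead_ms` value is supplied. The function names are hypothetical, and the result is a propagation-only upper bound, not a supported-configuration figure from EMC.

```python
# Assumption: light in optical fibre covers roughly 200 km per millisecond
# (about 5 microseconds per km, one way). Switches, DWDM gear and protocol
# handshakes add latency; pass that as overhead_ms to subtract it.
FIBRE_KM_PER_MS = 200.0


def max_fibre_distance_km(rtt_budget_ms: float, overhead_ms: float = 0.0) -> float:
    """Propagation-only upper bound on one-way fibre distance for an RTT budget."""
    if rtt_budget_ms < 0 or overhead_ms < 0:
        raise ValueError("latencies must be non-negative")
    propagation_ms = max(rtt_budget_ms - overhead_ms, 0.0)
    # A round trip crosses the distance twice.
    return propagation_ms / 2 * FIBRE_KM_PER_MS


def rtt_for_distance_ms(distance_km: float, overhead_ms: float = 0.0) -> float:
    """Round-trip time consumed by a fibre path of the given one-way length."""
    if distance_km < 0 or overhead_ms < 0:
        raise ValueError("distance and latency must be non-negative")
    return 2 * distance_km / FIBRE_KM_PER_MS + overhead_ms


if __name__ == "__main__":
    for budget in (5.0, 10.0):  # RTT figures quoted in the comment
        print(f"{budget} ms RTT -> at most {max_fibre_distance_km(budget):.0f} km of fibre")
    print(f"200 km -> about {rtt_for_distance_ms(200.0):.1f} ms RTT (propagation only)")
```

Under these assumptions, a 5 ms RTT bounds the fibre path at about 500 km and a 10 ms RTT at about 1,000 km, while a 200 km path uses only about 2 ms of round-trip time. As the post says, application latency tolerance usually decides the practical distance.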
I think we have come full circle here, back to the point of the original post. IBM SVC was originally designed for local-site storage virtualisation and provides a very adequate solution to this requirement in regard to architecture, quorum, etc. A Split-Node SVC cluster is, to borrow your term, a ‘band-aid’ approach to active/active, something it was not originally designed for, and it therefore has certain less desirable nuances when compared with a solution that was built with dual-site active/active in mind.
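The quorum argument in points 5 & 6 (half the nodes plus a quorum candidate at sites 1 and 2, the active quorum disk at site 3) comes down to a tie-breaker. A minimal Python sketch of that reasoning follows, assuming a two-node split I/O group and a simplified rule: a strict majority of nodes keeps running, an exact 50/50 split is settled by whichever half reserves the quorum disk first, and anything else halts. `Partition` and `surviving_partition` are hypothetical names; this is not IBM's cluster-management code, nor a model of the VPLEX Witness.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Partition:
    """Nodes that can still talk to each other after a failure (hypothetical model)."""
    name: str
    nodes: int           # cluster nodes in this partition
    sees_quorum: bool    # can this partition reach the site-3 quorum disk?
    arrival_order: int   # lower value = reaches the quorum disk first


def surviving_partition(partitions: Sequence[Partition], total_nodes: int) -> Optional[str]:
    """Return the name of the partition allowed to keep serving I/O, or None.

    Simplified rule assumed for illustration only:
      * a strict majority of nodes continues on its own;
      * on an exact 50/50 split, the half that reserves the quorum disk first continues;
      * otherwise every partition halts to avoid split brain.
    """
    for p in partitions:
        if 2 * p.nodes > total_nodes:
            return p.name
    tied = [p for p in partitions if 2 * p.nodes == total_nodes and p.sees_quorum]
    if tied:
        return min(tied, key=lambda p: p.arrival_order).name
    return None


if __name__ == "__main__":
    scenarios = {
        "inter-site links cut": [Partition("site1", 1, True, 1), Partition("site2", 1, True, 2)],
        "site 1 lost": [Partition("site2", 1, True, 1)],
        "site 3 (quorum) lost": [Partition("site1+site2", 2, False, 1)],
        "sites 1 and 3 lost": [Partition("site2", 1, False, 1)],
    }
    for label, parts in scenarios.items():
        print(f"{label}: {surviving_partition(parts, total_nodes=2)}")
```

Under this simplified rule, cutting the inter-site links or losing any single site still leaves one partition running, which matches the quoted Redbook guidance, whereas losing sites 1 and 3 together leaves no survivor (`None`), the double-failure case argued over earlier in the thread.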
Barry Whyte says:
November 29, 2012 at 18:40
Sorry, been kinda busy. Since EMC are touting this round as FACT at customer briefings, we do need to straighten things out.
1. Both references in the Redbooks are out of date and need correcting. I have asked for this to happen.
2. Preferred node / non-preferred node. Both nodes will service read and write requests and DO NOT need to forward I/O processing to the partner. If a read comes in on the non-preferred node, it will read through the local node to disk. If a write arrives at the non-preferred node, it will be cache-mirrored to the partner anyway. Therefore it is active/active.
3. Volumes vs. MDisks. You need to remember that SVC abstracts these objects. A volume is what the host sees, and there is only ever one instance of a volume. Even if it is a mirrored volume, there is just one volume, and it is active/active on both nodes. Further down the stack, this volume has two copies, which equate to two storage pools; in a split-cluster config, this means one storage pool at each site (coming from a controller at each site). Therefore the MDisks are fully active/active if the controller supports active/active mode.
4. For the failures mentioned as suggesting a SPOF, the split-cluster config can be set up to pull in spare nodes at each site should either side be lost, returning it to full redundancy.
5. An SVC node is designed as a failure domain. There is no SPOF in an SVC cluster: a node can fail, and is allowed to, so references to CPU, memory, HBA, etc. miss the way it is designed. It is like a dual-controller solution; you wouldn’t say that has a SPOF.
Claudiu Schmidt says:
November 30, 2012 at 00:12
Hi Barry,
I appreciate that you took the time to give us more insight into SVC and to raise this conversation to a more technical level!
I have some additional questions; I hope you find the time to answer them:
1) Regarding active/active I/O access, can you please tell me in which version this was introduced? Do you have any documents or links that better describe the behavior?
2) You mention that reads are done from the local node to the backend array regardless of whether it is the preferred or non-preferred node. You also mention that writes will be cache-mirrored to the partner node anyway, and that’s the reason it’s active/active. The question I still have is: who will perform the backend mirroring task, the receiving non-preferred node or the preferred node?
3) Spare node configuration: is this something SVC will perform automatically in case of a node outage, or is it a customer service engagement to replace the failed node with a new one? I haven’t seen any document describing a spare node configuration and would appreciate it if you could describe it or post the appropriate document.
4) Quorum behavior: how will a split SVC handle the loss of all links between sites? Will it allow I/O only on the first node that gets hold of the active quorum (as documented in several IBM Redbooks)?
Thanks,
Claudiu
Name (required) says:
January 24, 2013 at 09:05
Hi Barry,
Unfortunately I haven’t heard back from you on the above. As I keep hearing claims that SVC is an active/active architecture, I would like to add some more details that show its active/passive architecture:
- Whenever a write goes to the non-preferred node, it incurs a penalty compared with a preferred-node write. Here is an extract from an IBM Redbook:
“Therefore, when a new write or read comes in on the non-owner node, it must send extra messages to the owner node. The messages prompt the owner node to check whether it has the data in cache or if it is in the middle of destaging that data. Therefore, performance is enhanced by accessing the volume through the preferred node.”
- Write destage: cache content is only destaged from the preferred SVC node to both mirror legs; the non-preferred node will not perform any writes to any of the backend mirrors. This shows exactly the active/passive architecture: only one node performs backend write operations, affecting performance and bandwidth utilization. An active/active architecture will perform write operations from both nodes to their local mirror legs, with no additional performance or bandwidth overhead.
- Read operations: a read can be performed on both the preferred and the non-preferred node. A read to the non-preferred node will have a performance impact, as described above, due to the additional messaging between the non-preferred node and the preferred node.
- Reads from the backend mirror legs will only be performed on the primary mirror leg. The secondary leg will not be used for any read operation until the primary leg goes down. This again shows the active/passive nature of the architecture, where only one mirror leg is actively used. This is especially critical in a stretched SVC configuration, as you’ll always have to co-locate every host with the primary mirror; moving applications between sites will result in huge performance degradation and high bandwidth utilization, as each read will now have to go over the ISL to the backend array.
An active/active solution will perform all reads locally, independent of the site at which the I/O request arrives. Both mirror legs are equal, and no performance penalty or additional bandwidth is incurred when accessing from both sites.
- After cache destage to the backend, the non-preferred node will remove all content from its cache; only the preferred node will keep this content as read cache. This shows again the active/passive nature of SVC, as it was not designed to receive any reads or writes through the non-preferred node.
- You mention that an MDisk is active/active, but why bring up this argument? An MDisk is nothing but a set of backend volumes presented to SVC. You have to carve a volume out of an MDisk for a host to access it, and it is this VDisk that matters for an active/active architecture. The MDisk is not presented to a host, so there is no benefit in it being active/active. This is similar to a VNX array, where I can have a set of disks forming a RAID Group, and this RAID Group can be accessed by both Storage Processors equally, active/active. This doesn’t make the VNX an active/active array, as it is the volume I carve out of this RAID Group that needs to be active/active, not the RAID Group…
- The standby SVC node is also not something you can really cite to highlight the availability of SVC. It still requires a manual replacement procedure, performed by an IBM engineer, before redundancy is regained. During this time customers will run on a single SVC node with no redundancy at all, incurring additional penalties both from cache being disabled on the remaining node and from all read and write operations being performed from the remote node. Losing this node or its HBA will lead to production DU (data unavailability). A highly available solution will tolerate any component failure without big performance impacts or the need to perform all read and write operations from the remote cluster.
- You mention that SVC is like a dual-controller solution, but you forget to mention that no dual-controller architecture is designed to be stretched between two sites, leaving only one controller per site. Running any backend array with multiple redundancies behind a stretched SVC solution will in fact decrease that array’s availability, as it now runs behind an appliance with no redundancy per site.
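The read and write paths described above can be turned into a rough count of inter-site (ISL) transfers. The sketch below is a toy model under stated assumptions: it encodes the behaviour exactly as claimed in this comment (primary-leg-only reads, destage by the preferred node only), which Barry's earlier reply disputes in part; it ignores the extra owner-node messaging and all cache hits; and `IO` and `isl_crossings` are hypothetical names, not part of any SVC or VPLEX tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IO:
    kind: str   # "read" or "write"
    site: str   # site where the host I/O arrives: "A" or "B"


def isl_crossings(ios, model, preferred_site="A"):
    """Count inter-site link (ISL) transfers for a list of host I/Os.

    Toy model of the behaviour as described in the comment (an assumption):
      "preferred": backend reads come only from the primary leg at
          preferred_site; the preferred node destages every write to
          both legs, so the remote leg always crosses the ISL.
      "local": each site reads its own leg and destages to its own leg.
    In both models every write is cache-mirrored once to the other site.
    """
    if model not in ("preferred", "local"):
        raise ValueError(f"unknown model: {model}")
    crossings = 0
    for io in ios:
        if io.kind == "read":
            if model == "preferred" and io.site != preferred_site:
                crossings += 1  # read served from the remote primary leg
        elif io.kind == "write":
            crossings += 1  # cache mirror to the partner node at the other site
            if model == "preferred":
                crossings += 1  # preferred node destages to the remote leg
        else:
            raise ValueError(f"unknown I/O kind: {io.kind}")
    return crossings


if __name__ == "__main__":
    # Hypothetical workload: 80 reads and 20 writes, all arriving at site B,
    # with the preferred node and primary mirror leg at site A.
    workload = [IO("read", "B")] * 80 + [IO("write", "B")] * 20
    for model in ("preferred", "local"):
        print(model, isl_crossings(workload, model))
```

For that hypothetical workload at site B, the "preferred" model gives 120 ISL transfers against 20 for the "local" model; the same workload arriving at site A gives 40 against 20. This only quantifies the premises of the argument; it says nothing about which description of SVC is correct.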
Thanks,
Claudiu
Øyvind Skjold says:
February 16, 2013 at 21:51
Hi,
How would you compare DataCore SSV to EMC VPLEX and IBM SVC?
DataCore SSV is a truly active/active solution, isn’t it?
Regards
Øyvind