The changing face of block storage virtualization

Posted on January 22, 2012 by TheStorageChap in Data Center, Federation, Virtualization

Storage virtualization is no longer a solution in itself; it is a function, an enabler if you wish, for new capabilities within the data center.

In the same way that server virtualization breaks the bond between the operating system and physical compute resources, storage virtualization is about abstraction: breaking the link between hosts and physical storage arrays. The virtualization system presents the host with a logical space for data storage and handles the mapping of that logical space onto the actual physical hardware.
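
To make the abstraction concrete, here is a minimal, hypothetical sketch of the kind of mapping a virtualization layer maintains (plain Python, not any particular vendor's implementation): the host sees one contiguous virtual LUN, while the layer resolves each logical block address to an extent on whichever physical array actually holds it.

```python
# Hypothetical sketch of block storage virtualization: a virtual LUN is an
# ordered mapping from logical block ranges to extents on back-end arrays.
from dataclasses import dataclass

@dataclass
class Extent:
    array: str        # back-end array the extent lives on
    start_lba: int    # first logical block address on that array
    length: int       # extent size in blocks

class VirtualLUN:
    def __init__(self, name):
        self.name = name
        self.map = []     # ordered list of (virtual_start, Extent)
        self.size = 0     # total size in blocks

    def add_extent(self, extent):
        """Grow the virtual LUN by appending an extent from any array."""
        self.map.append((self.size, extent))
        self.size += extent.length

    def resolve(self, virtual_lba):
        """Translate a host-visible LBA to (array, physical LBA)."""
        for virtual_start, extent in self.map:
            if virtual_start <= virtual_lba < virtual_start + extent.length:
                return extent.array, extent.start_lba + (virtual_lba - virtual_start)
        raise ValueError("LBA beyond end of virtual LUN")

# The host addresses one 2,000-block LUN; the blocks actually live on two arrays.
lun = VirtualLUN("vlun0")
lun.add_extent(Extent("array-A", start_lba=0, length=1000))
lun.add_extent(Extent("array-B", start_lba=5000, length=1000))
print(lun.resolve(1500))   # -> ('array-B', 5500)
```

Non-disruptive data mobility then falls out of the model: the layer copies an extent to a new array and rewrites the mapping while host I/O continues to flow.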

Three, maybe four, years ago storage virtualization had two primary use cases: non-disruptive data mobility and heterogeneous cloning. Today things are changing, and the solutions enabled by storage virtualization are dividing into two main areas: storage commoditization and multi-site storage federation.

Storage Commoditization

Spinning disk is spinning disk, and many of the primary storage vendors share the same physical disk manufacturers. What enables one vendor to command a bigger premium for a storage array than another is the additional intelligence and functionality that is possible within that array: for example, replication, snapshots, cloning, automated tiering and virtual provisioning. This is big business for all of those involved.

Storage commoditization is the movement of this intelligence out of the storage array and into the abstraction/virtualization layer, in doing so leveling the storage array playing field into what could end up being just a bunch of disks. The benefit for the end-user is that the next time they refresh their storage array they do not need to be concerned about, for example, bringing online a new array-based replication solution; more importantly, they can buy any storage that meets a specific functionality and cost requirement, irrespective of the specific features or vendor of that storage.

To date there have been two main approaches to this idea of storage commoditization: virtualizing using the controllers within a storage array, or virtualizing using an external appliance. Interestingly, you could argue that the approach a vendor has taken reflects just how capable their storage array technology has previously been.

HDS and NetApp use their array controllers to enable heterogeneous storage virtualization and commoditization; effectively, you can apply the intelligence and functionality of the array controllers to other vendors’ storage arrays. Of this approach and these two vendors, my experience is that the NetApp V-Series probably comes the closest to being able to claim a storage commoditization use case, whilst the HDS functionality is primarily used to help with migrations.

IBM’s SAN Volume Controller (SVC) is perhaps the most established appliance-based approach to storage virtualization and storage commoditization. IBM has been steadily building storage intelligence and functionality into SVC over the past several years, so much so that their latest array offerings actually use SVC as the storage ‘controller’.

Multi-Site Storage Federation

Multi-site storage federation is the process of using storage virtualization to provision a virtual LUN with exactly the same storage and LUN identifiers across two data centers.
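
As a thought experiment rather than a description of any shipping product, the sketch below shows the essence of that idea: a single LUN identity (the identifier string is purely illustrative) is presented at both sites, and every write lands on a backing leg in each data center, so either site can read and write the same device.

```python
# Hypothetical sketch of multi-site federation: one LUN identity presented at
# two sites, with every write applied to a backing leg in each data center.
class SiteLeg:
    """Stand-in for the physical storage backing the LUN at one site."""
    def __init__(self, site):
        self.site = site
        self.blocks = {}

    def write(self, lba, data):
        self.blocks[lba] = data

    def read(self, lba):
        return self.blocks.get(lba)

class FederatedLUN:
    def __init__(self, identifier, legs):
        self.identifier = identifier   # same LUN identifier seen at every site
        self.legs = legs               # one backing leg per data center

    def write(self, lba, data):
        # Synchronous mirror: acknowledge only once every site holds the block.
        for leg in self.legs:
            leg.write(lba, data)

    def read(self, lba, local_site):
        # Serve reads from the local leg; the data is identical everywhere.
        leg = next(l for l in self.legs if l.site == local_site)
        return leg.read(lba)

lun = FederatedLUN("naa.6000-example-0001",           # illustrative identifier
                   [SiteLeg("DC-1"), SiteLeg("DC-2")])
lun.write(42, b"payload")                # written at both sites before the ack
print(lun.read(42, local_site="DC-2"))   # -> b'payload'
```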

The purpose of storage federation is to enable the distributed active-active data center and, in turn, greater levels of application availability, whether through continuous application availability or rapid application restart in the event of the loss of data center infrastructure, or indeed of an entire data center.

As you would expect me to say, I believe EMC VPLEX is probably the best example of multi-site storage federation available on the market today.

Cross-over?

Yes, I can hear you shouting: there is some cross-over between storage commoditization and multi-site federation for some of the products I have mentioned.

NetApp can use a product called MetroCluster to construct a solution that shares LUNs across sites, providing a certain level of additional availability. IBM can construct a split-node SVC solution across two sites, again sharing LUNs across sites (see this post for further discussion of why VPLEX Metro differs from MetroCluster and split-node SVC). EMC VPLEX can enable certain local commoditization use cases.

The fact is that none of the products on the market today enables both use cases well; each is good at only one, either storage commoditization or multi-site federation. My question to myself, and now also to anyone reading this, is this: should there be two different faces, or does the market actually want both functionalities in a single solution?

To a degree the answer lies in whether there is a long-term future for intelligent storage arrays. I work for a company whose success has been built on our ability to develop market-leading intelligent storage arrays and on our customers’ wish to purchase them. But equally I have seen subtle changes in some enterprise organizations’ thinking about how their data storage layer should be constructed; for some that has meant investigating, and also purchasing, storage commoditization solutions.

It also depends upon the scale of the requirement. The modern storage array, and more specifically its controllers, is itself a virtualizing abstraction layer enabling amazing capabilities, like fully automated sub-LUN tiering, that were unheard of a couple of years ago. There are many, many customers who do not require more than one storage array in their environment but would want to be able to federate that array with another in a second site to enable a distributed data center. Likewise, there are customers who have already invested in arrays capable of heterogeneous storage virtualization and who want to be able to create a distributed data center.

In all cases, what we are talking about here is how we enable cloud storage, something many agree is part of the next wave of IT.

So what should the ideal solution look like?

If I were developing a new virtualized storage layer and its functionality, what would I want in it? Well, from my point of view I would want it to include the following, but I would welcome input…

General Platform

  • A building block approach allowing both small and large customers to choose the features and functionality they want at a cost that is appropriate for their level of expenditure. e.g. they may just want federation for existing intelligent arrays or they may want to virtualize a JBOD.
  • A scalable cluster of engines/controllers participating in a single I/O group, with all of the things that go along with such a model, such as distributed cache coherency (see the sketch after this list).
  • A virtual edition, that is platform independent, enabling the functionality as a virtualized appliance or enabling it to be embedded within a storage array (if appropriate).
  • Ability to non-disruptively scale across clusters with a single point of administration and performance monitoring.
  • Heterogeneous multi-vendor storage support with an option to also add one or more native storage blocks to build an ‘array’.
  • Front-end protocol support for FC, FCoE, iSCSI, CIFS and NFS.
  • Advanced read and write cache (think things like FAST Cache and server-side caching), selectable on or off on a LUN-by-LUN basis.
  • Deep integration with technologies such as VMware and Hyper-V. Think VASA, VAAI, VSI etc.
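
The distributed cache coherency mentioned above is the hard part of that clustered model. As a minimal illustration only (a simple write-invalidate scheme, not any vendor's actual protocol), the sketch below shows why it matters: a write on one engine has to invalidate the block in its peers' caches so that no node ever serves stale data.

```python
# Hypothetical write-invalidate sketch of distributed cache coherency across
# cluster engines: a write on one engine invalidates the block in peer caches.
class Engine:
    def __init__(self, name, backend):
        self.name = name
        self.backend = backend   # shared dict standing in for back-end storage
        self.cache = {}
        self.peers = []

    def read(self, lba):
        if lba not in self.cache:            # cache miss: fetch from the back end
            self.cache[lba] = self.backend.get(lba)
        return self.cache[lba]

    def write(self, lba, data):
        self.backend[lba] = data             # write through to the back end
        self.cache[lba] = data
        for peer in self.peers:              # keep the cluster coherent
            peer.cache.pop(lba, None)        # drop any stale cached copy

backend = {}
a, b = Engine("engine-A", backend), Engine("engine-B", backend)
a.peers, b.peers = [b], [a]

a.write(7, "v1")
print(b.read(7))   # 'v1' (miss, fetched from the back end)
b.write(7, "v2")   # invalidates engine-A's cached copy
print(a.read(7))   # 'v2' (forced back to the back end, never stale)
```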

Storage Commoditization

  • Encapsulation and native volume management.
  • Synchronous and Asynchronous replication.
  • Advanced snap and/or clone functionality.
  • Automated tiering across back-end vendors and storage types (a simplified sketch follows this list).
  • Thin/Virtual Provisioning.
  • De-duplication.
  • Non-disruptive data mobility.
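
To illustrate the automated tiering bullet above, here is a deliberately simplified, hypothetical sketch: the virtualization layer counts I/Os per extent and periodically promotes the busiest extents to the fastest back-end pool, regardless of which vendor supplies that pool. Real implementations add policies, scheduling windows and movement costs; none of that is shown here.

```python
# Hypothetical sketch of automated tiering in the virtualization layer:
# count I/Os per extent and promote the hottest extents to the fastest tier.
from collections import Counter

# Back-end pools from different vendors, ordered fastest to slowest (illustrative).
TIERS = ["ssd-pool (vendor A)", "fc-pool (vendor B)", "sata-pool (vendor C)"]

class TieringEngine:
    def __init__(self, extents):
        self.placement = {e: TIERS[-1] for e in extents}   # everything starts on SATA
        self.io_counts = Counter()

    def record_io(self, extent):
        self.io_counts[extent] += 1

    def rebalance(self, extents_per_tier=2):
        """Periodically move the busiest extents up the tier ladder."""
        ranked = [e for e, _ in self.io_counts.most_common()]
        for rank, extent in enumerate(ranked):
            tier_index = min(rank // extents_per_tier, len(TIERS) - 1)
            self.placement[extent] = TIERS[tier_index]

engine = TieringEngine(extents=["e0", "e1", "e2", "e3"])
for _ in range(100):
    engine.record_io("e2")            # e2 is the hot extent
for _ in range(10):
    engine.record_io("e0")
engine.rebalance()
print(engine.placement["e2"])         # the hottest extent lands on the SSD pool
```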

Federation

  • Presentation of virtual LUNs with the same storage and LUN identifiers across multiple data centers, with concurrent read/write access and protocol support for FC, FCoE, iSCSI, CIFS and NFS.
  • Active utilization of assets at each site for the same distributed LUN.
  • Support for federation across distance.
  • A resilient engine architecture at each site.
  • If replicating from federated data centers to a non-federated (DR) data center, the ability for replication to continue in the event of the loss of any of the federated data centers. Think distributed replication technologies.
  • Simple, low-cost witness functionality to eliminate split-brain in a two-site model (a minimal sketch follows this list).
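
The witness bullet deserves a sketch of its own, because it is what keeps a two-site design honest. In the hypothetical arbitration below (a toy model, not any vendor's quorum protocol), an inter-site link failure sends both sites to a lightweight third-site witness; only the first claimant keeps serving I/O, so the two legs can never diverge.

```python
# Hypothetical sketch of witness arbitration for a two-site federation: on a
# link failure each site asks the witness for survivorship; only one site keeps
# serving I/O, so a partitioned cluster can never write both legs independently.
import threading

class Witness:
    """Lightweight third-site arbiter: grants survivorship to exactly one site."""
    def __init__(self):
        self._lock = threading.Lock()
        self.winner = None

    def request_survivorship(self, site):
        with self._lock:
            if self.winner is None:
                self.winner = site     # the first claimant wins
            return self.winner == site

class Site:
    def __init__(self, name, witness):
        self.name = name
        self.witness = witness
        self.serving_io = True

    def on_inter_site_link_failure(self):
        # Keep serving I/O only if the witness says this site survives.
        self.serving_io = self.witness.request_survivorship(self.name)

witness = Witness()
dc1, dc2 = Site("DC-1", witness), Site("DC-2", witness)

# The link between the data centers drops; both sites race to the witness.
dc1.on_inter_site_link_failure()
dc2.on_inter_site_link_failure()
print(dc1.serving_io, dc2.serving_io)   # -> True False (never both True)
```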

Summary

Is there an opportunity for a startup? Possibly, but I think history has proven that this type of technology has to come from an established vendor in order to overcome inevitable customer concerns about stability and supportability. If you are reading this thinking it is a precursor to an EMC announcement, I am sorry, it is not, at least not yet… Some vendors are driving more towards this model than others, and some have mixed strategies; time will tell which approach (array-based or appliance-based) resonates the most with customers, and which vendor can ultimately come the closest to this wish list of features and functionality.