Zero data loss with Asynchronous replication?

Posted on January 19, 2012 by TheStorageChap in RecoverPoint, Replication

So I met with my friends at @Axxana yesterday to continue our ongoing debate around their value proposition. Axxana manufacture a solution that offers aircraft ‘black box’ like functionality for the datacenter. Specifically, it gives organizations the ability to achieve zero data loss using only an asynchronous replication solution.

For those not in the know, there are two primary methods of datacenter replication: synchronous, in which every write is written to the remote site before the host is allowed to continue, guaranteeing that the data is off site; and asynchronous, where writes are acknowledged and stored locally and then at some point written to the remote site.

Whilst synchronous offers the best level of protection, it can only be performed over relatively short distances and in many cases requires considerable bandwidth, normally at an additional cost. On the other hand, asynchronous replication can be performed over long distances, with considerably smaller amounts of bandwidth, but with the possibility of data loss if a disaster occurs before the data has been sent to the remote site.

The difference between asynchronous protection and synchronous protection is therefore simply the data that is at the local site waiting to be sent to the remote site by the underlying replication solution. Axxana’s innovative approach to solving the issue of cost vs. protection level has been to create a fire, flood, earthquake and explosion proof, flash storage based solution that can hold the blocks of data waiting to be replicated to the remote site. This process is continuous: as data is replicated it is removed from the Axxana cache, and new incoming data waiting to be replicated takes its place.
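To make that continuous process concrete, here is a minimal sketch in Python. It is a toy model and not Axxana's implementation: the `ProtectedBuffer` class, its method names, and the idea of numbering writes with a sequence counter are all assumptions made purely for illustration.

```python
from collections import OrderedDict


class ProtectedBuffer:
    """Toy model of a hardened cache holding writes not yet replicated.

    Hypothetical illustration only; names and structure are assumptions.
    """

    def __init__(self):
        # Maps sequence number -> (block address, data), in arrival order.
        self._pending = OrderedDict()
        self._next_seq = 0

    def record_write(self, block, data):
        """Keep a locally acknowledged write until the remote site has it."""
        seq = self._next_seq
        self._pending[seq] = (block, data)
        self._next_seq += 1
        return seq

    def remote_acknowledged(self, up_to_seq):
        """Drop every write the remote site has confirmed, up to up_to_seq inclusive."""
        for seq in [s for s in self._pending if s <= up_to_seq]:
            del self._pending[seq]

    def pending_writes(self):
        """Return the writes still waiting to be replicated, oldest first."""
        return list(self._pending.values())


buf = ProtectedBuffer()
buf.record_write(10, b"A")   # seq 0
buf.record_write(11, b"B")   # seq 1
buf.record_write(10, b"C")   # seq 2
buf.remote_acknowledged(1)   # remote site now holds seq 0 and seq 1
print(buf.pending_writes())  # only the write with seq 2 remains
```

Whatever is left in the buffer at any instant is exactly the data an asynchronous-only setup would lose if the primary site disappeared at that moment.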

In the event of a disaster, the customer uses their replication technology (more on that in a moment) to roll back to the latest asynchronous point in time that they have. They can then recall, via a variety of methods (WAN, direct connect, GSM), the latest writes that had occurred in the primary datacenter but had not yet been replicated, and apply these on top of the async data copy to recover to the last writes that occurred in the primary datacenter before the disaster hit.
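The recovery step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the RecoverPoint or Axxana procedure: volumes are modelled as plain dictionaries of block address to data, and `recover` is a hypothetical helper name.

```python
def recover(remote_copy, pending_writes):
    """Replay un-replicated writes, oldest first, on top of the async copy.

    Hypothetical helper: remote_copy is a dict of block -> data, and
    pending_writes is a list of (block, data) pairs in write order.
    """
    recovered = dict(remote_copy)  # work on a copy, leave the original alone
    for block, data in pending_writes:
        recovered[block] = data    # later writes to a block overwrite earlier ones
    return recovered


# Latest asynchronous point in time held at the remote site.
remote_copy = {1: b"old", 2: b"x"}

# Writes acknowledged at the primary site but not yet replicated.
pending = [(1, b"new"), (3, b"y"), (1, b"newer")]

# What the primary site looked like just before the disaster.
primary = dict(remote_copy)
for block, data in pending:
    primary[block] = data

recovered = recover(remote_copy, pending)
print(recovered == primary)  # True
```

The order of replay matters: block 1 is written twice in the pending list, and only applying the writes oldest first leaves the final value in place.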

Why do I care about this? Well, Axxana is reliant on having an underlying solution doing the replication, and today @RecoverPoint is the only certified solution. RecoverPoint offers synchronous (distance and bandwidth limitations apply) and asynchronous replication, giving customers flexibility, but by itself it does not provide a solution if:

  1. The customer needs synchronous levels of protection for all, or a subset of, applications, but cannot justify the bandwidth cost
  2. The customer needs synchronous levels of protection but because of their locality will never be able to get the required bandwidth
  3. The customer needs synchronous levels of protection, but the sites are not within synchronous distances
  4. The customer needs to offer a zero data loss solution, due to, for example, regulatory requirements, and has synchronous replication but needs to continue offering zero data loss in the event of a link failure and rolling disaster

These are the reasons why EMC has partnered with Axxana to offer our customers an innovative solution that resolves those issues.

The first three points are self-explanatory, but the fourth is often not considered. There are two key points for customers in this situation.

Firstly, these customers often have three sites: the primary site, a bunker site which holds the synchronous copy of the data, and a third site many hundreds of miles/km away which holds an asynchronous copy of the data. The reason for the third async copy is that the primary site and bunker site have to be close together to meet the requirements for synchronous replication, and so are not far enough apart to negate the risk of both sites being made inoperable by a disaster. In this scenario a RecoverPoint and Axxana solution can potentially remove the need for the bunker site, enabling potentially dramatic cost savings (e.g. site, power, storage, bandwidth and software costs) and allowing the organization to operate only two sites, async distances apart, whilst still providing an RPO=0.

Secondly, where the organization only has two sites with synchronous replication, they are at risk in the event of a rolling disaster. Take the example of an earthquake whose first impact is on the telco provider offering bandwidth between the sites, halting replication traffic. Next the power is affected and, after the UPS is exhausted (assuming there is one), the datacenter shuts down in a not very graceful manner or, worse still, the datacenter is destroyed or inoperable. In these (albeit extreme) cases the last writes are not at the remote site and the regulatory requirement has not been met. Had there been an Axxana solution at the primary site, it would potentially have been possible to retrieve those lost writes and recover to the latest point in time.

The technology works and has been proven in several customer environments, but the big question that remains is what this capability is worth to an organization, and therefore how much they are willing to pay. The answer of course is that it depends on individual circumstances such as scale of requirement, datacenter localities and bandwidth restrictions. If an enterprise organization has to offer an RPO=0 to meet a regulatory requirement, and circumstances dictate that it is not viable to do this with a synchronous replication solution, due to, for example, link availability and/or cost, then the cost of the Axxana solution may be attractive. Likewise for those with a three site bunker model. If a smaller commercial organization that has asynchronous replication today now needs RPO=0 for a new application that is being brought online, the cost may be prohibitive, although I would still encourage you to work with EMC and Axxana to explore the options in that case.

At a high level what would I want to see in terms of future EMC + Axxana enhancements?

  1. Integration with all EMC storage based RecoverPoint splitters and RecoverPoint versions. Today, we support only CLARiiON and VNX with RP/SE and IMHO this is not necessarily where the biggest match is between Customer requirement and EMC + Axxana solution.
  2. A smaller commercial offering that offers similar functionality at a reduced cost that I can put in the rack next to my RecoverPoint Appliances.

There is no doubt in my mind that the general concept is a great one and, with a little refinement, could ultimately become a must-have addition to any async replication solution. After all, why wouldn’t you want zero data loss?!