Oracle RAC 101 and what VPLEX Metro means for Oracle Extended RAC

Posted on June 28, 2012 by TheStorageChap in Oracle, VPLEX

Lots of questions on Oracle RAC and VPLEX over the last few days, so I thought it might be worth having a quick review.

In a normal database environment a single instance of Oracle runs on a single server and accesses a single database. If the instance stops, there is no access to the database. If the storage goes offline, there is no access to the database. If the site goes offline, there is no access to the database.

Oracle Real Application Clusters (RAC) is an optionally licensed module from Oracle that enables multiple Oracle instances to simultaneously access the same database. Oracle Grid Infrastructure (Clusterware) handles communication between all of the cluster nodes so that, to the application, they behave as a single logical database with access to a shared set of database files. Those files are normally located on shared FC storage because of the performance requirements.

Note that there are two versions of Oracle RAC: Oracle RAC One Node and Oracle RAC. Oracle RAC One Node behaves like a traditional active/passive cluster, i.e. only one node is active at a time. With Oracle RAC all nodes are active, which is important from a VPLEX and availability perspective. On licensing, both are options for Oracle Enterprise Edition; at the time of writing Oracle RAC (but not RAC One Node) is also included with Oracle Standard Edition, subject to its socket limits, so check the current Oracle licensing documentation for your edition.

So Oracle RAC enables continuous Oracle instance availability, but all nodes are normally still talking to a single set of database files on a shared storage array, and we know that normally a LUN can only be active on a single array. Therefore if you lose the array, your Oracle environment stops working.

To overcome this problem Oracle RAC can be deployed with Oracle ASM mirroring, or other host-based Logical Volume Manager (LVM) mirroring, to place a primary extent on one storage array and a secondary extent on a second storage array. Every write goes to all extent copies, and by default a read is always served from the primary extent. If that read fails, the read is retried from another mirror.

When the two storage arrays are located in two physically separate sites, and the RAC nodes are also distributed between those two sites to improve availability, the solution becomes known as Oracle Extended (stretched) RAC. The Oracle licensing for normal and extended RAC solutions is the same. Because of the potential for split-brain, Oracle Extended RAC requires a tie-breaking voting disk, normally NFS or iSCSI based, that is accessible from a third location. In an Extended RAC configuration it is also possible to configure ASM so that the nodes in each location read from their local copy (ASM preferred read failure groups). Extended RAC enables both storage availability and site availability.
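
The need for a voting disk in a third location comes down to majority arithmetic, which a few lines of Python can illustrate. This is only a sketch under a simplifying assumption: a node keeps running only while it can reach a strict majority of the voting files, and we only model a whole site failing, not a partial link failure. The function names and the site labels "A", "B" and "C" are made up for the illustration and are not part of any Oracle tool.

```python
from typing import List, Set


def has_quorum(reachable: int, total: int) -> bool:
    """True if the reachable voting files form a strict majority."""
    return 2 * reachable > total


def sites_still_running(
    voting_file_sites: List[str],
    node_sites: Set[str],
    failed_sites: Set[str],
) -> Set[str]:
    """Return the sites whose RAC nodes keep running after a site failure.

    voting_file_sites: one entry per voting file, naming the site that hosts it.
    node_sites: the sites that host RAC nodes.
    failed_sites: the sites that are completely offline.
    """
    # Voting files hosted at a failed site are unreachable by everyone.
    reachable = sum(1 for site in voting_file_sites if site not in failed_sites)
    if not has_quorum(reachable, len(voting_file_sites)):
        # No strict majority: the surviving nodes cannot stay in the cluster.
        return set()
    return node_sites - failed_sites


# Two data sites plus a third-site tie breaker: losing site A leaves B running.
with_tiebreaker = sites_still_running(["A", "B", "C"], {"A", "B"}, {"A"})

# Voting files only at the two data sites: losing A leaves 1 of 2, no majority.
without_tiebreaker = sites_still_running(["A", "B"], {"A", "B"}, {"A"})
```

With the third-site voting file the nodes at the surviving site still see two of three votes. Without it, a single site failure leaves one of two, which is not a majority, so the cluster cannot safely continue on its own.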

The problem for Oracle and most Oracle administrators is that ASM or LVM mirroring can be complex to implement and operate. It requires all RAC nodes to be cross-connected to all storage platforms using ISLs. It is host-based mirroring, which consumes cycles from the Oracle servers. And after a storage or site outage it can take considerable effort to remediate the mirrors when trying to get back to a normal operational state.
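
To make the mirroring behaviour and the remediation cost concrete, here is a toy Python model of host-based mirroring across two arrays. It is a sketch only and does not reflect how ASM actually tracks extents internally: the `Array` and `MirroredVolume` classes, the block numbering and the `stale` bookkeeping are all hypothetical names invented for this illustration. It shows writes going to every online copy, reads preferring the primary and falling back to the other mirror, and the resync work that builds up while one array is offline.

```python
from typing import Dict, List, Set


class Array:
    """A storage array at one site (hypothetical model)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.online = True
        self.blocks: Dict[int, bytes] = {}


class MirroredVolume:
    """Host-side mirror over two arrays; copies[0] is the primary."""

    def __init__(self, primary: Array, secondary: Array) -> None:
        self.copies: List[Array] = [primary, secondary]
        # Blocks each array missed while it was offline.
        self.stale: Dict[str, Set[int]] = {primary.name: set(), secondary.name: set()}

    def write(self, block: int, data: bytes) -> None:
        """Write to every online copy; remember what offline copies missed."""
        if not any(array.online for array in self.copies):
            raise OSError("no online copy to write to")
        for array in self.copies:
            if array.online:
                array.blocks[block] = data
                self.stale[array.name].discard(block)
            else:
                self.stale[array.name].add(block)

    def read(self, block: int) -> bytes:
        """Read from the primary first, then fall back to the other mirror."""
        for array in self.copies:
            current = block in array.blocks and block not in self.stale[array.name]
            if array.online and current:
                return array.blocks[block]
        raise OSError(f"block {block} is not readable from any copy")

    def resync(self, target: Array) -> int:
        """Copy missed blocks back to a returned array; return how many."""
        if not target.online:
            raise OSError(f"{target.name} is still offline")
        missed = sorted(self.stale[target.name])
        for block in missed:
            # read() skips the target because the block is marked stale there.
            target.blocks[block] = self.read(block)
        self.stale[target.name].clear()
        return len(missed)
```

In this model every write costs the host one I/O per array, and every block written during an outage has to be copied back afterwards, which is the remediation effort described above.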

In simple terms VPLEX Metro reduces the operational complexity of Oracle Extended RAC by doing away with the need for ASM mirroring. You do of course still need Oracle RAC licenses. ASM is still required for shared access to the same volumes; it is just not used for mirroring, so it no longer requires all hosts to be cross-connected, nor does it consume host CPU cycles for mirroring. VPLEX Metro also enables advanced availability: if a storage array in either site fails, the RAC nodes at both sites continue running, and if a site fails, the remaining Oracle RAC nodes continue running with no manual intervention required. Unlike ASM mirroring with a primary extent and a secondary extent, all RAC nodes read from and write to a single shared distributed virtual volume which is concurrently read/write accessible at both sites. This enables I/O utilisation of the storage infrastructure at both sites. VPLEX Metro with Oracle RAC is an Oracle-certified solution, and a joint best practices whitepaper can be found here.

So how does this compare to other solutions, such as NetApp MetroCluster and IBM Split Node SVC configurations? Both solutions work on a preferred node concept (see my previous blog entry), which requires all hosts to be cross-connected with costly ISLs (something we were trying to avoid) and means that performance is only derived from half the available storage infrastructure.

NetApp MetroCluster continues to use ASM mirroring across the sites, and a full site failure requires that “a manual force failover of the NetApp MetroCluster needs to be performed to declare the disaster due a split brain scenario” (TR-3816). IBM SVC Split Node Clusters, with virtual volumes accessible across both sites from a single SVC node, could theoretically be used to support shared access from Oracle RAC nodes distributed over two sites. However, at the time of writing I can find no evidence that this is proposed by IBM or certified by Oracle. Instead, like NetApp, IBM’s whitepapers talk about the use of a standard ASM mirroring solution.

Finally, remember that we are only talking about one aspect of VPLEX capability here. If the Oracle instances were virtualised, we could also non-disruptively move them between datacentres. Also, ASM mirroring is only good for Oracle; the whole point of VPLEX is that its benefits extend over multiple applications.