How is an Oracle RAC on VMware setup supposed to deal with storage failure?

Hi all,

I am currently testing Oracle RAC running on a VMware vSphere 6 platform. Our main reason for using Oracle RAC is high availability and redundancy. We want the Oracle database to stay up and running with zero interruption even if any single piece of hardware fails or is accidentally misconfigured.

Our setup is similar to that described in VMware's own whitepaper, in that we use VMDK disks with the write-sharing parameter enabled in order to attach VMDK disks to two RAC nodes at the same time. The setup in more detail:

We have two geographically separated sites. Let's call them SiteA and SiteB. Each site contains a vSphere 6 cluster and an EMC VNX storage unit. Let's call them ClusterA and StorageA, and ClusterB and StorageB. There is a fast WAN connection between the two sites, making it possible to have inter-site storage traffic. StorageA presents a LUN called DatastoreA to both ClusterA and ClusterB. Likewise, StorageB presents a LUN called DatastoreB to both ClusterA and ClusterB. These LUNs are formatted within VMware as VMFS-5 datastores.

We have two virtual machines, let's call them RAC1 and RAC2, both running Windows Server 2012 R2. RAC1 is running on ClusterA, and RAC2 is running on ClusterB. The storage for each VM's OS and application is provided by the storage unit at that VM's own site.

Now we create the storage for the Oracle ASM disk groups. In the VMware settings of RAC1 we create two VMDK virtual disks of the same size, the first one on DatastoreA and the second on DatastoreB, making sure to enable multi-writer mode. Then in the VMware settings of RAC2, we connect the exact same disks, again enabling multi-writer mode. We repeat this entire process for all the other disks needed by Oracle RAC.
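To make the disk layout above concrete, here is a minimal Python sketch that prints the kind of per-disk entries I would expect to see in each VM's .vmx file. The VMDK paths, SCSI slots and helper names are hypothetical and only for illustration; the `sharing = "multi-writer"` key is, as far as I understand, what the multi-writer setting corresponds to.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SharedDisk:
    datastore: str   # VMFS datastore holding the VMDK
    vmdk: str        # hypothetical path of the VMDK inside the datastore
    scsi_slot: str   # hypothetical controller:unit slot, e.g. "1:0"


def vmx_lines(disk):
    """Render the .vmx-style entries for one shared disk (illustrative only)."""
    prefix = f"scsi{disk.scsi_slot}"
    return [
        f'{prefix}.present = "TRUE"',
        f'{prefix}.fileName = "/vmfs/volumes/{disk.datastore}/{disk.vmdk}"',
        f'{prefix}.sharing = "multi-writer"',
    ]


# One mirrored pair: the same two disks are attached to both RAC1 and RAC2.
SHARED_DISKS = [
    SharedDisk("DatastoreA", "rac-shared/asm_data_a.vmdk", "1:0"),
    SharedDisk("DatastoreB", "rac-shared/asm_data_b.vmdk", "1:1"),
]

if __name__ == "__main__":
    for vm in ("RAC1", "RAC2"):
        print(f"# {vm}")
        for disk in SHARED_DISKS:
            print("\n".join(vmx_lines(disk)))
```

The point of the sketch is only that both VMs reference the same pair of VMDK files, one per datastore, each with sharing enabled.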

Now, our DBA can install Oracle RAC and Oracle ASM disk groups on the two servers. He creates disk groups which each contain two failure groups. One failure group contains all the disks on DatastoreA, while the other failure group contains all the disks on DatastoreB. Having finished the Oracle ASM and Oracle RAC configuration, he installs a database.
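The behaviour we expected from this failure group layout can be sketched as follows. This is a deliberately simplified model written in Python, not how ASM is implemented: it assumes every extent is mirrored across the two failure groups and ignores voting files and quorum. The disk names are hypothetical.

```python
from collections import defaultdict

# Hypothetical ASM disk names mapped to the datastore that backs them.
DISKS = {
    "DATA_A1": "DatastoreA",
    "DATA_A2": "DatastoreA",
    "DATA_B1": "DatastoreB",
    "DATA_B2": "DatastoreB",
}


def failure_groups(disks):
    """Group disks into one failure group per backing datastore."""
    groups = defaultdict(set)
    for disk, datastore in disks.items():
        groups[datastore].add(disk)
    return dict(groups)


def surviving_groups(disks, lost_datastores):
    """Return the failure groups whose datastore is still reachable."""
    return {
        name: members
        for name, members in failure_groups(disks).items()
        if name not in lost_datastores
    }


def expected_to_stay_mounted(disks, lost_datastores):
    """Simplified expectation: with two-way mirroring across failure groups,
    one complete surviving failure group still holds a copy of every extent."""
    return len(surviving_groups(disks, lost_datastores)) >= 1


if __name__ == "__main__":
    print(expected_to_stay_mounted(DISKS, {"DatastoreA"}))
    print(expected_to_stay_mounted(DISKS, {"DatastoreA", "DatastoreB"}))
```

In other words, losing DatastoreA alone should leave a full copy of the data in the DatastoreB failure group, which is why we expected the database to keep running.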

While testing this setup for resiliency against hardware failures, we wanted to know what happens in the event of a total loss of a single storage unit. To this end, we accessed the management console of StorageA and unpresented DatastoreA from both clusters, meaning both clusters suddenly lost connectivity with DatastoreA, creating a PDL (permanent device loss) situation.

What happens next is that both RAC1 and RAC2 completely freeze, and VMware generates a dialog box for each VM.

After reading up on this, it seems to me that it is standard ESXi behaviour to freeze a virtual machine as soon as it tries to write to a VMDK file that is no longer available. Because both RAC1 and RAC2 are connected and try to write to a VMDK file that is no longer there, both VMs are automatically suspended by ESXi.

This is, of course, exactly what we DO NOT want to happen in an HA solution like Oracle RAC. As it stands, a single storage failure or even a WAN failure would result in total loss of the database, even though one copy of all the database storage is still online. What am I missing here? What is the correct way to configure Oracle RAC on the VMware platform?

I would greatly appreciate any insights.
