I've been searching for any hints I can find about this problem, but nothing has worked, so I'm trying here. I'm seeing this message in /var/log/messages:
Jul 13 10:45:59 oratest1 kernel: sd 3:0:1:0: reservation conflict
Jul 13 10:45:59 oratest1 kernel: sd 3:0:1:0: [sdc] Unhandled error code
Jul 13 10:45:59 oratest1 kernel: sd 3:0:1:0: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_OK
Jul 13 10:45:59 oratest1 kernel: sd 3:0:1:0: [sdc] CDB: Write(10): 2a 00 00 08 00 11 00 00 01 00
I believe this is what's causing errors like the following to show up in my Oracle alert logs:
<msg time='2012-07-13T14:09:01.596-05:00' org_id='oracle' comp_id='rdbms'
client_id='' type='UNKNOWN' level='16'
host_id='oratest1.harding.edu' host_addr='10.10.2.195' module=''
pid='5205'>
<txt>Errors in file /u01/app/oracle/diag/rdbms/awdv/awdv1/trace/awdv1_ckpt_5205.trc:
ORA-00206: error in writing (block 3, # blocks 1) of control file
ORA-00202: control file: '+DATA/awdv/controlfile/current.283.787679375'
ORA-15078: ASM diskgroup was forcibly dismounted
ORA-15078: ASM diskgroup was forcibly dismounted
</txt>
</msg>
This is the first time I've tried to set up Oracle RAC in VMware so I probably missed something very small, but I cannot for the life of me find what it is.
My setup is:
RHEL 6.3 (upgraded from a fresh 6.2 install)
vSphere 4.1
Oracle RAC 11.2.0.3
Latest VMware Tools
I'm using vmdk files for all my ASM disk groups. The cluster starts just fine and I can connect to the databases, but they rarely stay up more than a few hours before these errors take them down.
I'm not using fence_scsi or any of the sg3_utils tooling.
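I haven't tried this yet, but if it would help narrow things down, I could install sg3_utils and check whether anything is actually holding a SCSI-3 persistent reservation on the disk. Something like this (device name taken from the kernel messages above):

# list registered keys and the current reservation (if any) on the shared disk
sg_persist --in --read-keys /dev/sdc
sg_persist --in --read-reservation /dev/sdc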
I've got disk.locking="false" in my vmx file and the multi-writer flags set for the SCSI disks used by ASM.
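For reference, the relevant .vmx entries look something like this (the scsi1:x device IDs below are just examples, not necessarily my exact layout):

disk.locking = "false"
scsi1:0.sharing = "multi-writer"
scsi1:1.sharing = "multi-writer"
scsi1:2.sharing = "multi-writer"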
There are no errors in the vmware.log file.
I feel like I just set something up wrong, and maybe VMware and the Oracle cluster are fighting each other over disk locks?
Has anyone seen anything like this or have any suggestions where to look next?