| racman 2005-09-13, 3:23 am |
| We have an AIX, Solaris and Red Hat/SUSE environment attached to an EMC
Symmetrix DMX 3000 SAN. We run many different OS versions, Veritas
Clusters and Veritas Volume Manager, with no PowerPath, just Veritas DMP for
multipathing. Some systems have two paths, some have one. Some systems
have the latest firmware and drivers, others don't. Our HBAs are JNI and
QLogic for Solaris and Emulex for our AIX/Linux. Brocade switches with
many FA ports, etc. We also have a Windows 2000 box that is in a
cluster with Emulex HBAs. We also have an Oracle RAC environment. The problem
is that since we migrated our servers from HP and Shark to EMC, our systems
either crash or spit out disk errors in groups almost daily. Some
servers crash a few times a day. All outages are random. EMC hooked up
an analyzer, which traced a SCSI target reset to the Windows
servers. When we shut down half of the Windows cluster, errors
were reported on some of our other Unix servers. This did not resolve
the issues. We feel there may be many resets coming from different
servers. The AIX systems are set to arbitrated loop, but IBM claims that
although the setting says arbitrated, it still tries point-to-point
first. I thought this might have been the issue. There was also
talk of setting the C bit and D bit on the disks. Should we also look at
the maximum number of commands queued to the HBA? How about the queue
depth settings on the disks set by the OS? Some of our servers don't have
errors, but you'll see I/O errors when moving a file within the filesystems,
and corruption after a planned reboot. What could be causing these servers
to crash like this?
|