Discussion:
hung db
MJ Mody
2016-01-09 01:03:56 UTC
Nothing like ringing in the new year with a hung database.
In this case, the database is 11.2.0.3 EE with ASO/TDE on W2K8-R2, with the latest CPU applied.
From what is known, external backups (tape writes) were taking place in the backup software, covering both RMAN and Data Pump (the .dmp files were created early in the evening). Additionally, nightly EOD processing was also taking place.

Performed a server reboot to bring the database online. The point of concern is that from 9pm onwards, until the database came back online (~1:30am), there are no entries in the alert log.

Feel free to share if anyone has come across this or has some words of wisdom for troubleshooting and remediation.

Thank you in advance.

Best
MJ
--
http://www.freelists.org/webpage/oracle-l
De DBA
2016-01-09 03:26:48 UTC
Usually I'd suspect an out-of-memory situation. The scenario reminds me of a case I had where the backup software (Tivoli, I think) would use Windows shadow copy to execute an RMAN backup. The shadow copy process never exited, and after a (large) number of backups the Oracle process memory was exhausted and the database hung. From memory, that was also W2K8R2/11.2.0.3. I believe there was a bug in the Oracle shadow copy interface.

You did check the Windows Event Logs (Application and System), I presume?

Hth,
Tony
MJ Mody
2016-01-09 04:21:49 UTC
Suspected resource contention of sorts. Follow-up with the backup vendor, CommVault, turned up a file lock on the cv/oracle library (i.e. oci.dll) held over from an earlier backup. Will need to review the Oracle shadow copy interface.

Still a mystery as to why nothing was written to the alert log.
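
One way to pin down the exact edges of the silent window is to scan the text alert log for consecutive timestamps that are unusually far apart. The sketch below is only an illustration: it assumes the 11.2 text alert log writes each timestamp on its own line in the form "Fri Jan 08 21:00:03 2016" under an English locale, and the names parse_stamp and find_gaps and the one-hour threshold are made up for this example.

```python
from datetime import datetime, timedelta

# Assumed timestamp layout of an 11.2 text alert log line (hypothetical sample:
# "Fri Jan 08 21:00:03 2016"). Adjust if the log on your host looks different.
STAMP_FORMAT = "%a %b %d %H:%M:%S %Y"


def parse_stamp(line):
    """Return a datetime if the line is a bare timestamp line, else None."""
    try:
        return datetime.strptime(line.strip(), STAMP_FORMAT)
    except ValueError:
        return None


def find_gaps(lines, threshold=timedelta(hours=1)):
    """Return (start, end) pairs of consecutive timestamps further apart than threshold."""
    gaps = []
    previous = None
    for line in lines:
        stamp = parse_stamp(line)
        if stamp is None:
            # Ordinary message text, not a timestamp line.
            continue
        if previous is not None and stamp - previous > threshold:
            gaps.append((previous, stamp))
        previous = stamp
    return gaps
```

Passing it the lines of the alert log, for example find_gaps(open(path, errors="replace")), would list each quiet stretch longer than the threshold together with its start and end time.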

Cheers
MJ
Mladen Gogala
2016-01-09 22:46:36 UTC
Post by MJ Mody
Suspected resource contention of sorts. Follow-up with backup vendor, CommVault, yielded a file lock with cv/oracle library (i.e. oci.dll) from an earlier backup. Will need to review the Oracle shadow copy interface.
Still a mystery as to why nothing was written to alert log.
Was there anything in the ClOraAgent.log? Did you open a case with
CommVault support? Do you have any other backup system on the machine?
The Simpana client library is called ORASBT.DLL and usually resides in
C:\Program Files\Commvault\Simpana\Base. If there is other backup
software on the machine, it may have put its own ORASBT.DLL in either
%ORACLE_HOME%\lib or C:\Windows\system32. Please search the entire
machine for ORASBT.DLL libraries, since a library clash can be quite a
serious problem. The name of the library is mandated by Oracle Corp.,
and all backup suites (Commvault, TSM, NetBackup, Avamar) must use
the same name.
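
A minimal sketch of that machine-wide search, assuming Python is available on the host. The function name find_library_copies is invented for this example, and the roots to scan (for instance C:\) are up to whoever runs it.

```python
import os


def find_library_copies(roots, name="orasbt.dll"):
    """Walk each root and return the full paths of files whose name matches, ignoring case."""
    wanted = name.lower()
    matches = []
    for root in roots:
        # os.walk skips unreadable directories by default instead of raising.
        for folder, _subdirs, files in os.walk(root):
            for filename in files:
                if filename.lower() == wanted:
                    matches.append(os.path.join(folder, filename))
    return sorted(matches)
```

Calling find_library_copies(["C:\\"]) returns every copy found on that drive. More than one result would be worth a closer look, since each backup product ships its library under the same file name.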
Regards
--
Mladen Gogala
Oracle DBA
http://mgogala.freehostia.com

MJ Mody
2016-01-11 20:12:55 UTC
The actual error the CommVault engineer pulled from the logs is 'connection with the OraSbtThread failed, possibly due to a stuck DLL'.

I think this could be the smoking gun.
Mladen Gogala
2016-01-11 21:02:57 UTC
Hi MJ,
What is a "stuck DLL"? Was there a recent change of DLL files? The
allocate channel command usually goes like this:

allocate channel c0 device type SBT;

Problems can happen if there is another orasbt.dll in the path, because
Windows will use the first orasbt.dll it can find. Simpana expects
to find the library in C:\Program Files\Commvault\Simpana\Base. If it
encounters a library of the same name in the wrong directory, like
C:\Windows\System32, the whole thing will croak.
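
To see which copy would come first on the path, something along these lines could be used. It is only an approximation: the Windows loader also consults other locations (such as the application directory and the system directories) before PATH, and the function name libraries_on_path is hypothetical.

```python
import os


def libraries_on_path(name="orasbt.dll", path=None):
    """Return every PATH entry that contains the named file, in PATH order."""
    if path is None:
        path = os.environ.get("PATH", "")
    hits = []
    for directory in path.split(os.pathsep):
        # PATH entries are sometimes wrapped in double quotes on Windows.
        directory = directory.strip('"')
        if not directory:
            continue
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            hits.append(candidate)
    return hits
```

The first element of the returned list is the copy a plain PATH search would hit first. An empty list means no directory on PATH holds the file at all.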

Regards
--
Mladen Gogala
Oracle DBA
http://mgogala.freehostia.com

MJ Mody
2016-01-11 22:24:18 UTC
Stuck DLL as in the DLL is locked by a previous backup process. %systemroot%\system32 does not contain orasbt.dll. The only place I found it was c:\program files\commvault\simpana\base.

I have a question open with CommVault as to whether the post-backup process can be refined to ensure locks on DLLs are released.

Thanks
Mladen Gogala
2016-01-11 23:49:33 UTC
Post by MJ Mody
stuck dll as in dll is locked by a previous backup process. %systemroot%\system32 does not have the orasbt.dll. The only place where I found it was in c:\program files\commvault\simpana\base.
Have a question open with commvault if post backup process can be refined to ensure locks to dlls are released.
Thanks
Simpana comes with a process manager. You should be able to open the
process manager and restart all client processes on Windows.
--
Mladen Gogala
Oracle DBA
http://mgogala.freehostia.com

MJ Mody
2016-01-15 21:05:17 UTC
Thank you again, experts! The issue has not resurfaced since the reboot of the host. Of course, the gap in the alert log is still a mystery.

"The truth is out there" - Chris Carter
MJ Mody
2016-01-15 21:33:15 UTC
Is it normal for Oracle to write elsewhere? The assumption here is that the hang was severe enough that Oracle was not able to spawn processes to write to the alert log.
Post by Howard A. Latham
The gap may be due to the alert log going elsewhere. Try searching for it.
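
A rough sketch of such a search, assuming the text alert log follows the usual alert_<SID>.log naming (in 11g it normally sits in the trace directory under the diagnostic destination). The function name find_alert_logs is made up for illustration.

```python
import fnmatch
import os


def find_alert_logs(root):
    """Return (path, size_in_bytes) for files under root named like alert_<SID>.log."""
    found = []
    for folder, _subdirs, files in os.walk(root):
        for filename in files:
            # Compare in lower case so ALERT_ORCL.LOG and alert_orcl.log both match.
            if fnmatch.fnmatch(filename.lower(), "alert_*.log"):
                full_path = os.path.join(folder, filename)
                found.append((full_path, os.path.getsize(full_path)))
    return sorted(found)
```

Running it against each drive root and comparing the results with the expected location would show whether a second alert log exists somewhere unexpected.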
--
Howard A. Latham

Mladen Gogala
2016-01-09 22:38:41 UTC
Post by MJ Mody
Nothing like ringing in a new year than with a hung database.
In this case, running 11.2.0.3 EE with ASO/TDE on W2K8-R2 with last CPU.
From what is known, external backups (tape writes) were taking place in backup software, both rman and data pump (the .dmp files were created early in the evening). Additionally, nightly EOD processing was also taking place.
Is there anything in the alert log? What does the Windows event log say?
Is there a shortage of memory or CPU resources, or an ORA-00600 in the
alert log? There is not enough information to even speculate.
--
Mladen Gogala
Oracle DBA
http://mgogala.freehostia.com
