nico torres
2014-11-03 17:58:29 UTC
Hi, list.
I had this situation last Saturday. I got a call telling me that no
application could get into the database; I checked the alert log and saw
ORA-00018: maximum number of sessions exceeded everywhere.
The DB host felt sluggish, so I checked htop: 100% utilization on all
CPUs and a load average around *462*, wow.
I tried to log in with / as sysdba and naturally couldn't; I tried the
-prelim flag, with no luck either.
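For reference, this is roughly the preliminary connection I attempted, and
what I would have run from it to dump hang information if it had worked
(standard oradebug stuff, nothing exotic):

$ sqlplus -prelim / as sysdba

SQL> oradebug setmypid
SQL> oradebug unlimit
SQL> oradebug hanganalyze 3
SQL> oradebug dump systemstate 258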
I finally got into the database and immediately ran snapper, and what I
saw was all sessions waiting on "*cursor: mutex S*", with no running
queries and no sql_id available. So I came to the conclusion that all
sessions were hung on the same wait, and at the same time the
applications kept logging in more and more sessions, until the sessions
limit was reached and nobody else could log in.
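For what it's worth, a plain v$session query along these lines would show
the same picture snapper did (a minimal sketch):

-- who is waiting on what right now
SELECT event, state, count(*)
FROM   v$session
WHERE  wait_class <> 'Idle'
GROUP  BY event, state
ORDER  BY count(*) DESC;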
All of those sessions had different blockers, blocking each other, and
different object#, so I couldn't find a main culprit either.
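For "cursor: mutex S", p1 in v$session is the hash value of the cursor, so
in principle the waiters can be mapped back to a statement even without a
sql_id; a sketch of that idea (:hash_value is a placeholder bind):

-- which cursor hash values the waiters pile up on
SELECT p1 AS hash_value, count(*)
FROM   v$session
WHERE  event = 'cursor: mutex S'
GROUP  BY p1
ORDER  BY count(*) DESC;

-- map a hash value back to the statement
SELECT sql_id, version_count, executions, sql_text
FROM   v$sqlarea
WHERE  hash_value = :hash_value;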
I tried killing those sessions and couldn't; I tried closing the instance
with a shutdown immediate and it hung too, so I ended up issuing a
shutdown abort. The instance closed, started back up pretty fast, and
things got back to normal.
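For completeness, the kill attempts were of this shape (sid, serial# and
spid are placeholders):

-- from another session, for each stuck session
ALTER SYSTEM KILL SESSION '123,4567' IMMEDIATE;

-- and at the OS level, against the server process (spid from v$process)
$ kill -9 12345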
Looking at the AWR snapshots covering that window, I got:
Top 5 Timed Foreground Events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                                          Avg wait   % DB
Event                                 Waits     Time(s)       (ms)   time
------------------------------ ------------ ----------- ---------- ------
cursor: mutex S                  20,166,853   1,057,135         52   86.5
library cache lock                   90,243      91,516       1014    7.5
DB CPU                                           43,678               3.6
library cache: mutex X               74,002      10,816        146     .9
db file sequential read             276,847       1,417          5     .1
As expected, lots of "cursor: mutex S", but also library cache lock and
library cache: mutex X.
Load Profile              Per Second    Per Transaction
~~~~~~~~~~~~           -------------    ---------------
  DB Time(s):                  249.6                1.4
  DB CPU(s):                     8.9                0.1
  Redo size:                77,451.6              438.3
  Logical reads:            29,037.2              164.3
  Block changes:               320.9                1.8
  Physical reads:               70.3                0.4
  Physical writes:              23.5                0.1
  User calls:                  755.3                4.3
  Parses:                      197.7                1.1
  Hard parses:                   0.2                0.0
  W/A MB processed:              0.1                0.0
  Logons:                        1.9                0.0
  Executes:                    203.8                1.2
  Rollbacks:                   173.7                1.0
  Transactions:                176.7
Hard parses (0.2/s against ~198 parses/s) are not a concern here.
Statistic Name                                       Time (s)  % of DB Time
------------------------------------------ ------------------ ------------
parse time elapsed                                1,217,693.8          99.6
connection management call elapsed time             868,991.0          71.1
DB CPU                                               43,677.7           3.6
All sessions were clearly hung at parse time.
Foreground Wait Events
                                        %Time  Total Wait  Avg wait  Waits   % DB
Event                            Waits  -outs    Time (s)      (ms)   /txn   time
------------------------- ------------ ------ ----------- --------- ------ ------
cursor: mutex S             20,166,853    100   1,057,135        52   23.3   86.5
library cache lock              90,243      0      91,516      1014    0.1    7.5
library cache: mutex X          74,002      0      10,816       146    0.1     .9
db file sequential read        276,847      0       1,417         5    0.3     .1
cursor: pin S wait on X             94      0         164      1749    0.0     .0
I searched around MOS and several bugs came up, but I couldn't find one
that matches entirely.
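Since the instance was bounced, my plan is to dig through ASH history for
the incident window with something like this sketch (the timestamps are
placeholders):

-- post-mortem: what the sessions waited on, and on which sql_id
SELECT event, sql_id, count(*)
FROM   dba_hist_active_sess_history
WHERE  sample_time BETWEEN TIMESTAMP '2014-11-01 10:00:00'
                       AND TIMESTAMP '2014-11-01 12:00:00'
GROUP  BY event, sql_id
ORDER  BY count(*) DESC;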
Some information about the database:
-------------------------
Oracle *11.2.0.1*; it urgently needs some patching, and a bug is my main
suspect.
It uses ASSM.
                 begin      end
SGA use (MB):  6,656.0  6,656.0
PGA use (MB):    561.1  1,213.3
Cache sizes between the snaps:
Buffer Cache: 3,840M
Shared Pool Size: 2,496M
Parameters:
  memory_target           13421772800  (12.5 GB)
  open_cursors            300
  processes               600
  sessions                928
  session_cached_cursors  50
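Next time I'll also confirm the ORA-00018 side from v$resource_limit
before bouncing; a minimal sketch:

-- how close sessions/processes got to their configured limits
SELECT resource_name, current_utilization, max_utilization, limit_value
FROM   v$resource_limit
WHERE  resource_name IN ('sessions', 'processes');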
Could anyone please give me some advice on how to further investigate this
issue? Any other information needed?
Thanks