Discussion:
RMAN "stuck" on SQLNet
Chris Taylor
2018-01-04 16:51:02 UTC
Permalink
12.1.0.2
RHEL 7 64-bit
backup location = NAS storage
Backup Type = Level 0
Backup is of Production Standby DB
Backup does NOT use catalog

I need to know how to dig into this - specifically sid 2269. Can I strace
this or something on linux - how would YOU go about investigating this?


V$SESSION shows:
SID SERIAL# USERNAME PROGRAM EVENT WAIT_TIME_SECS WAIT_CLASS
65 45315 SYS ***@hostname.redacted SQL*Net message from client 45934.531794
Idle
148 6769 SYS ***@hostname.redacted SQL*Net message from client 44724.14675
Idle
338 3826 SYS ***@hostname.redacted SQL*Net message from client 170.757525
Idle
359 59947 SYS ***@hostname.redacted SQL*Net message from client 170.767085
Idle
380 37145 SYS ***@hostname.redacted SQL*Net message from client 0.006576
Idle
401 10755 SYS ***@hostname.redacted SQL*Net message from client 767.152796
Idle
2185 26837 SYS ***@hostname.redacted SQL*Net message from client 0.791679
Idle
2227 53503 SYS ***@hostname.redacted SQL*Net message from client
44724.146262 Idle
2269 28738 SYS ***@hostname.redacted SQL*Net break/reset to client
45461.506918 Application
2290 55952 SYS ***@hostname.redacted SQL*Net message from client
45384.358042 Idle
2311 12729 SYS ***@hostname.redacted SQL*Net message from client
45204.297414 Idle
2332 28712 SYS ***@hostname.redacted SQL*Net message from client
45044.216679 Idle
2353 384 SYS ***@hostname.redacted SQL*Net message from client 45934.541706
Idle
2374 45826 SYS ***@hostname.redacted SQL*Net message from client
46308.688279 Idle
2395 47415 SYS ***@hostname.redacted SQL*Net message from client
45124.256283 Idle
2416 58529 SYS ***@hostname.redacted SQL*Net message from client
44994.200668 Idle
2437 37004 SYS ***@hostname.redacted SQL*Net message from client
45204.287282 Idle
2458 31528 SYS ***@hostname.redacted SQL*Net message from client
44974.182189 Idle
2479 28402 SYS ***@hostname.redacted SQL*Net message from client
45584.422033 Idle
2500 12880 SYS ***@hostname.redacted SQL*Net message from client
45074.231328 Idle
2542 14309 SYS ***@hostname.redacted SQL*Net message from client
45174.272327 Idle
2563 31481 SYS ***@hostname.redacted SQL*Net message from client
45574.408916 Idle
Stefan Koehler
2018-01-04 17:33:10 UTC
Permalink
Hey Chris,
there is nothing to strace as "SQL*Net break/reset to client" is shown when an error happens or an unhandled exception is raised. As RMAN is just PL/SQL code I would check for such issues.

Just trace RMAN (debug command line option) and have a look what is happening :-)

Best Regards
Stefan Koehler

Independent Oracle performance consultant and researcher
Website: http://www.soocs.de
Post by Chris Taylor
12.1.0.2
RHEL 7 64-bit
backup location = NAS storage
Backup Type = Level 0
Backup is of Production Standby DB
Backup does NOT use catalog
 
I need to know how to dig into this - specifically sid 2269.  Can I strace this or something on linux - how would YOU go about investigating this?
SID SERIAL# USERNAME PROGRAM EVENT WAIT_TIME_SECS WAIT_CLASS
 
--
http://www.freelists.org/webpage/oracle-l
Chris Taylor
2018-01-04 18:37:48 UTC
Permalink
An update on this. I killed the session waiting on SQLNet break/reset to
client, and RMAN picked back up with the following information:

RMAN-03009: failure of backup command on c1 channel at 01/04/2018 13:04:28
RMAN-10038: database session for channel c1 terminated unexpectedly
channel c1 disabled, job failed on it will be run on another channel
channel c3: starting compressed incremental level 0 datafile backup set
channel c3: specifying datafile(s) in backup set
input datafile file number=00461
name=/oradata/sentry/sentinel/sentinel_nir_111.dbf
input datafile file number=00480
name=/oradata/sentry/sentinel/sentinel_nir_127.dbf
input datafile file number=00967
name=/oradata/sentry/sentinel/sentinel_nir_139.dbf
input datafile file number=01422
name=/oradata/sentry/sentinel/sentinel_nir_160.dbf
channel c3: starting piece 1 at 2018-01-04 13:04:29


I still have no idea why that session "hung" on a sqlnet break/reset to
client however.

Chris


On Thu, Jan 4, 2018 at 10:51 AM, Chris Taylor <
Post by Chris Taylor
12.1.0.2
RHEL 7 64-bit
backup location = NAS storage
Backup Type = Level 0
Backup is of Production Standby DB
Backup does NOT use catalog
I need to know how to dig into this - specifically sid 2269. Can I strace
this or something on linux - how would YOU go about investigating this?
SID SERIAL# USERNAME PROGRAM
​ ​
EVENT
​ ​
WAIT_TIME_SECS WAIT_CLASS
6
45461.506918 Application
Chris Taylor
2018-01-04 18:46:34 UTC
Permalink
Aw, now we're getting somewhere:

In the alert log, am getting a dump for the session when it tried to fail
again:

Errors in file
/oracle/app/diag/rdbms/[redacted]/trace/[redacted]_ora_32208.trc
(incident=116378):
ORA-00600: internal error code, arguments: [17182], [0x7FC30E9E8FE8], [],
[], [], [], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [krbb3crw_inv_blk_2], [4], [],
[], [], [], [], [], [], [], [], []


On Thu, Jan 4, 2018 at 12:37 PM, Chris Taylor <
Post by Chris Taylor
An update on this. I killed the session waiting on SQLNet break/reset
RMAN-03009: failure of backup command on c1 channel at 01/04/2018 13:04:28
RMAN-10038: database session for channel c1 terminated unexpectedly
channel c1 disabled, job failed on it will be run on another channel
channel c3: starting compressed incremental level 0 datafile backup set
channel c3: specifying datafile(s) in backup set
input datafile file number=00461 name=/oradata/sentry/sentinel/
sentinel_nir_111.dbf
input datafile file number=00480 name=/oradata/sentry/sentinel/
sentinel_nir_127.dbf
input datafile file number=00967 name=/oradata/sentry/sentinel/
sentinel_nir_139.dbf
input datafile file number=01422 name=/oradata/sentry/sentinel/
sentinel_nir_160.dbf
channel c3: starting piece 1 at 2018-01-04 13:04:29
I still have no idea why that session "hung" on a sqlnet break/reset to
client however.
Chris
On Thu, Jan 4, 2018 at 10:51 AM, Chris Taylor <
Post by Chris Taylor
12.1.0.2
RHEL 7 64-bit
backup location = NAS storage
Backup Type = Level 0
Backup is of Production Standby DB
Backup does NOT use catalog
I need to know how to dig into this - specifically sid 2269. Can I
strace this or something on linux - how would YOU go about investigating
this?
SID SERIAL# USERNAME PROGRAM
​ ​
EVENT
​ ​
WAIT_TIME_SECS WAIT_CLASS
6
45461.506918 Application
Jared Still
2018-01-19 00:26:29 UTC
Permalink
A little late to the party here, but...

I would get the PID of the session and start investigating via /proc/PID/fd
and look for sockets (assuming linux)

[***@oravm01 ~]# ps -flea| grep [r]man
0 S oracle 29642 28063 6 80 0 - 24394 n_tty_ 16:19 pts/5 00:00:00
rman target sys/XXX@//oravm01:1521/oravm.jks.com

[***@oravm01 ~]# l /proc/29642/fd/ | grep socket
lrwx------ 1 oracle oinstall 64 Jan 18 16:19 10 -> socket:[166687484]
lrwx------ 1 oracle oinstall 64 Jan 18 16:19 9 -> socket:[166687335]


Let's look at socket #9 and find out where it is:

[***@oravm01 ~]# head -1 /proc/net/tcp; grep ' 166687335 ' /proc/net/tcp
sl local_address rem_address st tx_queue rx_queue tr tm->when
retrnsmt uid timeout inode
37: 8401A8C0:47DF 8401A8C0:05F1 01 00000000:00000000 00:00000000
00000000 300 0 166687335 1 ffff880033e19280 20 3 0 5 -1


The remote address matches the local address in this case. 0541 is hex for
1521.

If the rem_address were different, the IP can be obtained by breaking the
hex value 8401A8C0 into octets.

84: 132
01: 001
A8: 168
C0: 192

So the address of the server connected to is 192.168.1.132.

At this point there are couple of things to do:

1) check what's going on with the remote server
2) strace the process connected to the remote port.

I've seen issues when the connection gets made through a 'bad' port on a
switch.

This would be intermittent, and so may be what you saw.

And of course, repeat for all sockets for the process if needed.

HTH

Jared



Jared Still
Certifiable Oracle DBA and Part Time Perl Evangelist
Principal Consultant at Pythian
Pythian Blog http://www.pythian.com/blog/author/still/
Github: https://github.com/jkstill



On Thu, Jan 4, 2018 at 8:51 AM, Chris Taylor <
Post by Chris Taylor
12.1.0.2
RHEL 7 64-bit
backup location = NAS storage
Backup Type = Level 0
Backup is of Production Standby DB
Backup does NOT use catalog
I need to know how to dig into this - specifically sid 2269. Can I strace
this or something on linux - how would YOU go about investigating this?
SID SERIAL# USERNAME PROGRAM EVENT WAIT_TIME_SECS WAIT_CLASS
45934.531794 Idle
44724.14675 Idle
Idle
170.767085 Idle
Idle
767.152796 Idle
Idle
44724.146262 Idle
45461.506918 Application
45384.358042 Idle
45204.297414 Idle
45044.216679 Idle
45934.541706 Idle
46308.688279 Idle
45124.256283 Idle
44994.200668 Idle
45204.287282 Idle
44974.182189 Idle
45584.422033 Idle
45074.231328 Idle
45174.272327 Idle
45574.408916 Idle
Loading...