Chris Taylor
2018-10-30 15:44:22 UTC
Oracle 12.1.0.2 Linux x86-64
Background:
We're running a Level 0 backup and a Level 1 merge of a 75+ TB database to an ASM disk group.
** Looking for comments/ideas/suggestions on whether this is 'normal' or if it's really slow. It seems like a problem to me, but I'm not sure, since I've never done a Level 1 merge before - maybe reading the SYSTEM datafile is supposed to be slow for this step? **
The Level 0 ran fine, and the Level 1 merge started and ran fine for hours.
Now all but one of my RMAN channels (6 total) are finished, but the remaining channel is writing very slowly.
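A simple way to gauge whether that last channel is still making progress at all is v$session_longops (a minimal sketch, not part of the original diagnosis):

-- Rough progress check; RMAN's long-running work shows up with OPNAME like 'RMAN: ...'
SELECT sid, serial#, opname, sofar, totalwork,
       ROUND(sofar / NULLIF(totalwork, 0) * 100, 1) AS pct_done,
       time_remaining
FROM   v$session_longops
WHERE  opname LIKE 'RMAN%'
AND    sofar < totalwork;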
Tracing the problematic session shows the following:
*** 2018-10-30 11:22:06.940
WAIT #0: nam='KSV master wait' ela= 384 indicator=10 p2=0 p3=0 obj#=-1 tim=804631999360
WAIT #0: nam='KSV master wait' ela= 48336 indicator=9 p2=0 p3=0 obj#=-1 tim=804632047895
WAIT #0: nam='ASM file metadata operation' ela= 443 msgop=41 locn=0 p3=0 obj#=-1 tim=804632048128
WAIT #0: nam='kfk: async disk IO' ela= 532 count=1 where=9 timeout=4294967295 obj#=-1 tim=804632048763
WAIT #0: nam='Disk file operations I/O' ela= 220 FileOperation=3 fileno=1 filetype=10 obj#=-1 tim=804632048803
*** 2018-10-30 11:22:16.073
WAIT #0: nam='resmgr:cpu quantum' ela= 6 location=3 consumer group id=4844 =0 obj#=-1 tim=804641132770
*** 2018-10-30 11:22:18.274
WAIT #0: nam='KSV master wait' ela= 398 indicator=10 p2=0 p3=0 obj#=-1 tim=804643333365
WAIT #0: nam='KSV master wait' ela= 49605 indicator=9 p2=0 p3=0 obj#=-1 tim=804643383128
WAIT #0: nam='ASM file metadata operation' ela= 369 msgop=41 locn=0 p3=0 obj#=-1 tim=804643383326
WAIT #0: nam='kfk: async disk IO' ela= 673 count=1 where=9 timeout=4294967295 obj#=-1 tim=804643384139
WAIT #0: nam='Disk file operations I/O' ela= 311 FileOperation=3 fileno=1 filetype=10 obj#=-1 tim=804643384215
*** 2018-10-30 11:22:29.644
WAIT #0: nam='KSV master wait' ela= 517 indicator=10 p2=0 p3=0 obj#=-1 tim=804654703615
WAIT #0: nam='KSV master wait' ela= 52485 indicator=9 p2=0 p3=0 obj#=-1 tim=804654756324
WAIT #0: nam='ASM file metadata operation' ela= 450 msgop=41 locn=0 p3=0 obj#=-1 tim=804654756539
WAIT #0: nam='kfk: async disk IO' ela= 527 count=1 where=9 timeout=4294967295 obj#=-1 tim=804654757294
WAIT #0: nam='Disk file operations I/O' ela= 375 FileOperation=3 fileno=1 filetype=10 obj#=-1 tim=804654757356
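To summarise what that session has been waiting on over a longer window, an ASH query along these lines works (a sketch, assuming the SID is known and the Diagnostics Pack is licensed for v$active_session_history):

-- Recent wait distribution for the problem session (:sid is a placeholder).
SELECT event, COUNT(*) AS samples
FROM   v$active_session_history
WHERE  session_id = :sid
AND    sample_time > SYSTIMESTAMP - INTERVAL '1' HOUR
GROUP  BY event
ORDER  BY samples DESC;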
File# 1 is the SYSTEM datafile, which is 32.99 GB in size.
Looking at the session from v$session shows:
MODULE:          backup incr datafile
CLIENT_INFO:     rman channel=d0
ACTION:          0000080 STARTED16
OSUSER:          oracle
EVENT:           Disk file operations I/O
WAIT_TIME_SECS:  0.000005
WAIT_CLASS:      User I/O
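For reference, that output came from something along these lines (a sketch - the exact query isn't shown, and the RMAN channel sessions are filtered by program name here):

-- WAIT_TIME_MICRO is in microseconds, hence the division.
SELECT module, client_info, action, osuser, event,
       wait_time_micro / 1e6 AS wait_time_secs,
       wait_class
FROM   v$session
WHERE  program LIKE 'rman%';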
Strace shows:
io_submit(139813557071872, 1, {{data:0x7f28dcb09c48, pread, filedes:325, buf:0x7f28dc36d000, nbytes:524288, offset:1734379831296}}) = 1
times(NULL) = 509955577
times(NULL) = 509955577
io_getevents(139813557071872, 4, 128, {{0x7f28dcb09ee8, 0x7f28dcb09ee8, 524288, 0}, {0x7f28dcb0a188, 0x7f28dcb0a188, 524288, 0}}, {0, 0}) = 2
times(NULL) = 509955577
getrusage(0x1 /* RUSAGE_??? */, {ru_utime={88371, 365510}, ru_stime={1383, 903614}, ...}) = 0
times(NULL) = 509955584
io_submit(139813557071872, 1, {{data:0x7f28dcb09ee8, pread, filedes:325, buf:0x7f28dc2ec000, nbytes:524288, offset:1734380355584}}) = 1
times(NULL) = 509955584
times(NULL) = 509955584
times(NULL) = 509955584
times(NULL) = 509955584
io_submit(139813557071872, 1, {{data:0x7f28dcb0a188, pread, filedes:325, buf:0x7f28dc04d000, nbytes:524288, offset:1734380879872}}) = 1
times(NULL) = 509955584
times(NULL) = 509955584
io_getevents(139813557071872, 4, 128, {{0x7f28dcb0a428, 0x7f28dcb0a428, 524288, 0}, {0x7f28dcb09c48, 0x7f28dcb09c48, 524288, 0}}, {0, 0}) = 2
times(NULL) = 509955584
So strace shows it streaming operations quickly, but I can't figure out why it's taking so long.
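One way to cross-check the apparent streaming rate from the database side is v$backup_async_io (a sketch; rows with STATUS = 'IN PROGRESS' are the files currently being read or written by the backup):

-- Per-file throughput as the database sees it for the running backup.
SELECT sid, type, status, filename,
       ROUND(bytes / 1024 / 1024)       AS mb_done,
       ROUND(total_bytes / 1024 / 1024) AS mb_total,
       effective_bytes_per_second,
       long_waits
FROM   v$backup_async_io
WHERE  status = 'IN PROGRESS';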
The last entry written to the RMAN log file was at 21:55 (9:55 PM) last night.
Here's a sample of the RMAN output from before it stopped writing to the log - you can see the elapsed times were pretty snappy at first, then it looks like they started slowing down (a query to confirm whether the job itself still registers as running follows the log excerpt).
channel d3: starting piece 1 at 201810292132
channel d3: finished piece 1 at 201810292135
piece handle=+ACTSENTRY/sentry/datafile/u6tgtjra tag=SENTRY_ comment=NONE
channel d3: backup set complete, elapsed time: 00:02:45
channel d3: starting compressed incremental level 1 datafile backup set
channel d3: specifying datafile(s) in backup set
including current control file in backup set
channel d3: starting piece 1 at 201810292135
channel d3: finished piece 1 at 201810292136
piece handle=+ACTSENTRY/sentry/datafile/u7tgtk0h tag=SENTRY_ comment=NONE
channel d3: backup set complete, elapsed time: 00:00:25
channel d3: starting compressed incremental level 1 datafile backup set
channel d3: specifying datafile(s) in backup set
including current SPFILE in backup set
channel d3: starting piece 1 at 201810292136
channel d3: finished piece 1 at 201810292136
piece handle=+ACTSENTRY/sentry/datafile/u8tgtk23 tag=SENTRY_ comment=NONE
channel d3: backup set complete, elapsed time: 00:00:03
channel d2: finished piece 1 at 201810292137
piece handle=+ACTSENTRY/sentry/datafile/u2tgtj8t tag=SENTRY_ comment=NONE
channel d2: backup set complete, elapsed time: 00:15:00
channel d5: finished piece 1 at 201810292141
piece handle=+ACTSENTRY/sentry/datafile/u4tgtjcp tag=SENTRY_ comment=NONE
channel d5: backup set complete, elapsed time: 00:17:06
channel d4: finished piece 1 at 201810292145
piece handle=+ACTSENTRY/sentry/datafile/u3tgtjb0 tag=SENTRY_ comment=NONE
channel d4: backup set complete, elapsed time: 00:22:02
channel d1: finished piece 1 at 201810292155
piece handle=+ACTSENTRY/sentry/datafile/u5tgtjp6 tag=SENTRY_ comment=NONE
channel d1: backup set complete, elapsed time: 00:24:38
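And the check mentioned above - whether the backup job still registers as running even though the log has gone quiet - would be something like this (a sketch against v$rman_status, which is populated from the control file):

-- Any RMAN operations still marked as running, with work done so far.
SELECT operation, status, object_type,
       start_time, end_time, mbytes_processed
FROM   v$rman_status
WHERE  status LIKE 'RUNNING%'
ORDER  BY start_time;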