Discussion:
Instrumenting Poor performance
Upendra nerilla
2018-09-13 19:09:47 UTC
Permalink
Hello Team -
We have an EBS application which is running EBS and several other modules (OTC). From time to time I see a particular query giving poor response time. Normal response is <4 mins and during the poor execution, it goes over an hour.

We have a SQL baseline created for that SQL_ID, forcing the optimal plan. We are able to see that the optimizer is using the same plan hash. There are no host resource (cpu/memory/io) constraints.. This job runs only a couple of times a day and is very time sensitive for the business. We are unable to reproduce this in lower environments, even after refreshes..

We see a pattern that this issue shows up during the first week of the month and disappears after a couple of days (not the same exact day).
Here is what I am thinking of gathering - high level info..

1. Gather daily row count of all the tables involved in the query..
2. Review the jobs and see if there is anything unique..

What else could I gather to troubleshoot this further?
Appreciate any insights..

Thank you
-Upendra
Martin Berger
2018-09-13 19:52:01 UTC
Permalink
Hi Upendra,

is it always the same SQL_ID?
You can enable sql_trace for this particular SQL:
alter system set events 'sql_trace[sql: <SQL_ID>] level=12';

I assume EBS can enable tracing for specific activities; I'm not close
to EBS, but maybe there are better solutions than plain sql_trace.

With traces of both "good" and "bad" executions you can compare where the
time is spent.

Do you have proper facility to create a profile based on these sql_trace
files?

br,
berx
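(For reference, a minimal sketch of switching the targeted trace on and off again; the <SQL_ID> placeholder stands for the real SQL_ID, and the syntax assumes 11g+ event tracing:)

```sql
-- Enable level-12 sql_trace (waits + binds) only for the one statement
alter system set events 'sql_trace[sql: <SQL_ID>] level=12';

-- ... wait for a slow execution to be captured ...

-- Switch it off again so trace files don't pile up
alter system set events 'sql_trace[sql: <SQL_ID>] off';
```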



Upendra nerilla
2018-09-13 20:04:36 UTC
Permalink
is it always the same SQL_ID?
You can enable sql_trace for this particular SQL:
alter system set events 'sql_trace[sql: <SQL_ID>] level=12';

It is always the same sql_id. We have a 10046 and 10053 trace for the good execution.. Waiting for the next poor execution to capture the traces again..

I assume EBS can enable tracing for specific activities, but I'm not close to EBS, but maybe there are better solutions than simple sql_trace.

I am not an EBS guy, so I am at a loss here. :(

Do you have proper facility to create a profile based on these sql_trace files?

There is already a sql_profile created for the sql_id. Is that what you mean?


Martin Berger
2018-09-13 20:37:18 UTC
Permalink
To analyze sql_trace files, you can use tools like
tkprof
orasrp <http://www.oracledba.ru/orasrp/>
TVD$XTAT <https://antognini.ch/2008/10/introduce-tvdxtat/>
Method-R Profiler <https://method-r.com/software/workbench/>
there are others also.
These should give you a good starting point on where and why the time is
spent.

hth,
berx
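(Before running any of these profilers you need the raw trace files themselves; a quick sketch for locating them, assuming access to the v$ views:)

```sql
-- Directory where this instance writes trace files
select value from v$diag_info where name = 'Diag Trace';

-- Exact trace file name for a given session (11g+ exposes it directly)
select p.tracefile
  from v$session s
  join v$process p on p.addr = s.paddr
 where s.sid = :sid;
```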

Upendra nerilla
2018-09-14 02:54:59 UTC
Permalink
Thanks Martin. That helps..
I have used orasrp and TVD$XTAT, both are great.
I can generate the trace only when the performance issue pops up in a couple of weeks..
Wondering if there is anything else that I could look into while waiting?
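(One thing that can be checked while waiting, assuming an AWR/Diagnostics Pack license, is the execution history of the SQL_ID; a sketch, with '&sql_id' as a placeholder for the real value:)

```sql
-- Elapsed time and work done per AWR snapshot for the problem SQL_ID
select sn.begin_interval_time,
       st.plan_hash_value,
       st.executions_delta,
       round(st.elapsed_time_delta / 1e6) as elapsed_sec,
       st.buffer_gets_delta,
       st.disk_reads_delta
  from dba_hist_sqlstat  st
  join dba_hist_snapshot sn
    on sn.snap_id         = st.snap_id
   and sn.dbid            = st.dbid
   and sn.instance_number = st.instance_number
 where st.sql_id = '&sql_id'
 order by sn.begin_interval_time;
```

If the plan hash stays constant but buffer_gets_delta or disk_reads_delta jumps during the bad runs, that points at data volume or I/O rather than a plan flip.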



Mark Burgess
2018-09-14 05:19:58 UTC
Permalink
Upendra,

if you have MOS access take a look through note # 290614.1.

If you are only seeing this at specific times of the month check the frequency of the stats collection jobs - note that you must use the ‘Gather% Statistics’ concurrent programs to collect stats and not dbms_stats.

Reporting in EBS is typically based on period and date. It's possible that you just need to refresh stats more regularly around month-end processing to ensure that the stats reflect the current month and are not stale.

Regards,

Mark
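(To correlate the bad days with stats freshness, something along these lines can be checked daily; the table names are hypothetical placeholders for the tables in the query:)

```sql
-- Last analyze date and Oracle's own staleness flag per table
select owner, table_name, num_rows, last_analyzed, stale_stats
  from dba_tab_statistics
 where table_name in ('TABLE_A', 'TABLE_B')   -- hypothetical names
 order by owner, table_name;
```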
Upendra nerilla
2018-09-14 19:49:57 UTC
Permalink
Mark,
Thanks for sharing the note. I will look into it and share with the team to see if we could enable traces with this specific scenario.

We do collect stats using the EBS specific syntax..

We were gathering stats once a week, but after the first occurrence of this issue we started gathering stats three times a week..

-Upendra

Stefan Koehler
2018-09-14 10:37:46 UTC
Permalink
Hello Upendra,
sure, if you have licensed Oracle's diagnostics/tuning pack.

You can run SQLd360 (https://github.com/sqldb360/sqldb360) for that particular SQL ID and then look at the "34. Plans Details" page (if historical SQL Monitoring reports are not available anymore). It has sections called "Top Executions from history" and "Top 15 Wait events over historical time for PHV ....." which help you understand what is going on :)

Of course ASH data is not enough if it's some kind of fancy stuff, but give it a try to spot anything very obvious :-)

Best Regards
Stefan Koehler

Independent Oracle performance consultant and researcher
Website: http://www.soocs.de
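(If SQLd360 can't be run, the same ASH data can be eyeballed directly; a rough sketch, Diagnostics Pack license assumed:)

```sql
-- Top wait events per plan hash for the SQL_ID, from the AWR ASH history
select sql_plan_hash_value,
       nvl(event, 'ON CPU') as event,
       count(*)             as samples
  from dba_hist_active_sess_history
 where sql_id = '&sql_id'
 group by sql_plan_hash_value, nvl(event, 'ON CPU')
 order by samples desc;
```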
--
http://www.freelists.org/webpage/oracle-l
Mark W. Farnham
2018-09-15 11:37:04 UTC
Permalink
If this is a concurrent manager job, pair it with another job scheduled to
start first, such that the new job just queries the ccmgr tables to see if
the original job is finished. If the original job hasn't started, wait and
check again (perhaps every minute in this case). If the original job
finishes, have the new job quit. If the original job has been running for
more than the threshold (current clock minus start time > 4 minutes in this
case, if I understand you correctly), then turn on trace for the original
job, perhaps also query a few counts on the component tables and the stats
of the tables, and then quit.



I'm presuming here that this is NOT part of a month-end pile-up of jobs
waiting to start in the first place, for example due to queue limits, or
someone bumping queue limits unrealistically high so lots of jobs run slowly
because the system is overloaded. I suppose the new job could count how many
times it waited for the original job to start. Just make sure the new job
starts before the original job.



mwf
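(A rough sketch of that watchdog idea in PL/SQL, assuming the watchdog session has privileges on v$session, DBMS_MONITOR and DBMS_LOCK; the '&sql_id' placeholder, the 240-second threshold and the retry limit are all assumptions to adjust:)

```sql
declare
  l_tries pls_integer := 0;
begin
  loop
    -- any session that has been executing the problem statement > 4 minutes
    for r in (select sid, serial#
                from v$session
               where sql_id       = '&sql_id'
                 and status       = 'ACTIVE'
                 and last_call_et > 240)
    loop
      -- switch on a 10046-style trace with waits and binds for that session
      dbms_monitor.session_trace_enable(
        session_id => r.sid,
        serial_num => r.serial#,
        waits      => true,
        binds      => true);
      return;                 -- done: trace is on, let the job run
    end loop;

    l_tries := l_tries + 1;
    exit when l_tries > 120;  -- give up after ~2 hours
    dbms_lock.sleep(60);      -- poll once a minute (18c+: dbms_session.sleep)
  end loop;
end;
/
```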



Glauber, Leo
2018-09-17 11:53:41 UTC
Permalink
Do you have the ability to restore the production database to a non-prod environment for a timeframe when this occurs?
-Leo

Upendra nerilla
2018-09-18 01:21:45 UTC
Permalink
We had tried restoring the database, as of a few minutes prior to the issue occurring, to a lower environment; we were unable to reproduce it. Today we are going to do another refresh right before this query runs; hopefully we'll catch it in time..

Thanks

________________________________
From: Glauber, Leo <***@sodexo.com>
Sent: Monday, September 17, 2018 7:53 AM
To: ***@rsiz.com; ***@hotmail.com; 'Martin Berger'
Cc: 'Oracle-L oracle-l'
Subject: RE: Instrumenting Poor performance


Do you have the ability to restore the production database to a non-prod environment for a timeframe when this occurs?

-Leo



Upendra nerilla
2018-09-18 00:33:42 UTC
Permalink
Thanks Mark for the suggestion..

It is possible that other jobs are impacted by the concurrent manager queue, but in this scenario we are looking purely at the SQL execution, bypassing the concurrent manager etc..



________________________________
From: Mark W. Farnham <***@rsiz.com>
Sent: Saturday, September 15, 2018 7:37 AM
To: ***@hotmail.com; 'Martin Berger'
Cc: 'Oracle-L oracle-l'
Subject: RE: Instrumenting Poor performance


If this is a concurrent manager job, pair it up by scheduling another job to start first, such that the new job just queries the ccmgr tables to see if the original job has finished. If the original job hasn’t started, wait and check again (perhaps every minute in this case). If the original job finishes, have the new job quit. If the original job has been running for more than the expected amount of time (current clock minus start time > 4 minutes in this case, if I understand you correctly), then turn on trace for the original job, perhaps also query a few counts on the component tables and the stats of the tables, and then quit.



I’m presuming here that this is NOT part of a month-end pile-up of jobs starting in the first place, for example due to queue limits, or someone bumping queue limits unrealistically high so that lots of jobs run slowly because the system is overloaded. I suppose the new job could count how many times it waited for the original job to start. Just make sure the new job starts before the original job.



mwf
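Mark's watchdog idea could be sketched in PL/SQL. The phase codes, the ORACLE_SESSION_ID to V$SESSION.AUDSID mapping, and the 4-minute threshold below are assumptions to verify against your EBS release before relying on it:

```sql
-- Watchdog sketch: poll the request, quit when it completes, and enable
-- trace on its session if it has been running longer than the threshold.
-- Assumes standard FND_CONCURRENT_REQUESTS phase codes (P/R/C) and the
-- ORACLE_SESSION_ID -> V$SESSION.AUDSID mapping.
DECLARE
  l_phase  VARCHAR2(1);
  l_start  DATE;
  l_sid    NUMBER;
  l_serial NUMBER;
BEGIN
  LOOP
    SELECT phase_code, actual_start_date
      INTO l_phase, l_start
      FROM apps.fnd_concurrent_requests
     WHERE request_id = :req_id;

    EXIT WHEN l_phase = 'C';                       -- original job finished: quit

    IF l_phase = 'R' AND (SYSDATE - l_start) * 1440 > 4 THEN
      SELECT sid, serial# INTO l_sid, l_serial
        FROM v$session
       WHERE audsid = (SELECT oracle_session_id
                         FROM apps.fnd_concurrent_requests
                        WHERE request_id = :req_id);
      dbms_monitor.session_trace_enable(l_sid, l_serial,
                                        waits => TRUE, binds => TRUE);
      EXIT;
    END IF;

    dbms_lock.sleep(60);                           -- pending, or still under 4 minutes
  END LOOP;
END;
/
```

On 12.1 the sleep call is DBMS_LOCK.SLEEP (DBMS_SESSION.SLEEP arrived later), which needs an execute grant on DBMS_LOCK.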



Upendra nerilla
2018-09-17 21:32:44 UTC
Permalink
Jonathan,
Thank you so much for pointing out the potentially conflicting update.. I do see a large number of "db file sequential read" waits for a single table, which might be the root cause.. I will look into identifying which other process may be the culprit updating that table..
I am going through the saved AWR baseline (from around the poor execution) to see if I can find anything..

-Upendra

________________________________
From: Jonathan Lewis <***@jlcomp.demon.co.uk>
Sent: Friday, September 14, 2018 3:28 AM
To: Martin Berger; ***@hotmail.com
Cc: Oracle-L oracle-l
Subject: Re: Instrumenting Poor performance


Are you licensed for the diagnostic and performance packs?
If not, are you running Statspack against the database?

In either case the (AWR/Statspack) history will probably have captured a copy of the statement when it was running badly, so you should have a few execution stats for the bad run. If it's in the AWR and you have Enterprise Manager you can access this through the GUI; if not, you can run awrsqrpt.sql (or sprepsql.sql) for a text report of the SQL for the relevant snapshot ids.
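If the Diagnostics Pack is licensed, the per-snapshot history behind awrsqrpt.sql can also be pulled directly; a sketch, binding :sql_id to the statement in question:

```sql
-- Per-snapshot execution stats for one statement, from AWR.
-- Requires the Diagnostics Pack license.
SELECT sn.begin_interval_time,
       st.plan_hash_value,
       st.executions_delta                    execs,
       st.buffer_gets_delta                   buffer_gets,
       st.disk_reads_delta                    disk_reads,
       ROUND(st.elapsed_time_delta / 1e6, 1)  elapsed_s,
       ROUND(st.cpu_time_delta / 1e6, 1)      cpu_s
  FROM dba_hist_sqlstat st
  JOIN dba_hist_snapshot sn
    ON sn.snap_id         = st.snap_id
   AND sn.dbid            = st.dbid
   AND sn.instance_number = st.instance_number
 WHERE st.sql_id = :sql_id
 ORDER BY sn.begin_interval_time;
```

Scanning this output for the bad-run snapshots gives the buffer-gets / disk-reads comparison described below without generating the full report.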

The execution stats in the report may give you some clues, e.g. based on the headings in the stats section:


-> CPU and Elapsed Time are in seconds (s) for Statement Total and in
milliseconds (ms) for Per Execute
% Snap
Statement Total Per Execute Total
--------------- --------------- ------
Buffer Gets:
Disk Reads:
Rows processed:
CPU Time(s/ms):
Elapsed Time(s/ms):
Sorts:
Parse Calls:
Invalidations:
Version count:
Sharable Mem(K):
Executions:


Is the number of disk reads much larger, or is it about the same with some extra CPU time and a much larger elapsed time?
The latter might simply mean a good run is well cached and a bad run is badly cached.

Is the number of buffer gets about the same, or massively increased along with an increase in disk reads? If the plan is DEFINITELY unchanged, a huge increase in buffer gets would be a hint that a conflicting update was running at the same time and you spent a lot of time generating read-consistent images.

Get the report for a few sessions with good run times so that you have a handle on the possible workload and variation on a good run.

Regards
Jonathan Lewis




________________________________________
From: oracle-l-***@freelists.org <oracle-l-***@freelists.org> on behalf of Upendra nerilla <***@hotmail.com>
Sent: 14 September 2018 03:54:59
To: Martin Berger
Cc: Oracle-L oracle-l
Subject: Re: Instrumenting Poor performance

Thanks Martin. That helps..
I have used orasrp and TVD$XTAT, both are great.
I can generate the trace only when the performance issue pops up in a couple of weeks..
Wondering if there is anything else that I could look into while waiting?



________________________________
From: Martin Berger <***@gmail.com>
Sent: Thursday, September 13, 2018 4:37 PM
To: ***@hotmail.com
Cc: Oracle-L oracle-l
Subject: Re: Instrumenting Poor performance

to compile sql_trace files, you can use tools like
tkprof
orasrp<http://www.oracledba.ru/orasrp/>
TVD$XTAT<https://antognini.ch/2008/10/introduce-tvdxtat/>
Method-R Profiler<https://method-r.com/software/workbench/>
there are others also.
This should give you a good starting point where and why time is spent sometimes.

hth,
berx

Dominic Brooks
2018-09-14 06:52:00 UTC
Permalink
Sql plan hash does not include the predicates, so that's one niche consideration to look out for. AWR doesn't hold predicates, so you will need to wait until you can pull plans from memory via dbms_xplan.display_cursor to see that. It's possible but unusual.

Real-time SQL monitoring can tell you where the time was spent, but if you no longer have that available for the bad execution you can get a very rough approximation from ASH raw data.

A full SQL trace is best for seeing what you were doing, but ASH is usually good enough.

A combination of wait events (e.g. runtime decisions about switching from buffered to direct IO) and the detail of what exactly you are waiting on (e.g. reading lots of undo) might be sufficient to give pointers.

Otherwise you might need the combination of ASH/tracing (what) and snapping session statistics (why) to give a better picture.
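The rough ASH approximation for a past bad run could be sketched like this (each DBA_HIST_ACTIVE_SESS_HISTORY row represents roughly ten seconds of DB time; Diagnostics Pack required):

```sql
-- Approximate where the bad run's time went, from retained ASH samples.
-- DBA_HIST samples are ~10s apart, so COUNT(*) * 10 approximates seconds.
SELECT NVL(event, 'ON CPU') AS event,
       current_obj#,
       COUNT(*) * 10        AS db_seconds_approx
  FROM dba_hist_active_sess_history
 WHERE sql_id = :sql_id
   AND sample_time BETWEEN :bad_run_start AND :bad_run_end
 GROUP BY NVL(event, 'ON CPU'), current_obj#
 ORDER BY db_seconds_approx DESC;
```

Grouping by current_obj# shows which segment the waits land on, with the caveat (noted later in the thread) that obj# in ASH can be deceptive.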

Regards
Dominic

Sent from my iPhone

On 13 Sep 2018, at 21:47, Upendra nerilla <***@hotmail.com> wrote:

is it always the same SQL_ID?
You can enable sql_trace for this particular SQL:
alter system set events 'sql_trace[sql: <SQL_ID>] level=12';

It is always the same sql_id. We have a 10046 and 10053 trace for the good execution.. Waiting for the next poor execution to capture the traces again..

I assume EBS can enable tracing for specific activities, but I'm not close to EBS; maybe there are better solutions than simple sql_trace.

I am not an EBS guy, so I am at a loss here. :(

Do you have proper facility to create a profile based on these sql_trace files?

There is already a sql_profile created for the sql_id. Is that what you mean?


Upendra nerilla
2018-09-17 21:25:39 UTC
Permalink
Dominic/Jonathan/Stefan and others,
Apologies for the delay... I was trying to make sure I went back and reviewed all the data that we have already gathered..

A couple of quick points:
This is running on EE 12.1.0.2 RAC.
We do have Tuning/Diag licenses.

It turns out we do have a bit of useful information gathered during our last poor execution..

1. We have the 10046 trace
2. SQLT report

I performed a quick profiler comparison of the bad and good executions; the plan hash is the same. Here is how the tkprof output summary appears..

Bad run:
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 0 0.00 0.00 0 0 0 0
Execute 3 0.00 0.00 0 0 0 0
Fetch 3 160.44 1289.07 3673632 13274897 0 368
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 6 160.44 1289.07 3673632 13274897 0 368


Good run:
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 0 0.00 0.00 0 0 0 0
Execute 9 0.00 0.00 0 0 0 0
Fetch 9 119.20 119.95 135 40086840 0 2111
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 18 119.20 119.96 135 40086840 0 2111

Upon looking at the details, "OE_ORDER_LINES_ALL" seems to be the object which was read from disk..

db file sequential read 92.7% 1,258.1501s 3,692,162 0.0003s 0.0001s 0.5071s

I am now leaning towards Jonathan's suggestion of a potential large-scale update on this table.. I will check with the team and see if any such update happens.. With the AWR baseline created around that time, I am going to go through it and see if I can find anything..

Knowing that I have all this information, is there anything else that I could be looking at?

Thank you so much everyone


-Upendra

Upendra nerilla
2018-09-18 00:02:55 UTC
Permalink
Validated that the object is indeed OE_ORDER_LINES_ALL by picking a block from the 10046 trace and confirming it. Validated the indexes; none of them were recently created.

Would it help to cache the entire table? alter table .. cache;
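One caveat, as an aside: ALTER TABLE ... CACHE mainly affects how full-scan blocks age in the buffer cache, and the waits here are single-block reads, so a KEEP pool assignment (sized to hold the segment) may be closer to the intent. A sketch with a purely illustrative size, assuming the standard ONT owner:

```sql
-- Illustrative only: size db_keep_cache_size to at least the segment size
-- first, then assign the table to the KEEP pool.
ALTER SYSTEM SET db_keep_cache_size = 8G SCOPE = BOTH;

ALTER TABLE ont.oe_order_lines_all STORAGE (BUFFER_POOL KEEP);
```

Whether pinning the whole segment is worthwhile depends on its size relative to available memory; it treats the symptom (cold cache) rather than whatever evicts the blocks each month.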

Unfortunately we hit the issue again today.. 🙁
I am gathering a full 10046 trace, sqlt with all other reports..


________________________________
From: Jonathan Lewis <***@jlcomp.demon.co.uk>
Sent: Monday, September 17, 2018 7:05 PM
To: Dominic Brooks; ***@hotmail.com
Cc: Martin Berger; Oracle-L oracle-l
Subject: Re: Instrumenting Poor performance


Looking at the numbers:

the CR (query) gets per execution are consistent across the two reports, so it's unlikely that you have a large volume of undo being applied for read-consistency; application of undo records would have to show up as extra CR gets.

Further - if your 92.7% is counting the waits for "db file sequential read" in the trace file and they're coming from reads of the OE_ORDER_LINES_ALL table, then they're not reads of the undo tablespace, which adds to the argument that it's not about big updates and large-scale read-consistency work. You can cross-check the 10046 reads very easily by checking the "file#=" entry on the waits (as the obj# may be deceptive - it certainly can be in v$active_session_history). Are the reads from a file number that matches the undo tablespace, or from a file number that matches the data object?
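The file# cross-check described above is a simple dictionary lookup; for example:

```sql
-- Map file# values taken from the 10046 "db file sequential read" wait
-- lines to their tablespaces, to confirm whether the single-block reads
-- hit the data segment or the undo tablespace.
SELECT file_id, tablespace_name, file_name
  FROM dba_data_files
 WHERE file_id IN (&file_numbers_from_trace);
```

Undo datafiles appear in DBA_DATA_FILES like any others, so one query covers both cases.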

It seems more likely that some recent (or current) activity has simply pushed lots of the blocks of OE_ORDER_LINES_ALL table out of the buffer cache and you're having to read (and re-read, possibly) lots of blocks to get the result.

I suppose it's also possible that the plan has actually changed and you're using a different, much less precise, index to access the table (hypothetically someone could have dropped and rebuilt a couple of indexes with the names the wrong way round - I think that would make it possible to get the same plan hash value while using indexes with different definitions)


Regards
Jonathan Lewis

n***@gmail.com
2018-09-18 07:24:50 UTC
Permalink
Hi Upendra

I see that you say this occurs around the start of each month and disappears after a couple of days. Have you perhaps partitioned your Order Entry tables and (assuming the plan really is the same) you are starting to access new partitions for the first time? Usually month-change funniness would involve a plan change from/to nested-loop access to the data as a result of misleading stats on the new partition, but I see no reason in principle why there wouldn't be a caching effect - though I'd be surprised if it lasted days.
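If the Order Entry tables are partitioned, the month-start hypothesis is easy to check from the dictionary; owner and table name below are assumptions to adjust:

```sql
-- Newest partitions first: was one created at the month boundary, and
-- has it been analyzed yet?
SELECT partition_name, num_rows, last_analyzed
  FROM dba_tab_partitions
 WHERE table_owner = 'ONT'
   AND table_name  = 'OE_ORDER_LINES_ALL'
 ORDER BY partition_position DESC
 FETCH FIRST 5 ROWS ONLY;
```

A new partition with NULL or stale stats around the first of the month would point to the misleading-stats scenario; a freshly populated partition with good stats would fit the caching explanation instead.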
Post by Upendra nerilla
Dominic/Jonathan/Stefan and others,
Aplogies for the delay... I was trying to make sure I go back and review
all the data that we already have gathered..
This is running from EE 12.1.0.2 RAC.
We do have Tuning/Diag licenses.
It turns out, we do have a bit of useful information gathered during our
last poor execution..
1. We have the 10046 trace
2. SQLT report
I performed a quick profiler comparison for both the bad and good
execution, plan hash is the same. Here is the summary of the tkprof out how
it appears..
call count cpu elapsed disk query current
rows
------- ------ -------- ---------- ---------- ---------- ----------
----------
Parse 0 0.00 0.00 0 0 0
0
Execute 3 0.00 0.00 0 0 0
0
Fetch 3 160.44 1289.07 3673632 13274897 0
368
------- ------ -------- ---------- ---------- ---------- ----------
----------
total 6 160.44 1289.07 3673632 13274897 0
368
call count cpu elapsed disk query current
rows
------- ------ -------- ---------- ---------- ---------- ----------
----------
Parse 0 0.00 0.00 0 0 0
0
Execute 9 0.00 0.00 0 0 0
0
Fetch 9 119.20 119.95 135 40086840 0
2111
------- ------ -------- ---------- ---------- ---------- ----------
----------
total 18 119.20 119.96 135 40086840 0
2111
Upon looking at the details, "OE_ORDER_LINES_ALL" seems to be the object
which was read from disk..
db file sequential read 92.7% 1,258.1501s 3,692,162 0.0003s 0.0001s
0.5071s
I am now leaning towards Jonathan's suggestion of a potential large scale
update on this table.. I will check with the team and see if there is any
such update which happens.. With the AWR baseline created around that time,
I am going to go through and see if I could find it..
Knowing that I have all this information, is there anything else that I
could be looking at?
Thank you so much everyone
-Upendra
------------------------------
*Sent:* Friday, September 14, 2018 2:52 AM
*Cc:* Martin Berger; Oracle-L oracle-l
*Subject:* Re: Instrumenting Poor performance
SQL plan hash does not include the predicates, so that's one niche
consideration to look out for. AWR doesn't hold predicates, so you will need
to wait until you can pull plans from memory via dbms_xplan.display_cursor
to see that. It's possible but unusual.
Real time SQL monitoring can tell you where the time was spent, but if
you no longer have that available for the bad execution then you can get a very
rough approximation from the ASH raw data.
Full sql trace is best to see what you were doing but ASH is usually good enough.
A combination of wait events (e.g. runtime decisions about switching from
buffered to direct IO) and the detail of what exactly you are waiting on
(e.g. reading lots of undo) might be sufficient to give pointers.
Otherwise you might need the combination of ash/tracing (what) and
snapping session statistics (why) to give a better picture.
Regards
Dominic
Sent from my iPhone
is it always the same SQL_ID?
alter system set events 'sql_trace[sql: <SQL_ID>] level=12';
It is always the same sql_id. We have a 10046 and 10053 trace for the good
execution.. Waiting for the next poor execution to capture the traces
again..
I assume EBS can enable tracing for specific activities, but I'm not close
to EBS; maybe there are better solutions than simple sql_trace.
I am not an EBS guy, so I am at a loss here. :(
Do you have proper facility to create a profile based on these sql_trace files?
There is already a sql_profile created for the sql_id. Is that what you mean?
------------------------------
*Sent:* Thursday, September 13, 2018 3:52 PM
*Cc:* Oracle-L oracle-l
*Subject:* Re: Instrumenting Poor performance
Hi Upendra,
is it always the same SQL_ID?
alter system set events 'sql_trace[sql: <SQL_ID>] level=12';
I assume EBS can enable tracing for specific activities, but I'm not close
to EBS; maybe there are better solutions than simple sql_trace.
Compare the traces of the different "good" and "bad" executions and see where the time is spent.
Do you have proper facility to create a profile based on these sql_trace files?
br,
berx
On Thu., 13 Sep 2018 at 21:11, Upendra nerilla wrote <
Hello Team -
We have an EBS application which is running EBS and several other modules
(OTC). From time to time I see a particular query giving poor response
time. Normal response is <4 mins and during the poor execution, it goes
over an hour.
We have a SQL baseline created for that SQL_ID and forcing the optimal
plan. We are able to see that optimizer is using the same hash. There
are no host resource (cpu/memory/io) constraints.. This job runs only a
couple of times a day, very time sensitive for the business. We are unable
to reproduce this at lower environments even when we performed refreshes..
We see a pattern that this issue shows up during the first week of the
month and disappears after a couple of days (not the same exact day).
Here is what I am thinking of gathering - high level info..
1. Gather daily row count of all the tables involved in the query..
2. Review the jobs and see if there is anything unique..
What else could I gather to troubleshoot this further?
Appreciate any insights..
Thank you
-Upendra
--
Niall Litchfield
Oracle DBA
http://www.orawin.info
Upendra nerilla
2018-10-12 02:38:27 UTC
Permalink
Didn't see it come through.. resending..
Apologies if it is duplicated..

________________________________
From: Upendra nerilla <***@hotmail.com>
Sent: Tuesday, October 2, 2018 11:31 AM
To: Oracle-L
Subject: Re: Instrumenting Poor performance - SOLVED

All,
After much struggle, Oracle has come back pinpointing a bug..

Here is the issue description:
Occasionally one of the custom EBS processes was running slow. After trying a couple of fixes we thought we had fixed the issue (because it occurred only once or twice a month). However, with no explanation, the issue randomly reappeared.

Here is how a good and a bad tkprof output looked..

GOOD EXECUTION:
call     count       cpu    elapsed       disk      query    current        rows
-------  -----  --------  ---------  ---------  ---------  ---------  ----------
Parse        0      0.00       0.00          0          0          0           0
Execute      9      0.00       0.00          0          0          0           0
Fetch        9    119.20     119.95        135   40086840          0        2111
-------  -----  --------  ---------  ---------  ---------  ---------  ----------
total       18    119.20     119.96        135   40086840          0        2111

POOR EXECUTION:
call     count       cpu    elapsed       disk      query    current        rows
-------  -----  --------  ---------  ---------  ---------  ---------  ----------
Parse        0      0.00       0.00          0          0          0           0
Execute      3      0.00       0.00          0          0          0           0
Fetch        3    160.44    1289.07    3673632   13274897          0         368
-------  -----  --------  ---------  ---------  ---------  ---------  ----------
total        6    160.44    1289.07    3673632   13274897          0         368

Both the good and bad executions used the same plan hash (which was forced through a baseline). Other than the occasional heavy "db file sequential read" wait events, there wasn't anything we could find. All the data access was similar between the good and bad executions.
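Spelling out the arithmetic makes the contrast stark (figures copied from the two tkprof summaries above):

```python
# Bad vs good run totals from the tkprof summaries.
bad  = {"cpu": 160.44, "elapsed": 1289.07, "disk": 3_673_632, "query": 13_274_897, "rows": 368}
good = {"cpu": 119.20, "elapsed": 119.96,  "disk": 135,       "query": 40_086_840, "rows": 2111}

# The bad run spends ~19 minutes off-CPU (i.e. waiting); the good run almost none.
bad_wait = bad["elapsed"] - bad["cpu"]
good_wait = good["elapsed"] - good["cpu"]
assert round(bad_wait, 2) == 1128.63
assert round(good_wait, 2) == 0.76

# Same plan, but the bad run performs ~27,000x more physical reads -
# the extra elapsed time is physical I/O, not extra CPU work.
assert bad["disk"] // good["disk"] > 27_000
```

Note the good run actually does *more* logical I/O (40M vs 13M buffer gets) yet finishes in two minutes; the bad run's pain is entirely in turning buffer gets into disk reads.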
Upon closer look at the 10046 trace, we observed that the same blocks were being read twice.. (see the screenshot attached)

Oracle narrowed this down to bugs - Bug 20352562<https://support.oracle.com/epmos/faces/BugDisplay?id=20352562&parent=SrDetailText&sourceId=3-18254389511>, bug 19145675<https://support.oracle.com/epmos/faces/BugDisplay?id=19145675&parent=SrDetailText&sourceId=3-18254389511> superseded by bug 19787571<https://support.oracle.com/epmos/faces/BugDisplay?id=19787571&parent=SrDetailText&sourceId=3-18254389511> (not open for public viewing)

Here are the excerpts from the bug:
"Single synchronous reads are due to the chaining. Maybe the duplicated read is because of intra-block chaining.
The table has 264 columns, therefore the rows will be split into pieces of <=255 columns; these may be in the same block
or in separate blocks. I took a look at 0x01403d00 (5,15616) which is the first culprit; the rows are intra-block chained thus:

row 0 last piece of row 1
row 1 head piece
row 2 last piece of row 3
row 3 head piece
row 4 last piece of row 5
row 5 head piece
row 6 last piece of row 7
row 7 head piece
row 8 last piece of row 9
row 9 head piece
row 10 last piece of row 11
row 11 head piece
row 12 last piece of row 13
row 13 head piece
row 14 last piece of row 15
row 15 head piece
kcbldrget() looks like the start point."
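The 255-column row-piece limit quoted in the bug note is easy to verify arithmetically: Oracle stores a row in pieces of at most 255 columns, so a 264-column table splits every row into two pieces (here intra-block), which is roughly why each row fetch can cost a second read. A sketch of that arithmetic (the exact column split per piece can vary, e.g. trailing NULLs are not stored):

```python
import math

MAX_COLS_PER_ROW_PIECE = 255  # Oracle's per-row-piece column limit
table_cols = 264              # OE_ORDER_LINES_ALL, per the bug note

# Every row needs a head piece plus one chained piece.
pieces_per_row = math.ceil(table_cols / MAX_COLS_PER_ROW_PIECE)
assert pieces_per_row == 2

# The second piece carries the columns beyond the first 255.
trailing_cols = table_cols - MAX_COLS_PER_ROW_PIECE
assert trailing_cols == 9

# Consistent with the block dump above: 16 row slots (rows 0-15)
# hold 8 logical rows, each a head piece plus its chained piece.
dumped_row_slots, logical_rows = 16, 8
assert dumped_row_slots == logical_rows * pieces_per_row
```

So the duplicated reads are not a measurement artifact: with two pieces per row, a fetch by rowid can legitimately visit the same block twice, and the bug makes that pattern pathological.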
In the meanwhile, we rewrote the offending query to eliminate a chunk of processing, which fixed the issue for now. Oracle development has created an emergency patch for us which we are now testing.

Many thanks to everyone who assisted at the right time.

-Upendra

________________________________
From: ***@gmail.com <***@gmail.com>
Sent: Tuesday, September 18, 2018 3:24 AM
To: ***@hotmail.com
Cc: ***@hotmail.com; Martin Berger; ORACLE-L
Subject: Re: Instrumenting Poor performance

Hi Upendra

I see that you say that this occurs around the start of each month and disappears after a couple of days. Have you perhaps partitioned your Order Entry tables and (assuming the plan really is the same) you are starting to access new partitions for the first time? Usually month change funniness would involve getting a plan change from/to nested loop access to the data as a result of misleading stats on the new partition, but I see no reason in principle why there wouldn't be a caching effect - though I'd be surprised if it lasted days.

On Mon, Sep 17, 2018 at 10:27 PM Upendra nerilla <***@hotmail.com<mailto:***@hotmail.com>> wrote:
Dominic/Jonathan/Stefan and others,
Apologies for the delay... I was trying to make sure I go back and review all the data that we have already gathered..

A couple of quick pieces of information:
This is running on EE 12.1.0.2 RAC.
We do have Tuning/Diag licenses.

It turns out we do have a bit of useful information gathered during our last poor execution..

1. We have the 10046 trace
2. SQLT report

I performed a quick profiler comparison for both the bad and good executions; the plan hash is the same. Here is the summary of the tkprof output as it appears..

Bad run:
call     count       cpu    elapsed       disk      query    current        rows
-------  -----  --------  ---------  ---------  ---------  ---------  ----------
Parse        0      0.00       0.00          0          0          0           0
Execute      3      0.00       0.00          0          0          0           0
Fetch        3    160.44    1289.07    3673632   13274897          0         368
-------  -----  --------  ---------  ---------  ---------  ---------  ----------
total        6    160.44    1289.07    3673632   13274897          0         368

Good run:
call     count       cpu    elapsed       disk      query    current        rows
-------  -----  --------  ---------  ---------  ---------  ---------  ----------
Parse        0      0.00       0.00          0          0          0           0
Execute      9      0.00       0.00          0          0          0           0
Fetch        9    119.20     119.95        135   40086840          0        2111
-------  -----  --------  ---------  ---------  ---------  ---------  ----------
total       18    119.20     119.96        135   40086840          0        2111

Upon looking at the details, "OE_ORDER_LINES_ALL" seems to be the object which was read from disk..

db file sequential read  92.7%  1,258.1501s  3,692,162  0.0003s  0.0001s  0.5071s

I am now leaning towards Jonathan's suggestion of a potential large-scale update on this table.. I will check with the team and see if any such update happens.. With the AWR baseline created around that time, I am going to go through it and see if I can find one..

Knowing that I have all this information, is there anything else that I could be looking at?

Thank you so much everyone

-Upendra
________________________________
From: Dominic Brooks <***@hotmail.com<mailto:***@hotmail.com>>
Sent: Friday, September 14, 2018 2:52 AM
To: ***@hotmail.com<mailto:***@hotmail.com>
Cc: Martin Berger; Oracle-L oracle-l
Subject: Re: Instrumenting Poor performance

SQL plan hash does not include the predicates, so that's one niche consideration to look out for. AWR doesn't hold predicates, so you will need to wait until you can pull plans from memory via dbms_xplan.display_cursor to see that. It's possible but unusual.

Real time SQL monitoring can tell you where the time was spent, but if you no longer have that available for the bad execution then you can get a very rough approximation from the ASH raw data.

Full sql trace is best to see what you were doing but ASH is usually good enough.

A combination of wait events (e.g. runtime decisions about switching from buffered to direct IO) and the detail of what exactly you are waiting on (e.g. reading lots of undo) might be sufficient to give pointers.

Otherwise you might need the combination of ash/tracing (what) and snapping session statistics (why) to give a better picture.

Regards
Dominic

Sent from my iPhone

On 13 Sep 2018, at 21:47, Upendra nerilla <***@hotmail.com<mailto:***@hotmail.com>> wrote:
is it always the same SQL_ID?
You can enable sql_trace for this particular SQL:
alter system set events 'sql_trace[sql: <SQL_ID>] level=12';

It is always the same sql_id. We have a 10046 and 10053 trace for the good execution.. Waiting for the next poor execution to capture the traces again..

I assume EBS can enable tracing for specific activities, but I'm not close to EBS; maybe there are better solutions than simple sql_trace.

I am not an EBS guy, so I am at a loss here. :(

Do you have proper facility to create a profile based on these sql_trace files?

There is already a sql_profile created for the sql_id. Is that what you mean?

________________________________
From: Martin Berger <***@gmail.com<mailto:***@gmail.com>>
Sent: Thursday, September 13, 2018 3:52 PM
To: ***@hotmail.com<mailto:***@hotmail.com>
Cc: Oracle-L oracle-l
Subject: Re: Instrumenting Poor performance

Hi Upendra,

is it always the same SQL_ID?
You can enable sql_trace for this particular SQL:
alter system set events 'sql_trace[sql: <SQL_ID>] level=12';

I assume EBS can enable tracing for specific activities, but I'm not close to EBS; maybe there are better solutions than simple sql_trace.

Compare the traces of the different "good" and "bad" executions and see where the time is spent.

Do you have proper facility to create a profile based on these sql_trace files?

br,
berx

On Thu., 13 Sep 2018 at 21:11, Upendra nerilla <***@hotmail.com<mailto:***@hotmail.com>> wrote:
Hello Team -
We have an EBS application which is running EBS and several other modules (OTC). From time to time I see a particular query giving poor response time. Normal response is <4 mins, and during a poor execution it goes over an hour.

We have a SQL baseline created for that SQL_ID and forcing the optimal plan. We are able to see that the optimizer is using the same hash. There are no host resource (cpu/memory/io) constraints.. This job runs only a couple of times a day and is very time sensitive for the business. We are unable to reproduce this in lower environments even when we performed refreshes..

We see a pattern that this issue shows up during the first week of the month and disappears after a couple of days (not the same exact day).
Here is what I am thinking of gathering - high level info..

1. Gather daily row count of all the tables involved in the query..
2. Review the jobs and see if there is anything unique..

What else could I gather to troubleshoot this further?
Appreciate any insights..

Thank you
-Upendra

--
Niall Litchfield
Oracle DBA
http://www.orawin.info