Discussion:
SQL performance issue
p***@gmail.com
2018-08-17 16:47:36 UTC
Permalink
Dear all,

Could someone shed some light on this please?

Thanks
Paresh
Hello,
I am stuck against a performance issue that was reported recently.
The issue I am currently looking at is about a job that takes 1 hour as
opposed to 20 minutes in the past. This is happening since we switched over
from primary to secondary site. Both sites are two-node rac databases. On
primary site this used to take 20 mins. We also ran this in pre-prod
environment and it takes roughly 20 mins.
I had taken a look at the AWR report from both pre-prod and PROD
databases. There is no single query that is taking long. However, there is
a query that runs for 13,000 times and is taking 0.23 seconds per execution
as opposed to 0.05 against pre-prod. This explains the additional time
taken in PROD (i.e. new PROD). Please note that the plan hash value for the
SQL_ID in both environments is exactly the same.
Top wait events in PROD suggests high DB CPU and cluster related wait
events. So I suspect there is something down to getting blocks from the
resource master instance for the blocks in question. Please could someone
help?
Database version is 11.2.0.4.
All the details are listed in the attached files.
I would greatly appreciate if you could share your views with me.
Many thanks
Paresh
Powell, Mark
2018-08-17 16:59:43 UTC
Permalink
Paresh, for help with a query performance issue you should post the SQL and actual query plan in use by Oracle. Most often when query performance drastically changes there has been a change in the plan due to a change in statistics, bind variable peek, adaptive query plan, etc...



Mark Powell
Database Administration
(313) 592-5148


________________________________
From: oracle-l-***@freelists.org <oracle-l-***@freelists.org> on behalf of ***@gmail.com <***@gmail.com>
Sent: Friday, August 17, 2018 12:47:36 PM
To: oracle-***@freelists.org
Subject: Re: SQL performance issue

Dear all,

Could someone shed some light on this please?

Thanks
Paresh


On Tue, 14 Aug 2018, 15:48 Paresh Gandhi, <***@gmail.com<mailto:***@gmail.com>> wrote:

Hello,



I am stuck against a performance issue that was reported recently.



The issue I am currently looking at is about a job that takes 1 hour as opposed to 20 minutes in the past. This is happening since we switched over from primary to secondary site. Both sites are two-node rac databases. On primary site this used to take 20 mins. We also ran this in pre-prod environment and it takes roughly 20 mins.



I had taken a look at the AWR report from both pre-prod and PROD databases. There is no single query that is taking long. However, there is a query that runs for 13,000 times and is taking 0.23 seconds per execution as opposed to 0.05 against pre-prod. This explains the additional time taken in PROD (i.e. new PROD). Please note that the plan hash value for the SQL_ID in both environments is exactly the same.



Top wait events in PROD suggests high DB CPU and cluster related wait events. So I suspect there is something down to getting blocks from the resource master instance for the blocks in question. Please could someone help?


Database version is 11.2.0.4.


All the details are listed in the attached files.



I would greatly appreciate if you could share your views with me.



Many thanks

Paresh



DXC - This is a PRIVATE message - If you are not the intended recipient, please delete without copying and kindly advise us by e-mail of the mistake in delivery. NOTE: Regardless of content, this e-mail shall not operate to bind the Company to any order or other contract unless pursuant to explicit written agreement or government initiative expressly permitting the use of e-mail for such purpose.
Tim Gorman
2018-08-17 17:32:56 UTC
Permalink
Paresh,

As another suggestion, please think more about the appropriateness of
using an AWR report for tuning a specific process?

An AWR report consists of information gathered for a database instance
or all the instances in a RAC cluster, which is an enormous amount of
information pertaining to many processes.  You are trying to tune a
specific process, so you have already arrived at the endpoint of AWR
analysis, which is identifying a SQL statement or process within an
instance or cluster, and you are ready to begin the next step by
examining the process in detail using SQL tracing.

So please consider learning more about SQL tracing and TKPROF (and
similar utilities), about which numerous books, blog post articles, and
Oracle documentation has been written.  AWR reports will not provide
much assistance to you for this task.

The other consideration is that this email list is not Oracle Support. 
If you or your company has Oracle licenses, then you have the resources
of Oracle Support to assist you.  They have an instructive process for
resolving such cases.  There is evidently urgency implied in your emails
to this list, and if there is indeed urgency for resolving this
situation, then working this case with Oracle Support is the most
responsible thing for you to do.  This list will be useful for helping
you understand what Oracle Support asks you to do, but we are not your
support organization.

Hope this helps...

-Tim
Post by Powell, Mark
Paresh, for help with a query performance issue you should post the
SQL and actual query plan in use by Oracle.  Most often when query
performance drastically changes there has been a change in the plan
due to a change in statistics, bind variable peek, adaptive query
plan, etc...
Mark Powell
Database Administration
(313) 592-5148
------------------------------------------------------------------------
*Sent:* Friday, August 17, 2018 12:47:36 PM
*Subject:* Re: SQL performance issue
Dear all,
Could someone shed some light on this please?
Thanks
Paresh
Hello,
I am stuck against a performance issue that was reported recently.
The issue I am currently looking at is about a job that takes 1
hour as opposed to 20 minutes in the past. This is happening since
we switched over from primary to secondary site. Both sites are
two-node rac databases. On primary site this used to take 20 mins.
We also ran this in pre-prod environment and it takes roughly 20 mins.
I had taken a look at the AWR report from both pre-prod and PROD
databases. There is no single query that is taking long. However,
there is a query that runs for 13,000 times and is taking 0.23
seconds per execution as opposed to 0.05 against pre-prod. This
explains the additional time taken in PROD (i.e. new PROD). Please
note that the plan hash value for the SQL_ID in both environments
is exactly the same.
Top wait events in PROD suggests high DB CPU and cluster related
wait events. So I suspect there is something down to getting
blocks from the resource master instance for the blocks in
question. Please could someone help?
Database version is 11.2.0.4.
All the details are listed in the attached files.
I would greatly appreciate if you could share your views with me.
Many thanks
Paresh
DXC - This is a PRIVATE message - If you are not the intended
recipient, please delete without copying and kindly advise us by
e-mail of the mistake in delivery. NOTE: Regardless of content, this
e-mail shall not operate to bind the Company to any order or other
contract unless pursuant to explicit written agreement or government
initiative expressly permitting the use of e-mail for such purpose.
sachin pawar
2018-08-17 17:51:38 UTC
Permalink
Hi Paresh,
If you want to find the root cause of the sql performance i suggest you
provide :

++++++++++++++++

1. the AWR report from both executions ( good and bad)
2. SQLT XTRACT from both sql ids.

Please , for this sql id , provide the output from the SQLT utility, using
the XTRACT method (it is important use this method for this issue) for:
--- The XTRACT method will take the SQLID of the problem SQL as input and
will *NOT* execute the SQL.
--- For information on obtaining and using SQLT, please refer to:
SQLT Usage Instructions (Doc ID 1614107.1)

For example the following file is from a successful SQLT run using the
XTRACT method:
sqlt_s45774_xtract_fp48hh5dkm529.zip

++++++++++++++++

If you can open a SR then that it would be easier for support to assist
you ;)

Rgds,
Sachin Pawar
https://twitter.com/sach_pwr
Post by Powell, Mark
Paresh,
As another suggestion, please think more about the appropriateness of
using an AWR report for tuning a specific process?
An AWR report consists of information gathered for a database instance or
all the instances in a RAC cluster, which is an enormous amount of
information pertaining to many processes. You are trying to tune a
specific process, so you have already arrived at the endpoint of AWR
analysis, which is identifying a SQL statement or process within an
instance or cluster, and you are ready to begin the next step by examining
the process in detail using SQL tracing.
So please consider learning more about SQL tracing and TKPROF (and similar
utilities), about which numerous books, blog post articles, and Oracle
documentation has been written. AWR reports will not provide much
assistance to you for this task.
The other consideration is that this email list is not Oracle Support. If
you or your company has Oracle licenses, then you have the resources of
Oracle Support to assist you. They have an instructive process for
resolving such cases. There is evidently urgency implied in your emails to
this list, and if there is indeed urgency for resolving this situation,
then working this case with Oracle Support is the most responsible thing
for you to do. This list will be useful for helping you understand what
Oracle Support asks you to do, but we are not your support organization.
Hope this helps...
-Tim
Paresh, for help with a query performance issue you should post the SQL
and actual query plan in use by Oracle. Most often when query performance
drastically changes there has been a change in the plan due to a change in
statistics, bind variable peek, adaptive query plan, etc...
Mark Powell
Database Administration
(313) 592-5148
------------------------------
*Sent:* Friday, August 17, 2018 12:47:36 PM
*Subject:* Re: SQL performance issue
Dear all,
Could someone shed some light on this please?
Thanks
Paresh
Hello,
I am stuck against a performance issue that was reported recently.
The issue I am currently looking at is about a job that takes 1 hour as
opposed to 20 minutes in the past. This is happening since we switched over
from primary to secondary site. Both sites are two-node rac databases. On
primary site this used to take 20 mins. We also ran this in pre-prod
environment and it takes roughly 20 mins.
I had taken a look at the AWR report from both pre-prod and PROD
databases. There is no single query that is taking long. However, there is
a query that runs for 13,000 times and is taking 0.23 seconds per execution
as opposed to 0.05 against pre-prod. This explains the additional time
taken in PROD (i.e. new PROD). Please note that the plan hash value for the
SQL_ID in both environments is exactly the same.
Top wait events in PROD suggests high DB CPU and cluster related wait
events. So I suspect there is something down to getting blocks from the
resource master instance for the blocks in question. Please could someone
help?
Database version is 11.2.0.4.
All the details are listed in the attached files.
I would greatly appreciate if you could share your views with me.
Many thanks
Paresh
DXC - This is a PRIVATE message - If you are not the intended recipient,
please delete without copying and kindly advise us by e-mail of the mistake
in delivery. NOTE: Regardless of content, this e-mail shall not operate to
bind the Company to any order or other contract unless pursuant to explicit
written agreement or government initiative expressly permitting the use of
e-mail for such purpose.
Mark W. Farnham
2018-08-17 18:09:10 UTC
Permalink
What Mark Powell and Tim Gorman wrote are sensible policies.

Call this a “shot-in-the-dark.”: Depending on how you moved the data, you *could* find yourself in a period where all the data is block mastered on one node. IF .23 seconds is the new average for the chronic load generator, they could be the average of about .4something on one node and .05or_so on the other node.



A similar difference could also easily result from cached or not cached with respect to the data as well as “I’m in pga, needing undo rollback, so I do it every time.” THAT can be cured at least temporarily by forcing the non-direct read and cleanout. IF the need for the undo was an artifact of the move, doing it once may be enough.



A similar difference could also easily result from a small change in the client to server network access, especially if arraysize is small (or inadvertently different in the client configuration of old prod versus new prod.)



Okay, several shots in the dark. All trivial to check.

Likewise, you could have similar issues apart from the chronic load increase with the was 20 minutes, now is 1 hour job. A diagnostic trivial to do is set the session to force parallel local and then run the query on each node. The “bad” one may or may not show up as an increase in gcc traffic that is significant. Or this might not help. It is an easy test to do to rule in or out a big chunk of the possible solution spaces.

IF this happens to be a lucky shot at your problem, still listen to Powell and Gorman. Also, JL’s scratchpad had a decent laundry list of “what the heck changed” that you can look at in parallel with measuring the old versus new profile to avoid guessing at all and SEE the actual difference.



Good luck,



mwf



From: oracle-l-***@freelists.org [mailto:oracle-l-***@freelists.org] On Behalf Of ***@gmail.com
Sent: Friday, August 17, 2018 12:48 PM
To: oracle-***@freelists.org
Subject: Re: SQL performance issue



Dear all,



Could someone shed some light on this please?



Thanks

Paresh





On Tue, 14 Aug 2018, 15:48 Paresh Gandhi, <***@gmail.com> wrote:

Hello,



I am stuck against a performance issue that was reported recently.



The issue I am currently looking at is about a job that takes 1 hour as opposed to 20 minutes in the past. This is happening since we switched over from primary to secondary site. Both sites are two-node rac databases. On primary site this used to take 20 mins. We also ran this in pre-prod environment and it takes roughly 20 mins.



I had taken a look at the AWR report from both pre-prod and PROD databases. There is no single query that is taking long. However, there is a query that runs for 13,000 times and is taking 0.23 seconds per execution as opposed to 0.05 against pre-prod. This explains the additional time taken in PROD (i.e. new PROD). Please note that the plan hash value for the SQL_ID in both environments is exactly the same.



Top wait events in PROD suggests high DB CPU and cluster related wait events. So I suspect there is something down to getting blocks from the resource master instance for the blocks in question. Please could someone help?



Database version is 11.2.0.4.



All the details are listed in the attached files.



I would greatly appreciate if you could share your views with me.



Many thanks

Paresh
Karl Arao
2018-08-17 18:22:38 UTC
Permalink
For troubleshooting SQL issues I start with planx.sql (
https://github.com/karlarao/scripts/blob/master/performance/planx.sql) to
get most of the info I need to see what's happening, and which part of the
plan_id it's spending most of the time and related info I can correlate
with around the SQL_ID. This was written by Carlos but I added a few of my
own SQLs and call to snapper

@planx Y <sql_id>

And then I would run sqld360/sqldb360 if I need more info like metadata
info, parameters, etc.



-Karl
--
Karl Arao
Wiki: karlarao.wiki
Twitter: @karlarao <http://twitter.com/karlarao>
Jonathan Lewis
2018-08-18 09:31:30 UTC
Permalink
"There is no single query that is taking long. However, there is a query that runs for 13,000 times and is taking 0.23 seconds per execution as opposed to 0.05 against pre-prod."

Is that really 13,000 different queries or just the one query repeated 13,000 times ? Unless you're using literals instead of binds that's "one single query".
If it showed up in the AWR report then there will be detailed stats for it for the interval - you can get them graphically, but I tend to use the script awrsqrpt.sql (in $ORACLE_HOME/rdbms/admin). If the query shows up in a good and a bad AWR on production then you can run of the report for that SQL_ID for a good period and a bad period and part of the report will tell you specifically about the most significant work done by the query.


Regards
Jonathan Lewis
________________________________________
From: oracle-l-***@freelists.org <oracle-l-***@freelists.org> on behalf of ***@gmail.com <***@gmail.com>
Sent: 17 August 2018 17:47
To: oracle-***@freelists.org
Subject: Re: SQL performance issue

Dear all,

Could someone shed some light on this please?

Thanks
Paresh


On Tue, 14 Aug 2018, 15:48 Paresh Gandhi, <***@gmail.com<mailto:***@gmail.com>> wrote:

Hello,



I am stuck against a performance issue that was reported recently.



The issue I am currently looking at is about a job that takes 1 hour as opposed to 20 minutes in the past. This is happening since we switched over from primary to secondary site. Both sites are two-node rac databases. On primary site this used to take 20 mins. We also ran this in pre-prod environment and it takes roughly 20 mins.



I had taken a look at the AWR report from both pre-prod and PROD databases. There is no single query that is taking long. However, there is a query that runs for 13,000 times and is taking 0.23 seconds per execution as opposed to 0.05 against pre-prod. This explains the additional time taken in PROD (i.e. new PROD). Please note that the plan hash value for the SQL_ID in both environments is exactly the same.



Top wait events in PROD suggests high DB CPU and cluster related wait events. So I suspect there is something down to getting blocks from the resource master instance for the blocks in question. Please could someone help?


Database version is 11.2.0.4.


All the details are listed in the attached files.



I would greatly appreciate if you could share your views with me.



Many thanks

Paresh

--
http://www.freelists.org/webpage/oracle-l

Loading...