Laurentiu Oprea
2021-03-18 13:51:14 UTC
Regarding parallel I`m relaying my affirmation of the fact that I tested
with hint /*+ dynamic_sampling(8) */ (the DS level with parallel hint) and
gives the same estimate.
Yes there is a relation order_procesing_date between
business_procesing_date -1 and business_procesing_date + 1 and there is no
fooling like null values or year 1900/9999
The pattern is constant , grouping by partition_date for the last 7 days
it ranges from 0 to 130k (out of a total of 24M) rows filtered by
condition order_procesing_date > = business_procesing_date .
Also the columns are not virtual columns (strange expressions)
with hint /*+ dynamic_sampling(8) */ (the DS level with parallel hint) and
gives the same estimate.
Yes there is a relation order_procesing_date between
business_procesing_date -1 and business_procesing_date + 1 and there is no
fooling like null values or year 1900/9999
The pattern is constant , grouping by partition_date for the last 7 days
it ranges from 0 to 130k (out of a total of 24M) rows filtered by
condition order_procesing_date > = business_procesing_date .
Also the columns are not virtual columns (strange expressions)
The change for parallel processing is one I'd have to think about - it may
be something to do with the way the predicates are rewritten for the
parallel query slaves - so I'll postpone thinking about that bit.
a) what's the business logic. is there an approximate relationship
between order_processing_date and business_processing date (e.g. the former
is usually about 3 days before the latter).
b) has the optimizer been fooled by a "silly null" effect - like "not yet
processed" = "31-Dec-9999"
c) how much does the pattern change over time - e.g. if the partition date
is sysdate - 1 is this a partition where a lot of rows have not yet had
business_processing_date set, but if partition date = sysdate - 7 then
maybe lots of rows now have a business_processing_date set.
Regards
Jonathan Lewis
be something to do with the way the predicates are rewritten for the
parallel query slaves - so I'll postpone thinking about that bit.
a) what's the business logic. is there an approximate relationship
between order_processing_date and business_processing date (e.g. the former
is usually about 3 days before the latter).
b) has the optimizer been fooled by a "silly null" effect - like "not yet
processed" = "31-Dec-9999"
c) how much does the pattern change over time - e.g. if the partition date
is sysdate - 1 is this a partition where a lot of rows have not yet had
business_processing_date set, but if partition date = sysdate - 7 then
maybe lots of rows now have a business_processing_date set.
Regards
Jonathan Lewis
Hello,
I recently saw a query with a wrong execution plan and looks to be caused
select
count(*)
from
orders
where
partition_date = sysdate-1 and
order_procesing_date > = business_procesing_date
the count itself is = 42K
the count itself without second condition ~ 24M
the cardinality estimate ~ 24M
the cardinality estimate if I add parallel hint: 1M (this is an
interesting one and looks to be related with the fact that parallel queries
get DS level 8)
order_procesing_date - business_procesing_date >= interval '0' minute
:1.4M
I haven't tried yet but I suspect that if I create extended stats on
expression ( order_procesing_date - business_procesing_date) can get me
better results
The question is what is the logic behind beter cardinality estimates if I
rewrite the where clause?
What will be the best approach for this query?
Thank you.
I recently saw a query with a wrong execution plan and looks to be caused
select
count(*)
from
orders
where
partition_date = sysdate-1 and
order_procesing_date > = business_procesing_date
the count itself is = 42K
the count itself without second condition ~ 24M
the cardinality estimate ~ 24M
the cardinality estimate if I add parallel hint: 1M (this is an
interesting one and looks to be related with the fact that parallel queries
get DS level 8)
order_procesing_date - business_procesing_date >= interval '0' minute
:1.4M
I haven't tried yet but I suspect that if I create extended stats on
expression ( order_procesing_date - business_procesing_date) can get me
better results
The question is what is the logic behind beter cardinality estimates if I
rewrite the where clause?
What will be the best approach for this query?
Thank you.