June 28, 2016 3 Comments
Just concluded a Full Day Event on Performance in Chandigarh for the North India Chapter of “All India Oracle User Group”. As committed in my earlier user group update blog, I thought for the benefit of the readers, posting the technical details on Optimizer, especially histograms, would help. Another reason, this is important is these sessions are attended by Developer communities as well and grasping everything in one session is very difficult. writeup will help them understand this critical piece.
I usually blog on real life challenges. The motivation behind this blog as well is an issue that I was working on and the fix that I applied to resolve it. This will be discussed at a relevant time and in a relevant part of this series.
I will publish a 4 part series, starting with the basics of Optimizer, the formulaes and then few examples with its various calculations. The series will be divided into 4 parts, such as :
- Optimizer Basics & Calculations
- Cardinality – Actuals v/s Assumed
- Histograms – Frequency & Height Balanced
- 12c Enhancements – Hybrid and TopN Frequency Histograms
In the context of the Optimizer, the two terminologies commonly used are SELECTIVITY and CARDINALITY.
SELECTVITY :It is measured as a percentage of rows that would be returned from or filtered out of a row set. Thus the Selectivity of a predicate indicates how many rows pass a predicate test. Selectivity ranges from 0.0 to 1.0. A Selectivity of 0.0 means no rows are selected from a row set, whereas a selectivity of 1.0 means all the rows are selected. A predicate become more Selective as the values approaches 0.0 and less selective if it approaches 1.0. It drives the Access Path for example, tablescan or an Index Scan. With No Histograms, Selectivity for a column is computed as 1/NUM_DISTINCT or DENSITY.
CARDINALITY :Cardinality is the estimated number of rows returned by each operation in an Execution Plan. The Optimizer determines cardinality for each operation based on complex set of formulas that use both, the table and column level statistics or dynamic statistics. Cardinality estimates must be as accurate as possible because they influence all aspects of an execution plan. Cardinality is important when the Optimizer determines the cost of a Join. For example, in a Nested Loop Join between an EMP and DEPT table, the number of rows returned by EMP table determines how often the DEPT table will be probed. Cardinality drives the Access Order.
Just to simplify, for a Gender column with M & F (2 Distinct Values), the selectivity will be 1/2 or 0.5. If this table has around 100 rows, then the Cardinality will be Selectivity X Num_Rows, which is 0.5 x 100 = 50.
For a 100 row table with 2 columns, each with distinct values as 4 and 2, the combined selectivity (for AND predicate) will be 0.5 X 0.25 = 0.125 and the Cardinality will be 0.125 X 100 = 12.5 rounded off to 13.
Assume Column C
NDV is the Number of Distinct Values
minv is the Minumim Value for C
maxv is the Maximum Value for C
The formulae for Selectivity Calculation would be as under :
- SEL(=C) = 1/NDV or DENSITY
- SEL(<C) = (C-minv)/(maxv-minv)
- SEL(<=C) = SEL(=C) + SEL(<C)
- SEL(>C) = (maxv-C)/(maxv-minv)
- SEL(>=C) = SEL(=C) + SEL(>C)
In case of a Range Predicate (<, , >=), the Numerator part is called as the Required Range and the Denominator Part is called as an Available Range.
Once the Selectivity is derived, Cardinality will be Selectivity multiplied by the Number of Rows. For multiple predicates involved, the Selectivity of each of these is derived based on above formulas and then used based on AND or OR predicates. For example :
- WHERE A=:b1 and B=:b2 = SEL(=A) X SEL(=B)
- WHERE A=:b1 and B>=:b2 = SEL(=A) X (SEL(=B) + SEL(>B))
- WHERE A=:b1 or B=:b2 = SEL(=A) + SEL(=B) – (SEL(=A) X SEL(=B))
This was the first part of this series. In the next part, we will create a sample table and run through each of these formulas.