Automatic Stats for a Large Table



 

As 10g has introduced some automatic behaviour as defaults (AUTO_SAMPLE_SIZE for estimate_percent and FOR ALL COLUMNS SIZE AUTO for method_opt), I decided to trace how these work.
In a production database, I had noticed Oracle gathering statistics on a single column for 20 minutes, after which it began the same task again, but with a sample size 10 times as large.

MetaLink Note#343849.1 provides some explanation for this behaviour.

Since I'd already created a large table of 83 million rows (see my previous postings), I decided to see how the 'auto'mation worked.
(So far, in my test runs I have always specified an estimate_percent and method_opt, not leaving them to default in 10g.)

So I ran two DBMS_STATS.GATHER_TABLE_STATS executions on table TEST_APPEND, which has 83 million rows. The first execution was with the "defaults" and the second one was with estimate_percent=>1.


For the 'auto' sample size, a 10046 trace showed Oracle gathering samples *twice*. In the first pass, it gathered statistics on *all* columns, using a sample of 0.0058412006% (yes, 6 thousandths of 1 percent!). Oracle seems to have been dissatisfied with what it gathered on a few of the columns, so it re-ran the gather for a subset of columns, but using a sample of 0.0584120060% (that is, 6 hundredths of 1 percent!).
With such small sample sizes, you would think that gathering the statistics would be fast.
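As a quick cross-check (a sketch only, assuming the table holds roughly 83 million rows, as stated above), these sample percentages should yield row counts close to what the traces later report:

```python
# Expected sampled-row counts for the two 'auto' passes,
# assuming ~83 million rows in TEST_APPEND (approximate figure from the text).
TABLE_ROWS = 83_000_000

pass1_pct = 0.0058412006   # first 'auto' pass, all columns
pass2_pct = 0.0584120060   # second 'auto' pass, subset of columns (10x larger)

expected1 = round(TABLE_ROWS * pass1_pct / 100)
expected2 = round(TABLE_ROWS * pass2_pct / 100)

print(expected1)  # ~4,848 rows expected; the trace below shows 4,798
print(expected2)  # ~48,482 rows expected; the trace below shows 49,017
```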

This is what it did:


select /*+ no_parallel(t) no_parallel_index(t) dbms_stats
  cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
  */ count(*),count("OWNER"),count(distinct "OWNER"),
  sum(sys_op_opnsize("OWNER")),substrb(dump(min("OWNER"),16,0,32),1,120),
  substrb(dump(max("OWNER"),16,0,32),1,120),count("OBJECT_NAME"),
  count(distinct "OBJECT_NAME"),sum(sys_op_opnsize("OBJECT_NAME")),
  substrb(dump(min(substrb("OBJECT_NAME",1,32)),16,0,32),1,120),
  substrb(dump(max(substrb("OBJECT_NAME",1,32)),16,0,32),1,120),
  count("SUBOBJECT_NAME"),count(distinct "SUBOBJECT_NAME"),
  sum(sys_op_opnsize("SUBOBJECT_NAME")),substrb(dump(min("SUBOBJECT_NAME"),16,
  0,32),1,120),substrb(dump(max("SUBOBJECT_NAME"),16,0,32),1,120),
  count("OBJECT_ID"),count(distinct "OBJECT_ID"),
  sum(sys_op_opnsize("OBJECT_ID")),substrb(dump(min("OBJECT_ID"),16,0,32),1,
  120),substrb(dump(max("OBJECT_ID"),16,0,32),1,120),count("DATA_OBJECT_ID"),
  count(distinct "DATA_OBJECT_ID"),sum(sys_op_opnsize("DATA_OBJECT_ID")),
  substrb(dump(min("DATA_OBJECT_ID"),16,0,32),1,120),
  substrb(dump(max("DATA_OBJECT_ID"),16,0,32),1,120),count("OBJECT_TYPE"),
  count(distinct "OBJECT_TYPE"),sum(sys_op_opnsize("OBJECT_TYPE")),
  substrb(dump(min("OBJECT_TYPE"),16,0,32),1,120),
  substrb(dump(max("OBJECT_TYPE"),16,0,32),1,120),count("CREATED"),
  count(distinct "CREATED"),substrb(dump(min("CREATED"),16,0,32),1,120),
  substrb(dump(max("CREATED"),16,0,32),1,120),count("LAST_DDL_TIME"),
  count(distinct "LAST_DDL_TIME"),substrb(dump(min("LAST_DDL_TIME"),16,0,32),
  1,120),substrb(dump(max("LAST_DDL_TIME"),16,0,32),1,120),count("TIMESTAMP"),
  count(distinct "TIMESTAMP"),sum(sys_op_opnsize("TIMESTAMP")),
  substrb(dump(min("TIMESTAMP"),16,0,32),1,120),substrb(dump(max("TIMESTAMP"),
  16,0,32),1,120),count("STATUS"),count(distinct "STATUS"),
  sum(sys_op_opnsize("STATUS")),substrb(dump(min("STATUS"),16,0,32),1,120),
  substrb(dump(max("STATUS"),16,0,32),1,120),count("TEMPORARY"),
  count(distinct "TEMPORARY"),sum(sys_op_opnsize("TEMPORARY")),
  substrb(dump(min("TEMPORARY"),16,0,32),1,120),substrb(dump(max("TEMPORARY"),
  16,0,32),1,120),count("GENERATED"),count(distinct "GENERATED"),
  sum(sys_op_opnsize("GENERATED")),substrb(dump(min("GENERATED"),16,0,32),1,
  120),substrb(dump(max("GENERATED"),16,0,32),1,120),count("SECONDARY"),
  count(distinct "SECONDARY"),sum(sys_op_opnsize("SECONDARY")),
  substrb(dump(min("SECONDARY"),16,0,32),1,120),substrb(dump(max("SECONDARY"),
  16,0,32),1,120)
from
 "TEST"."TEST_APPEND" sample (   .0058412006) t





call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.02       0.04          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2     63.66     150.85     585686       8845          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4     63.69     150.89     585686       8845          0           1

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 61     (recursive depth: 1)

Rows     Row Source Operation
-------  ---------------------------------------------------
      1  SORT GROUP BY (cr=8845 pr=585686 pw=0 time=150851071 us)
   4798   TABLE ACCESS SAMPLE TEST_APPEND (cr=8845 pr=585686 pw=0 time=149699664 us)

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  db file scattered read                       4587        0.30        134.90
  db file sequential read                       170        0.06          1.01
  latch: shared pool                              5        0.01          0.03
  latch: cache buffers lru chain                  2        0.00        0.00
********************************************************************************



and



select /*+ no_parallel(t) no_parallel_index(t) dbms_stats
  cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
  */ count(*),count("OBJECT_NAME"),count(distinct "OBJECT_NAME"),
  count("SUBOBJECT_NAME"),count(distinct "SUBOBJECT_NAME"),
  sum(sys_op_opnsize("SUBOBJECT_NAME")),substrb(dump(min("SUBOBJECT_NAME"),16,
  0,32),1,120),substrb(dump(max("SUBOBJECT_NAME"),16,0,32),1,120),
  count("OBJECT_ID"),count(distinct "OBJECT_ID"),count("DATA_OBJECT_ID"),
  count(distinct "DATA_OBJECT_ID"),sum(sys_op_opnsize("DATA_OBJECT_ID")),
  substrb(dump(min("DATA_OBJECT_ID"),16,0,32),1,120),
  substrb(dump(max("DATA_OBJECT_ID"),16,0,32),1,120)
from
 "TEST"."TEST_APPEND" sample (   .0584120060) t





call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.01          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2     95.72     176.11    1136088      85933          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4     95.73     176.12    1136088      85933          0           1

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 61     (recursive depth: 1)

Rows     Row Source Operation
-------  ---------------------------------------------------
      1  SORT GROUP BY (cr=85933 pr=1136088 pw=0 time=176113365 us)
  49017   TABLE ACCESS SAMPLE TEST_APPEND (cr=85933 pr=1136088 pw=0 time=153397014 us)

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  db file scattered read                       8994        0.27        152.61
  db file sequential read                        23        0.00          0.01
  latch: cache buffers lru chain                  1        0.00          0.00
********************************************************************************



Oracle actually read 585,686 blocks from disk and made 8,845 consistent gets to obtain a sample of 4,798 rows in 150.89 seconds in its first 'auto' pass. In the second 'auto' pass (for columns OBJECT_NAME, SUBOBJECT_NAME, OBJECT_ID and DATA_OBJECT_ID only -- presumably it was satisfied with the statistics it already had on columns OWNER, OBJECT_TYPE etc.!), it read 1,136,088 blocks, using 85,933 of them, to sample 49,017 rows in 176.12 seconds. Those 1,136,088 blocks mean that it had actually read the whole table!
Thus, AUTO_SAMPLE_SIZE resulted in reading 1.5 times the table size and discarding 96.87% of what it had read. The number of rows sampled was less than 0.07% of the table.
The gathering of statistics took 326.97 seconds in total.
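To keep the arithmetic honest, these totals can be re-derived from the two trace listings above (a quick sketch; the 83-million-row figure is the approximate table size stated earlier):

```python
# Totals across the two 'auto' passes, taken from the trace output above.
blocks_read  = 585_686 + 1_136_088   # disk blocks read (pr) across both passes
rows_sampled = 4_798 + 49_017        # rows actually sampled
elapsed      = 150.85 + 176.12       # fetch elapsed times, in seconds

print(blocks_read)        # 1721774
print(rows_sampled)       # 53815
print(round(elapsed, 2))  # 326.97

# 53,815 rows out of ~83 million is under 0.07% of the table:
print(round(rows_sampled / 83_000_000 * 100, 3))  # ~0.065
```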


Could I do better with even a 1% sample size (note: Oracle had actually sampled only 0.07% of the rows)?
So estimate_percent=>1 was my second execution.


select /*+ no_parallel(t) no_parallel_index(t) dbms_stats
  cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
  */ count(*),count("OWNER"),count(distinct "OWNER"),
  sum(sys_op_opnsize("OWNER")),substrb(dump(min("OWNER"),16,0,32),1,120),
  substrb(dump(max("OWNER"),16,0,32),1,120),count("OBJECT_NAME"),
  count(distinct "OBJECT_NAME"),sum(sys_op_opnsize("OBJECT_NAME")),
  substrb(dump(min(substrb("OBJECT_NAME",1,32)),16,0,32),1,120),
  substrb(dump(max(substrb("OBJECT_NAME",1,32)),16,0,32),1,120),
  count("SUBOBJECT_NAME"),count(distinct "SUBOBJECT_NAME"),
  sum(sys_op_opnsize("SUBOBJECT_NAME")),substrb(dump(min("SUBOBJECT_NAME"),16,
  0,32),1,120),substrb(dump(max("SUBOBJECT_NAME"),16,0,32),1,120),
  count("OBJECT_ID"),count(distinct "OBJECT_ID"),
  sum(sys_op_opnsize("OBJECT_ID")),substrb(dump(min("OBJECT_ID"),16,0,32),1,
  120),substrb(dump(max("OBJECT_ID"),16,0,32),1,120),count("DATA_OBJECT_ID"),
  count(distinct "DATA_OBJECT_ID"),sum(sys_op_opnsize("DATA_OBJECT_ID")),
  substrb(dump(min("DATA_OBJECT_ID"),16,0,32),1,120),
  substrb(dump(max("DATA_OBJECT_ID"),16,0,32),1,120),count("OBJECT_TYPE"),
  count(distinct "OBJECT_TYPE"),sum(sys_op_opnsize("OBJECT_TYPE")),
  substrb(dump(min("OBJECT_TYPE"),16,0,32),1,120),
  substrb(dump(max("OBJECT_TYPE"),16,0,32),1,120),count("CREATED"),
  count(distinct "CREATED"),substrb(dump(min("CREATED"),16,0,32),1,120),
  substrb(dump(max("CREATED"),16,0,32),1,120),count("LAST_DDL_TIME"),
  count(distinct "LAST_DDL_TIME"),substrb(dump(min("LAST_DDL_TIME"),16,0,32),
  1,120),substrb(dump(max("LAST_DDL_TIME"),16,0,32),1,120),count("TIMESTAMP"),
  count(distinct "TIMESTAMP"),sum(sys_op_opnsize("TIMESTAMP")),
  substrb(dump(min("TIMESTAMP"),16,0,32),1,120),substrb(dump(max("TIMESTAMP"),
  16,0,32),1,120),count("STATUS"),count(distinct "STATUS"),
  sum(sys_op_opnsize("STATUS")),substrb(dump(min("STATUS"),16,0,32),1,120),
  substrb(dump(max("STATUS"),16,0,32),1,120),count("TEMPORARY"),
  count(distinct "TEMPORARY"),sum(sys_op_opnsize("TEMPORARY")),
  substrb(dump(min("TEMPORARY"),16,0,32),1,120),substrb(dump(max("TEMPORARY"),
  16,0,32),1,120),count("GENERATED"),count(distinct "GENERATED"),
  sum(sys_op_opnsize("GENERATED")),substrb(dump(min("GENERATED"),16,0,32),1,
  120),substrb(dump(max("GENERATED"),16,0,32),1,120),count("SECONDARY"),
  count(distinct "SECONDARY"),sum(sys_op_opnsize("SECONDARY")),
  substrb(dump(min("SECONDARY"),16,0,32),1,120),substrb(dump(max("SECONDARY"),
  16,0,32),1,120)
from
 "TEST"."TEST_APPEND" sample (  1.0000000000) t





call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.03       0.03          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2    118.46     205.36    1253727     895705          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4    118.49     205.40    1253727     895705          0           1

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 61     (recursive depth: 1)

Rows     Row Source Operation
-------  ---------------------------------------------------
      1  SORT GROUP BY (cr=895705 pr=1253727 pw=0 time=205366708 us)
 836541   TABLE ACCESS SAMPLE TEST_APPEND (cr=895705 pr=1253727 pw=0 time=184883528 us)

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  db file scattered read                       9855        0.31        151.24
  db file sequential read                         1        0.00          0.00
********************************************************************************



 


Aah! The 1% sample executed in 205.4 seconds -- faster than the two 'auto' passes combined -- going through 1,253,727 blocks and using 895,705 of them to sample 836,541 rows.
The estimate_percent=>1 run was *faster* and sampled many *more* rows than AUTO_SAMPLE_SIZE!

Here's a summary:

Gather Stats run      Sample %       Disk Blocks Actually Read   Rows Sampled   Time taken (sec)
Run 1 (all cols)      .0058412006                      585,686          4,798             150.85
Run 2 (some cols)     .0584120060                    1,136,088         49,017             176.12
Auto : total                   --                    1,721,774         53,815             326.97
Estimate percent 1   1.0000000000                    1,253,727        836,541             205.40
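The trade-off in the summary can be quantified with a small sketch (figures taken directly from the table above):

```python
# Comparing the 1% run against the combined 'auto' runs.
auto_rows, auto_secs = 53_815, 326.97        # AUTO_SAMPLE_SIZE, both passes
one_pct_rows, one_pct_secs = 836_541, 205.40 # estimate_percent=>1

print(round(one_pct_rows / auto_rows, 1))    # ~15.5x more rows sampled
print(round(auto_secs - one_pct_secs, 2))    # ~121.57 seconds saved
```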

   
Amit Kumar Srivastava
 
