Mar 20, 2014

Fix high Oracle “cache buffer chain” & “Buffer Busy Waits” events


Trouble-shoot Oracle “cache buffer chain” & “Buffer Busy Waits” events

About Oracle “cache buffer chain” event:

The cache buffers chains latches are used to protect a buffer list in the buffer cache. These latches are used when searching for, adding, or removing a buffer from the buffer cache.

Blocks in the buffer cache are placed on linked lists (cache buffer chains) which hang off a hash table. The hash chain that a block is placed on is based on the DBA and CLASS of the block. Each hash chain is protected by a single child latch. Processes need to get the relevant latch to allow them to scan a hash chain for a buffer so that the linked list does not change underneath them.

Contention on this latch usually means that there is a block that is in great contention (known as a hot block). See the sample AWR report (fig-1) showing “cache buffer chain” wait event as top event.

Drilling & Solution:

The "cache buffer chain" latch wait is normal, but high values are associated with high simultaneous buffer access, similar to a free-list shortage on an index or table segment header.
Query-1:
SQL> select count(*) child_count,
   sum(gets)   sum_gets,
   sum(misses) sum_misses,
   sum(sleeps) sum_sleeps
from gv$latch_children
where name = 'cache buffers chains';

Sample-output:


The first main type of latch that will be detailed for Oracle is called the buffer cache latch. The buffer cache latch family consists of two types of latches: the cache buffers chain latchand the other is the cache buffers LRU chain latch. First, take a look at the cache buffers chain latch. Cache buffers chain latches are acquired at the moment in time when a data block within the Oracle buffer cache is accessed by a process within Oracle. Usually latch contention for these buffer caches is due to poor disk I/O configuration. Reducing contention with these latches involves tuning the logical I/O for the associated SQL statements as well as the disk subsystem.

Another factor for latch contention with buffers chain latches could possibly be hot block contention. Oracle Metalink Note # 163424.1 has some useful tips on tuning and identifying hot blocks within the Oracle database environment.

The other buffer cache latch type is the cache buffers LRU chain latch. Whenever a new block enters the Oracle buffer cache within the SGA, this latch is acquired to allow block management in the Oracle SGA. Also, the latch is acquired when buffers are written back to disk such as when a scan is performed to move the LRU or least recently used chain of dirty blocks to flush out to disk from the buffer cache.

Query-2:

SQL> select inst_id,sid,event,p1,p2,p3,wait_class,seconds_in_wait
from gv$session_wait
where event = 'cache buffer chains';

The columns of the gv$session_wait view that are of particular interest for a buffer busy wait event are:

P1—The absolute file number for the data file involved in the wait.
P2—The block number within the data file referenced in P1 that is being waited upon.
P3—The reason code describing why the wait is occurring.

To find hot blocks:
Query-3:
select /*+ RULE */
   e.owner ||'.'|| e.segment_name segment_name,
   e.extent_id extent#,
   x.dbablk - e.block_id + 1 block#,
   x.tch,
   l.child#
from
   sys.v$latch_children l,
   sys.x$bh x,
   sys.dba_extents e
where
   x.hladdr = 'ADDR' and
   e.file_id = x.file# and
   x.hladdr = l.addr and
   x.dbablk between e.block_id and e.block_id + e.blocks -1
order by x.tch desc;

Note : If this query is not returning any row(s), then you don't have any hot blocks.

Most buffer cache waits can be fixed with additional freelists. But there are some limitations. If you observed very rarely then find what query is causing the issue. If required clear application sessions from the web/app layer for issuing user. I experienced with same.

About Oracle “Buffer Busy Waits” event:

This is the most common confounding wait event in Oracle. There are various kinds resolution methods for "buffer busy wait events". Buffer busy waits are common in an I/O-bound Oracle system, as evidenced by any system with read (sequential/scattered) waits in the top-five waits in the Oracle AWR report, like this:

See the below part of AWR report. It is clearly showing

(Figure-1)

The main way to reduce the total I/O on the system is to reduce buffer busy waits. This can be possible by tuning the SQL queris to access rows with fewer block reads (i.e., by adding indexes). Even if we have a huge db_cache_size, we may still see buffer busy waits, and increasing the buffer size won't help.

Reducing buffer busy waits reduces the total I/O on the system. This can be accomplished by tuning the SQL to access rows with fewer block reads by adding indexes, adjusting the database writer or adding freelists to tables and indexes.  But remember adjusting the database writer or adding freelists to tables and indexes may have some limitations. Even if there is a huge db_cache_size , the DBA may still see buffer busy waits and, in this case, increasing the buffer size will not help.

The most common remedies for high buffer busy waits include database writer (DBWR) contention tuning, adding freelists to a table and index, implementing Automatic Segment Storage Management (ASSM, a.k.a bitmap freelists), and, of course, and adding a missing index to reduce buffer touches.

In order to look at system-wide wait events, we can query the v$system_event performance view. This view, shown below, provides the name of the wait event, the total number of waits and timeouts, the total time waited, and the average wait time per event.

Run the Query-3 as shown above to find out the hot block.

The type of buffer that causes the wait can be queried using the v$waitstat view. This view lists the waits per buffer type for buffer busy waits, where COUNT is the sum of all waits for the class of block, and TIME is the sum of all wait times for that class:

select inst_id,EVENT,TOTAL_WAITS,TOTAL_TIMEOUTS,TIME_WAITED,AVERAGE_WAIT
from v$system_event a
where event in('buffer busy waits','free buffer waits');

output :

As for the workaround, the idea is to spread the hot blocks across multiple cache buffers chains latches. This can be done by relocating some of the rows in the hot blocks. The new blocks have different block addresses and, with any luck, they are hashed to buckets that are not covered by the same cache buffers chains latch. You can spread the blocks in a number of ways, including:


  • Deleting and reinserting some of the rows by ROWID.
  • Exporting the table, increasing the PCTFREE significantly, and importing the data. This minimizes the number of rows per block, spreading them over many blocks. Of course, this is at the expense of storage and full table scans operations will be slower.
  • Minimizing the number of records per block in the table. This involves dumping a few data blocks to get an idea of the current number of rows per block. Refer to the “Data Block Dump” section in Appendix C for the syntax. The “nrow” in the trace file shows the number of rows per block. Export and truncate the table. Manually insert the number of rows that you determined is appropriate and then issue the ALTER TABLE table_name MINIMIZE RECORDS_PER_BLOCK command. Truncate the table and import the data.
  • For indexes, you can rebuild them with higher PCTFREE values, bearing in mind that this may increase the height of the index.
  • Consider reducing the block size. You may move the table or recreate the index in a tablespace with an 8K block size. This too will negatively impact full table scans operations. Also, various block sizes increase management complexity.
  • For other workarounds, if the database is on Oracle9i Database Release 2 or higher, you may consider increasing the _SPIN_COUNT value as discussed earlier. As a last resort, you may increase the number of hash buckets through the _DB_BLOCK_HASH_BUCKETS parameter. This practice is rarely necessary starting in Oracle8i Database. If you do this, make sure you provide a prime number—if you don’t, Oracle will round it up to the next highest prime number
  • Finally, at last not in the list, re-organize tablespace in certain interval. i.e., when you have high water mark or high value for 'initial' extents for tables etc.


Please click here to gather more knowledge.


No comments:

Post a Comment

Translate >>