Aug 25, 2015

Fix block corruption via RMAN - Pains & Gains of NOLOGGING mode in Oracle database


a) About NOLOGGING
b) FORCE LOGGING Options
c) Gains of NOLOGGING
d) Detection of Nologging Operations On the Primary and Standby Databases
e) A workaround to fix block corruption via RMAN

f) Best practices to avoid block corruptions
------------------------------------------------------------------------


a) About NOLOGGING:


The FORCE LOGGING option is the safest method to ensure that all changes made in the database are captured in the redo logs and available for recovery. FORCE LOGGING is a later addition to Oracle's family of logging attributes.

Before the existence of FORCE LOGGING, Oracle provided only the LOGGING and NOLOGGING options. These two options have higher precedence at the schema object level than at the tablespace level; therefore, it was possible to override a tablespace-level LOGGING setting with a NOLOGGING setting on a schema object.

The FORCE LOGGING option can be set at the database level or the tablespace level. The precedence is from database to tablespace. If a tablespace is created or altered to have FORCE LOGGING enabled, any change in that tablespace will go into the redo log and be usable for recovery.

Similarly, if a database is created or altered with FORCE LOGGING enabled, any change across the database, with the exception of temporary segments and temporary tablespaces, will be available in the redo logs for recovery. The FORCE LOGGING option can be set at database creation time or later using the ALTER DATABASE command.

To enable FORCE LOGGING after the database is created, use the following command:

ALTER DATABASE FORCE LOGGING;

b) FORCE LOGGING Options:

1) Database level:
  -- Enable
SQL> ALTER DATABASE FORCE LOGGING;
  -- Disable
SQL> ALTER DATABASE NO FORCE LOGGING;


2) Tablespace level:
-- Disable
SQL> ALTER TABLESPACE <tablespace_name> NO FORCE LOGGING;
-- Enable
SQL> ALTER TABLESPACE <tablespace_name> FORCE LOGGING;

3) Table level:
-- Disable
SQL> ALTER TABLE <table_name> NOLOGGING;
-- Enable
SQL> ALTER TABLE <table_name> LOGGING;
Note:

Temporary tablespaces and temporary segments are unaffected by FORCE LOGGING mode because these objects do not generate redo. Undo tablespaces are effectively always logged, so they cannot be explicitly placed in FORCE LOGGING mode. Oracle raises an error if an attempt is made to put a temporary tablespace or an undo tablespace into FORCE LOGGING mode.

The FORCE_LOGGING column of v$database view can be queried to verify that the database is in FORCE LOGGING mode. Similarly, the FORCE_LOGGING column of dba_tablespaces view provides the same logging information for each tablespace.

select force_logging from v$database;
select force_logging from dba_tablespaces;



Warning !!!

Putting a database in FORCE LOGGING mode will have some performance impact.


c) Gains of NOLOGGING:

Nologging operations indicate that the database operation is not logged in the online redo log file. Even though a small invalidation redo record is still written to the online redo log file, nologging operations skip the redo generation of the corresponding DML data.  Nologging can be extremely beneficial for the following reasons:
  • data written to the redo is minimized dramatically
  • time to insert into a large table or index or LOB can be reduced dramatically
  • performance improves for parallel creation of large tables or indices
However, NOLOGGING is intended for configurations in which media recovery or the recovery of the corresponding object is not important. Thus, if the disk or tape or storage media fails, you will not be able to recover your changes from the redo because the changes were never logged. 
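As an illustration, a direct-path load into a NOLOGGING table is the classic nologging operation. This sketch uses hypothetical table names; note that only direct-path (APPEND) operations skip redo generation, while conventional DML on a NOLOGGING table is still fully logged:

```sql
-- Create an empty staging table with the NOLOGGING attribute
-- (table names here are illustrative)
CREATE TABLE sales_stage NOLOGGING AS
  SELECT * FROM sales WHERE 1 = 0;

-- A direct-path insert into a NOLOGGING table writes only a small
-- invalidation redo record, not redo for the row data itself
INSERT /*+ APPEND */ INTO sales_stage
  SELECT * FROM sales;
COMMIT;
```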

d) Detection of Nologging Operations On the Primary and Standby Databases

On the primary database, you can monitor for the most recent nologging operation that occurred in the database by issuing the following query:

         SELECT NAME, UNRECOVERABLE_CHANGE#,               
         TO_CHAR (UNRECOVERABLE_TIME,'DD-MON-YYYY HH:MI:SS')
         FROM V$DATAFILE;


The query above indicates when the most recent nologging operation occurred on the primary and when the invalidation redo was written. Once Redo Apply (media recovery) processes the invalidation redo, it marks all the corresponding data blocks corrupt. You will encounter these corrupted blocks on the physical standby database when you query any data that references them, receiving errors such as:

                ORA-01578: ORACLE data block corrupted (file # 3, block # 514)
                ORA-01110: data file 3: '/oracle/oradata/PROD/users.dbf'
                ORA-26040: Data block was loaded using the NOLOGGING option


You can proactively catch some of these corrupted blocks on the Redo Apply (media recovery) instance by running DBVERIFY against the data files.

Example :
           
$ cd /oracle/oradata/PROD
$ dbv file=users01.dbf

DBVERIFY: Release 11.2.0.4.0 - Production on Tue Aug 25 16:41:53 2015

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

DBVERIFY - Verification starting : FILE = /oracle/oradata/PROD/users01.dbf

DBV-00201: Block, DBA 17162242, marked corrupt for invalid redo application
DBV-00201: Block, DBA 17162244, marked corrupt for invalid redo application
...
...
DBV-00201: Block, DBA 17162493, marked corrupt for invalid redo application


DBVERIFY - Verification complete

Total Pages Examined         : 407680
Total Pages Processed (Data) : 221830
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 1257
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 175354
Total Pages Processed (Seg)  : 0
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 9239
Total Pages Marked Corrupt   : 111
Total Pages Influx           : 0
Total Pages Encrypted        : 0
Highest block SCN            : 2355429844 (29.2355429844)
 

SQL Apply ignores the invalidation redo because it cannot be converted into any reasonable SQL, so a logical standby will not raise any immediate errors. If future transactions reference the missing data, the apply slave will report an ORA-01403 in the alert.log.

For example, an UPDATE statement can fail on the logical standby because it references 'nologged' rows that do not exist on the logical standby database.
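For a logical standby, a common repair for such a table is to re-instantiate it from the primary using DBMS_LOGSTDBY.INSTANTIATE_TABLE. This is a sketch; the schema, table, and database link names are hypothetical:

```sql
-- On the logical standby: stop SQL Apply before re-copying the table
ALTER DATABASE STOP LOGICAL STANDBY APPLY;

BEGIN
  DBMS_LOGSTDBY.INSTANTIATE_TABLE(
    schema_name => 'SCOTT',             -- hypothetical schema
    table_name  => 'EMP',               -- hypothetical table
    dblink      => 'primary_db_link');  -- hypothetical db link to the primary
END;
/

-- Resume SQL Apply once the table has been re-instantiated
ALTER DATABASE START LOGICAL STANDBY APPLY IMMEDIATE;
```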

-- Repair of Nologged Changes on the Physical and Logical Standby Databases:

After a nologged operation on the primary is detected, it is recommended to create a backup immediately if you want to recover from this operation in the future.  However there are additional steps required if you have an existing physical or logical standby database.    This is crucial if you want to preserve the data integrity of your standby databases.

For a physical standby database, Redo Apply will process the invalidation redo and mark the corresponding data blocks corrupt.

For a physical standby database, follow these steps to reinstantiate the affected data files.

    1. stop Redo Apply (recover managed standby database cancel)
    2. offline corresponding datafile(s) (alter database datafile <NAME> offline drop;)
    3. start Redo Apply (recover managed standby database disconnect)
    4. copy the appropriate backup datafiles over from the primary database (e.g. use RMAN to backup datafiles and copy them)
    5. stop Redo Apply (recover managed standby database cancel)
    6. online corresponding data files (alter database datafile <NAME> online;)
    7. start Redo Apply (recover managed standby database disconnect)
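On the standby, the steps above might look like the following sketch (the datafile path is hypothetical; adapt it to the file identified by the earlier queries):

```sql
-- 1. Stop Redo Apply
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;

-- 2. Take the affected datafile offline (path is illustrative)
ALTER DATABASE DATAFILE '/oracle/oradata/PROD/users01.dbf' OFFLINE DROP;

-- 3. Restart Redo Apply while the replacement file is prepared
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT;

-- 4. Copy a fresh backup of the datafile over from the primary
--    (e.g. back it up with RMAN on the primary and transfer it), then:

-- 5. Stop Redo Apply again
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;

-- 6. Bring the datafile back online
ALTER DATABASE DATAFILE '/oracle/oradata/PROD/users01.dbf' ONLINE;

-- 7. Resume Redo Apply
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT;
```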

======================
Solution with a workaround:
======================

As noted in the section above, this procedure can be used only if an RMAN backup does not fail while reading the corrupt block. To determine this, run a backup of the datafile containing the corrupt block:

RMAN> backup check logical datafile 7 format '/u03/backup/%U' tag 'CORRUPT_BLK_FILE_BKP';

Starting backup at 24-AUG-15
using channel ORA_DISK_1
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00007 name=/oracle/oradata/PROD/demo01.dbf
channel ORA_DISK_1: starting piece 1 at 24-AUG-15
channel ORA_DISK_1: finished piece 1 at 24-AUG-15
piece handle=/u03/backup/1jnbhl5c_1_1 tag=CORRUPT_BLK_FILE_BKP comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
Finished backup at 24-AUG-15


Ensure that the location where the backup piece will be created has sufficient space; you can change this destination using the FORMAT clause. Also, even though the backup appears successful, cross-check with the query below that the corrupt block is not contained in the backup:

SQL> select BP.HANDLE, BP.COMPLETION_TIME, BC.FILE#, BC.BLOCK#, BC.BLOCKS, BC.MARKED_CORRUPT, BC.CORRUPTION_TYPE
    from V$BACKUP_PIECE BP, V$BACKUP_CORRUPTION BC
    where BP.SET_COUNT = BC.SET_COUNT and
          BP.SET_STAMP = BC.SET_STAMP and
          BP.TAG = 'CORRUPT_BLK_FILE_BKP';


no rows selected.

If the above query returns rows showing the corrupt block, this procedure cannot be used. In this case, since no rows were returned, we can be sure that RMAN skipped the corrupt block through its unused block optimization. If the datafile is now restored from this backup, RMAN will format and restore an empty copy of the corrupt block, which can then be used for RMAN block recovery as follows.

1. Restore the datafile to alternate location:

RMAN> run {
2> set newname for datafile 7 to '/oracle/oradata/PROD/demo01_RESTORED.dbf';
3> restore datafile 7 from tag 'CORRUPT_BLK_FILE_BKP';
4> }

executing command: SET NEWNAME

Starting restore at 24-AUG-15
using channel ORA_DISK_1

channel ORA_DISK_1: starting datafile backup set restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_DISK_1: restoring datafile 00007 to /oracle/oradata/PROD/demo01_RESTORED.dbf
channel ORA_DISK_1: reading from backup piece /u03/backup/1jnbhl5c_1_1
channel ORA_DISK_1: piece handle=/u03/backup/1jnbhl5c_1_1 tag=CORRUPT_BLK_FILE_BKP
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:00:03
Finished restore at 24-AUG-15


2. Run DBV on the restored datafile to verify it is free from corruption:

$ dbv file=/oracle/oradata/PROD/demo01_RESTORED.dbf blocksize=8192


DBVERIFY: Release 11.2.0.4.0 - Production on Tue Aug 25 17:01:13 2015

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

DBVERIFY - Verification starting : FILE = /oracle/oradata/PROD/demo01_RESTORED.dbf

DBVERIFY - Verification complete

Total Pages Examined         : 12800
Total Pages Processed (Data) : 0
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 0
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 12799
Total Pages Processed (Seg)  : 0
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 1
Total Pages Marked Corrupt   : 0
Total Pages Influx           : 0
Total Pages Encrypted        : 0
Highest block SCN            : 775154 (0.775154)


3. Run BLOCKRECOVER command as below to repair the corrupt block.

The corrupt block will be replaced by the empty, formatted block from the restored datafile:

RMAN> blockrecover datafile 7 block 150 FROM DATAFILECOPY;

Starting recover at 24-AUG-15
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=22 device type=DISK

channel ORA_DISK_1: restoring block(s) from datafile copy /oracle/oradata/PROD/demo01_RESTORED.dbf

starting media recovery
media recovery complete, elapsed time: 00:00:01


Finished recover at 24-AUG-15


4. Run DBV on the original file to confirm that it is now free from corruption:


$ dbv file=/oracle/oradata/PROD/demo01.dbf blocksize=8192

DBVERIFY: Release 11.2.0.4.0 - Production on Tue Aug 25 17:11:33 2015

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

DBVERIFY - Verification starting : FILE = /oracle/oradata/PROD/demo01.dbf

DBVERIFY - Verification complete

Total Pages Examined         : 12800
Total Pages Processed (Data) : 356
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 0
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 152
Total Pages Processed (Seg)  : 0
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 12292
Total Pages Marked Corrupt   : 0
Total Pages Influx           : 0
Total Pages Encrypted        : 0
Highest block SCN            : 775154 (0.775154)


5. Additionally, you can run VALIDATE in RMAN to verify further:

RMAN> backup validate check logical datafile 7;

Starting backup at 24-AUG-15
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=22 device type=DISK
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00007 name=/oracle/oradata/PROD/demo01.dbf
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:03
List of Datafiles
=================
File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
7    OK     0              12292        12801           775154
  File Name: /oracle/oradata/PROD/demo01.dbf
  Block Type Blocks Failing Blocks Processed
  ---------- -------------- ----------------
  Data       0              356
  Index      0              0
  Other      0              152

Finished backup at 24-AUG-15
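Blocks found corrupt by BACKUP VALIDATE are also recorded in the V$DATABASE_BLOCK_CORRUPTION view, so an empty result there is further confirmation that the repair succeeded:

```sql
-- Populated by BACKUP VALIDATE / VALIDATE;
-- should return no rows after a successful block repair
SELECT file#, block#, blocks, corruption_change#, corruption_type
  FROM v$database_block_corruption;
```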


f) Best practices to avoid block corruptions:

Regular and proactive checking of the database for corruptions:

1. Run the DBVerify utility against the datafiles at periodic intervals to check for physical corruption.

e.g.,
dbv file=system01.dbf blocksize=8192

2. Run object-level ANALYZE commands at periodic intervals. This checks for logical inconsistencies and can also detect physical corruption. Note that this command locks the object being analyzed, so it should be run during off-peak hours; the ONLINE option available with ANALYZE removes this restriction to some extent.

e.g.,
SQL> Analyze table <user>.<table_name> validate structure cascade [online];
SQL> Analyze index <user>.<index_name/cluster_name> validate structure;
     
For partitioned tables, run the utlvalid.sql script first to create the invalid_rows table, then use the following ANALYZE command instead:

SQL> Analyze table <user>.<table_name> partition <partition_name> validate structure cascade into invalid_rows;

3. Take regular full database export backups (using EXPDP or EXP).

e.g.,
expdp directory=dir_name dumpfile=dump_name.dmp logfile=log_name.log full=y

Note: You can add further parameters as required; run expdp -help for details.

4. Ensure the data is free of corruption when doing a bulk load. After the load, perform the normal validations (select/analyze/export) to detect any corruption introduced while loading.

5. Use RMAN to check the existence of physical and logical corruption.

e.g., to validate the complete database:

RMAN> BACKUP CHECK LOGICAL VALIDATE DATABASE;

Note: See the workaround section above for more about RMAN validation.

6. Use the DBMS_REPAIR package to verify corruption in an object; this package can also mark blocks as software corrupt.

The DBMS_REPAIR.CHECK_OBJECT procedure checks the specified object and populates a repair table with information about corruptions and repair directives. Validation consists of block-checking all blocks in the object; blocks previously marked corrupt are skipped.
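A minimal sketch of checking a single table with DBMS_REPAIR; the schema and table names are hypothetical, and the repair table is created once beforehand:

```sql
-- One-time setup: create the repair table (REPAIR_TABLE) in the current schema
BEGIN
  DBMS_REPAIR.ADMIN_TABLES(
    table_name => 'REPAIR_TABLE',
    table_type => DBMS_REPAIR.REPAIR_TABLE,
    action     => DBMS_REPAIR.CREATE_ACTION);
END;
/

-- Check one object and report how many corrupt blocks were found
-- (run SET SERVEROUTPUT ON first to see the output)
DECLARE
  corrupt_count PLS_INTEGER;
BEGIN
  DBMS_REPAIR.CHECK_OBJECT(
    schema_name       => 'SCOTT',        -- hypothetical schema
    object_name       => 'EMP',          -- hypothetical table
    repair_table_name => 'REPAIR_TABLE',
    corrupt_count     => corrupt_count);
  DBMS_OUTPUT.PUT_LINE('Corrupt blocks found: ' || corrupt_count);
END;
/
```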


7. Run Memory/Hardware diagnostics periodically.

8. Set block checking parameters.

DB_BLOCK_CHECKING = TRUE (FULL from 10.2 onwards)
DB_BLOCK_CHECKSUM = TRUE (FULL from 10.2 onwards)
_DB_BLOCK_CHECK_FOR_DEBUG = TRUE

Note: Enabling these parameters has a performance impact. Do not set them in production without load testing first; see the Oracle documentation for details.

11g specific:

* DB_ULTRA_SAFE = { OFF | DATA_ONLY | DATA_AND_INDEX }
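The checking parameters can be set with ALTER SYSTEM; the values below are illustrative only, and should be validated in a load test first. Note that DB_ULTRA_SAFE is a static parameter and takes effect only after a restart:

```sql
-- Dynamic parameters: take effect immediately
ALTER SYSTEM SET db_block_checksum = FULL SCOPE = BOTH;
ALTER SYSTEM SET db_block_checking = FULL SCOPE = BOTH;

-- Static parameter: written to the spfile, applied at the next restart
ALTER SYSTEM SET db_ultra_safe = DATA_AND_INDEX SCOPE = SPFILE;
```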

See "Best Practices for Avoiding and Detecting Corruption" (Doc ID 428570.1) on My Oracle Support.
