Log gap reports error while accessing datafiles in 10.2.0.4 ASM based datafiles. The below article explains the reason and solution
Problem Details
Problem Statement
Cannot identify - lock data file in 10.2.0.4 ASM
Applicable Error Code
ORA-00283: recovery session canceled due to errors ORA-01110: data file 127: DG2-prd-datafile-oradb1.317.809077757 ORA-01157: can not Identify - lock data file 127 - see DBWR trace file
Affected Versions
Database Version 10.2.0.4, Standby Version 6.0 and 7.0
Affected Platforms
Platform independent
This was the email received with errors, stating the error is an ORA-01157:
ORA-00283: recovery session canceled due to errors ORA-01110: data file 127: DG2-prd-datafile-oradb1.317.809077757 ORA-01157: can not Identify - lock data file 127 - see DBWR trace file
But this file is there on ASM:
ASMCMD> ls-la DG2-prd-datafile-oradb1.317.809077757
Type Redund Striped Time Sys Name
UNPROT DATAFILE COARSE MAR 04 01:00:00 Y none => oradb1.317.809077757
This error has occurred several times, but with different files each time. This is very strange, as there is no ORA-01157 errors in the alert.log
instance of standby database. Nor are there traces of DBWR in dump directories:
[oracle@my-db-oraprod01 dba]$ fgrep ORA-01157 -u02-oracle-dump-bdump-alert_DBVPRD1.log
[oracle@my-db-oraprod01 dba]$ find -u02-oracle-dump -name *dbwr*
[oracle@my-db-oraprod01 dba]$
We built a 10.2.0.4 RAC environment where ASM and the Database are using 10.2.0.4 and were able to reproduce the error while running Dbvisit Standby as well as using manual recovery outside the use of the application. It does appear that during the recover command the standby database could possibly for brief moment not see the datafiles, almost as if there is some locking issue happening when v$database
, v$datafile
and v$datafile_header
is accessed by multiple sessions at the same time.
While we run a recovery session on the standby, we execute the log gap report from both primary nodes, which is running queries against v$database
, v$datafile
and v$datafile_header
, v$log_history
, and were able to get similar ORA-01157/ORA-01110 errors (as well as the cannot obtain next change for standby database, which is an after effect in Dbvisit Standby due to the ORA1157/1110 errors). We were also able to simulate this a couple times running manual queries, but it does not seem to be 100% consistent. It is quite random and seems to be specific to 10.2.0.4 as we do not get this on later versions.
Searching on Oracle support there are references of bugs filed against 10.2.0.4 for ORA-1157 and ORA-1110, which is what we are seeing in this case, but they are not published (example Bug.8220399/6706162 (36) QUERY FROM DBA_DATA_FIELS FAILES WITH ORA-1157,ORA-1110).
Then we ran further tests:
- Upgraded the ASM on the standby side to 10.2.0.5 and ran the same test scenarios.
- Generate a large amount of redo on primary nodes, then kick off an apply process.
- While apply process is running, kick off two log gap reports from the primary nodes.
- Error was produced, same as before ORA-1157 and ORA-1110.
- Created a backup of the whole RAC and Standby environment, followed by an upgrade of the Database software to 10.2.0.5 on both Primary nodes as well as Standby (ASM and DB both running 10.2.0.5) before running the tests again...
- Generated large amount of redo, start apply while running many log gap reports from primary nodes.
- NO Errors were reported!
Patching the database software to 10.2.0.5 seemed to fix this issue, which brings us to the conclusion this this is an Oracle bug that is fixed in the 10.2.0.5 patch set.
So, the resolution for this issue - and the next action to take (and this is a resolution verified by the customer in this case) is to apply patch 10.2.0.5.
Mike Donovan July 02, 2013 10:50
Comments