[Translated from a MOS article] After incomplete recovery and before OPEN RESETLOGS, how do you quickly check that the database is in a consistent state?

Posted msdnchina


Translated from:

How to quickly check that Database is consistent after incomplete recovery (Point in Time Recovery) before OPEN RESETLOGS (Doc ID 1354256.1)

Applies to:
Oracle Database - Enterprise Edition - Version 9.0.1.0 and later
Information in this document applies to any platform.
***Checked for relevance on 05-Nov-2014***
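Goal:
After a database is restored from backup, what is the minimum amount of recovery required before it can be opened?

After performing a restore/recover, we need a quick validation to confirm that the database is consistent and ready for OPEN RESETLOGS.

This proactive check helps prevent several problems that may otherwise surface before or after OPEN RESETLOGS.

This article assumes that you are restoring from a valid backup.

Real-world cases can involve more scenarios than those discussed here; if in doubt, consult Oracle Support.

Solution:
For a cold/offline backup (translator's note: a backup taken after a clean shutdown), no archived logs or recovery are needed. You can simply open the database with RESETLOGS.

For a hot/online backup, however, all archived logs generated from the start of the backup to the end of the backup must be applied before the database can be opened. This is the minimum recovery required.

To determine which log was current when the backup completed, note the completion time of the database backup, which you can get from the backup log.

If this is an RMAN backup, you can query the RMAN metadata. Before invoking RMAN, set the NLS_DATE_FORMAT environment variable so that timestamps are returned along with dates.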

For Unix:
% export NLS_DATE_FORMAT='dd-mon-rr hh24:mi:ss'
% rman target /

For Windows:
> set nls_date_format=dd-mon-rr:hh24:mi:ss                
> rman target /

To find the backup:
RMAN> LIST BACKUP OF DATABASE COMPLETED AFTER '<date>';
 or 
RMAN> LIST BACKUP OF DATABASE COMPLETED AFTER 'sysdate -n';

Set <date> to limit the output to the backup you are interested in, and note the completion time.
For a multi-piece backup, note the completion time of the last backuppiece created.


For the SQL queries in this article, set NLS_DATE_FORMAT at session level first:
SQL> alter session set nls_date_format='DD-MON-YYYY HH24:MI:SS';

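Check 1: Checkpoint Time and Fuzziness

Objective: Verify that the datafiles have been recovered to the intended point in time (PIT) and are consistent (FUZZY=NO).

Query the current status and PIT (the point in time up to which the datafiles have been recovered) of the datafiles by reading the datafile headers directly from the physical datafiles: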
SQL> select fuzzy, status, error, recover, checkpoint_change#, checkpoint_time, count(*) 
from v$datafile_header
group by fuzzy, status, error, recover, checkpoint_change#, checkpoint_time ;


FUZ STATUS  ERROR           REC CHECKPOINT_CHANGE# CHECKPOINT_TIME        COUNT(*)
--- ------- --------------- --- ------------------ -------------------- ----------
NO  ONLINE                                 5311260 31-AUG-2011 23:10:14          6
YES ONLINE                                 5311260 31-AUG-2011 23:10:14          1

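a) Verify that checkpoint_time / checkpoint_change# matches the intended UNTIL TIME/SCN. If not, recover the database further, provided you have more archived logs available.

b) If some datafiles show FUZZY=YES, more recovery is required.
  If the required archived logs are lost, identify those datafiles and decide whether they can be taken offline (translator's note: the headers can of course be patched with bbed, but that is entirely at your own risk).
  Warning: taking a datafile offline means losing the data in it.
  
  If the datafiles belong to the SYSTEM or UNDO tablespace, they must not be taken offline without proper analysis. Contact Oracle Support for further action.
SQL> select file#, substr(name, 1, 50), substr(tablespace_name, 1, 15), undo_opt_current_change# from v$datafile_header where fuzzy='YES' ;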


     FILE# SUBSTR(NAME,1,50)                                  SUBSTR(TABLESPA UNDO_OPT_CURRENT_CHANGE#
---------- -------------------------------------------------- --------------- ------------------------
         3 /u01/app/oracle/oradata/prod111/undotbs01.dbf      UNDOTBS1                         5117431

Occasionally, the tablespace name does not indicate an UNDO tablespace. In that case, a non-zero value in the UNDO_OPT_CURRENT_CHANGE# column indicates that the datafile contains undo segments.


To take a datafile offline:
SQL> alter database datafile <file#> offline ;




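Check 1 can be considered passed when:
a) All datafiles have the same checkpoint_time, and it is the intended point in time.
b) Fuzzy=NO for the SYSTEM and UNDO datafiles and for every other datafile you need.
  Datafiles with Fuzzy=YES must either be recovered further or, if the archived logs are not available, taken offline.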
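
The Check 1 criteria can be expressed as a small helper, shown here as a minimal Python sketch. It assumes the rows have already been fetched from v$datafile_header by some client; the `HeaderRow` type and the function name are hypothetical and not part of the MOS note. It compares checkpoint_change# against a target SCN; comparing checkpoint_time against an UNTIL TIME would work the same way.

```python
from collections import namedtuple

# Hypothetical row type; the fields mirror columns selected from v$datafile_header.
HeaderRow = namedtuple("HeaderRow", "file_no fuzzy checkpoint_change")


def check1_problems(rows, target_scn):
    """Return a list of findings; an empty list means Check 1 passes."""
    problems = []
    for r in rows:
        if r.checkpoint_change != target_scn:
            problems.append(
                f"file {r.file_no}: checkpoint_change# {r.checkpoint_change} "
                f"differs from target {target_scn}"
            )
        if r.fuzzy == "YES":
            problems.append(f"file {r.file_no}: FUZZY=YES, needs more recovery")
    return problems


# Sample data shaped like the output above: seven files at SCN 5311260, file 3 fuzzy.
rows = [HeaderRow(n, "YES" if n == 3 else "NO", 5311260) for n in range(1, 8)]
print(check1_problems(rows, target_scn=5311260))
```

Any finding for a SYSTEM or UNDO datafile should be treated as blocking, in line with the warning above.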


Check 2
Objective: Verify that datafiles with status=RECOVER were not taken offline unintentionally.
SQL> select status, enabled, count(*) from v$datafile group by status, enabled ;

STATUS  ENABLED      COUNT(*)
------- ---------- ----------
SYSTEM  DISABLED            1
ONLINE  READ WRITE          4
RECOVER DISABLED            2

If datafiles are in RECOVER status, check whether they are offline:
SQL> select file#, substr(name, 1, 50), status, error, recover from v$datafile_header;

If you need the data in these datafiles, bring them online:
SQL> alter database datafile <file#> ONLINE ;

Check 2 can be considered passed when:
None of the required datafiles is offline.




Check 3: Absolute Fuzzy
Objective: An additional fuzzy check (the Absolute Fuzzy check)


Occasionally, the following can happen:
All required datafiles show Fuzzy=NO and the same checkpoint_change#, yet OPEN RESETLOGS still fails.


For example:
SQL> select fuzzy, status, error, recover, checkpoint_change#, checkpoint_time, count(*) from v$datafile_header group by fuzzy, status, error, recover, checkpoint_change#, checkpoint_time ;


FUZ STATUS  ERROR           REC CHECKPOINT_CHANGE#      CHECKPOINT_TIME   COUNT(*)
--- ------- --------------- --- ------------------ -------------------- ----------
NO  ONLINE                                 5311260 31-AUG-2011 23:10:14          7



SQL> ALTER DATABASE OPEN RESETLOGS ;


ORA-01194: file 4 needs more recovery to be consistent
ORA-01110: data file 4: '/u01/app/oracle/oradata/prod111/undotbs02.dbf'



We therefore need an additional fuzzy check, known as the Absolute Fuzzy Check:


SQL> select hxfil file#, substr(hxfnm, 1, 50) name, fhscn checkpoint_change#, fhafs Absolute_Fuzzy_SCN, max(fhafs) over () Min_PIT_SCN from x$kcvfh where fhafs!=0 ;


FILE#      NAME                                               CHECKPOINT_CHANG ABSOLUTE_FUZZY_S     MIN_PIT_SCN
---------- -------------------------------------------------- ---------------- ---------------- ----------------
         4 /u01/app/oracle/oradata/prod111/undotbs01.dbf               5311260          5311524          5311524
         6 /u01/app/oracle/oradata/prod111/system01.dbf                5311260          5311379          5311524

Note:
Column Min_PIT_SCN returns the same value on every row because the analytic function "MAX() OVER ()" is applied to it.

The query above shows that recovery must be performed at least UNTIL SCN 5311524 for the datafiles to be consistent and ready to open.
Because checkpoint_change# is less than Min_PIT_SCN, the datafiles need more recovery.


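Check 3 can be considered passed when either:
a) The query above returns no rows (that is, no datafile has a non-zero Absolute_Fuzzy_SCN, so Min_PIT_SCN is effectively zero), or
b) Min_PIT_SCN is less than Checkpoint_Change#.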
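
The two pass conditions for Check 3 can be sketched in Python as follows. This is only an illustration of the comparison, assuming the rows of the x$kcvfh query have been fetched as (file#, checkpoint_change#, absolute_fuzzy_scn) tuples; the function names are hypothetical, and the strict "less than" follows the wording of the criteria above.

```python
def min_pit_scn(rows):
    """Highest absolute fuzzy SCN across all rows; 0 when the query returned nothing."""
    return max((afs for _file_no, _ckpt, afs in rows), default=0)


def check3_passed(rows):
    """True when no rows came back, or every checkpoint is beyond Min_PIT_SCN."""
    target = min_pit_scn(rows)
    return all(target < ckpt for _file_no, ckpt, _afs in rows)


# Rows taken from the sample output above.
rows = [
    (4, 5311260, 5311524),
    (6, 5311260, 5311379),
]
print(min_pit_scn(rows), check3_passed(rows))
```

With the sample rows, the helper reports a minimum recovery target of SCN 5311524 and a failed check, which matches the conclusion drawn from the query output.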


Check 4: Archived Logs Required


Query the controlfile to find the latest archived log required for recovery. Let's say the backup completed at 31-AUG-2011 23:20:14:
SQL> -- V$ARCHIVED_LOG
SQL> --
SQL> ALTER SESSION SET NLS_DATE_FORMAT='DD-MON-RR HH24:MI:SS';
SQL> SELECT THREAD#, SEQUENCE#, FIRST_TIME, NEXT_TIME FROM V$ARCHIVED_LOG 
      WHERE '31-AUG-11 23:20:14' BETWEEN FIRST_TIME AND NEXT_TIME;

If the query above returns no rows, the information may have aged out of the controlfile. In that case run the following query against V$LOG_HISTORY:
SQL> -- V$LOG_HISTORY  view does not have a column NEXT_TIME
SQL> --
SQL> ALTER SESSION SET NLS_DATE_FORMAT='DD-MON-RR HH24:MI:SS';
SQL> select a.THREAD#, a.SEQUENCE#, a.FIRST_TIME 
       from V$LOG_HISTORY a 
      where FIRST_TIME = 
         ( SELECT MAX(b.FIRST_TIME) 
             FROM V$LOG_HISTORY b
            WHERE b.FIRST_TIME < to_date('31-AUG-11 23:20:14', 'DD-MON-RR HH24:MI:SS') 
         ) ;

The sequence# returned by the query above is the log sequence that was current when the backup completed. Let's say it is sequence 530 of thread 1.
For minimum recovery, use the returned sequence# + 1:
RMAN> RUN 
{ 
 SET UNTIL SEQUENCE 531 THREAD 1;
 RECOVER DATABASE;
}
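
The "latest log started before the backup completed, plus one" rule can be sketched in Python as below. This is an illustration only: it assumes a single thread and rows fetched from V$LOG_HISTORY as (thread#, sequence#, first_time) tuples. The function name and the sample timestamps are invented for the example; only sequence 530 and the completion time come from the text above.

```python
from datetime import datetime


def until_sequence(history, backup_completed):
    """Return (thread, sequence + 1) for the log current at backup completion, or None."""
    earlier = [row for row in history if row[2] < backup_completed]
    if not earlier:
        return None  # nothing in the history predates the backup completion time
    thread, sequence, _first_time = max(earlier, key=lambda row: row[2])
    return thread, sequence + 1


# Hypothetical log history for thread 1; the first_time values are made up.
history = [
    (1, 529, datetime(2011, 8, 31, 22, 50, 0)),
    (1, 530, datetime(2011, 8, 31, 23, 15, 0)),
    (1, 531, datetime(2011, 8, 31, 23, 40, 0)),
]
print(until_sequence(history, datetime(2011, 8, 31, 23, 20, 14)))
```

The returned pair corresponds to the values used in SET UNTIL SEQUENCE ... THREAD ... in the RMAN block above.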

In a RAC environment, run the following query instead:
SQL> SELECT THREAD#, SEQUENCE#, FIRST_CHANGE#, NEXT_CHANGE# 
FROM V$ARCHIVED_LOG 
WHERE '31-AUG-11 23:20:14' BETWEEN FIRST_TIME AND NEXT_TIME;

Key point:
For minimum recovery use the log sequence and thread that has the lowest NEXT_CHANGE# returned by the above query.
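
Picking the thread and sequence with the lowest NEXT_CHANGE# is a simple minimum over the returned rows. The Python sketch below assumes the RAC query results are available as (thread#, sequence#, first_change#, next_change#) tuples; the function name and all sample values are hypothetical.

```python
def minimum_recovery_log(rows):
    """Return (thread, sequence) of the row with the lowest NEXT_CHANGE#, or None."""
    if not rows:
        return None
    thread, sequence, _first_change, _next_change = min(rows, key=lambda r: r[3])
    return thread, sequence


# Hypothetical output for a two-node RAC database.
rows = [
    (1, 530, 5311000, 5311700),
    (2, 412, 5310900, 5311650),
]
print(minimum_recovery_log(rows))
```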


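Check 4 can be considered passed when:
All archived logs from just before the start of the backup to the end of the backup are available for use during recovery.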




Check 5: Post OPEN RESETLOGS Activities
During OPEN RESETLOGS, monitor alert.log for additional errors or messages. During the dictionary check you may see messages like the following:
Dictionary check beginning
Tablespace 'TEMP' #3 found in data dictionary, <(============================== (1)
but not in the controlfile. Adding to controlfile.
Tablespace 'USERS' #4 found in data dictionary, 
but not in the controlfile. Adding to controlfile.
File #4 found in data dictionary but not in controlfile.
Creating OFFLINE file 'MISSING00004' in the controlfile. <(==================== (2)
File #5 is online, but is part of an offline tablespace. <(==================== (3)
data file 5: '/u01/app/oracle/oradata/prod111/example01.dbf'
File #7 found in data dictionary but not in controlfile. <(==================== (2)
Creating OFFLINE file 'MISSING00007' in the controlfile.
File #8 is offline, but is part of an online tablespace. <(==================== (4)
data file 8: '/u01/app/oracle/oradata/prod111/mydata02.dbf'
File #9 is online, but is part of an offline tablespace. <(==================== (3)
data file 9: '/u01/app/oracle/oradata/prod111/example02.dbf'
Dictionary check complete
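
The file-level messages in an excerpt like the one above could be picked out mechanically. The following Python sketch is one possible approach, assuming the alert.log lines are available as plain strings without the annotation arrows; the case numbers reuse the (2), (3) and (4) markers from the excerpt, and the names `PATTERNS` and `classify` are hypothetical.

```python
import re

# Case numbers follow the annotations in the alert.log excerpt.
PATTERNS = [
    (2, re.compile(r"^File #(\d+) found in data dictionary but not in controlfile")),
    (3, re.compile(r"^File #(\d+) is online, but is part of an offline tablespace")),
    (4, re.compile(r"^File #(\d+) is offline, but is part of an online tablespace")),
]


def classify(lines):
    """Return (case, file_number) pairs in the order the messages appear."""
    found = []
    for line in lines:
        for case, pattern in PATTERNS:
            match = pattern.match(line.strip())
            if match:
                found.append((case, int(match.group(1))))
                break
    return found


sample = [
    "Dictionary check beginning",
    "File #4 found in data dictionary but not in controlfile.",
    "File #5 is online, but is part of an offline tablespace.",
    "File #7 found in data dictionary but not in controlfile.",
    "File #8 is offline, but is part of an online tablespace.",
    "File #9 is online, but is part of an offline tablespace.",
    "Dictionary check complete",
]
print(classify(sample))
```

Tablespace-level messages such as the one for TEMP are deliberately left out of the sketch, since they need a different follow-up (adding tempfiles).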

Let's discuss the annotated entries in the excerpt above:

(1) Check if the temp files exist. If not, add them as per your preference: 

SQL> select file#, name from v$tempfile ;

no rows selected

SQL> select file#, name from dba_temp_files ;

no rows selected


SQL> select tablespace_name, status, contents from dba_tablespaces where contents='TEMPORARY' ;

TABLESPACE_NAME                STATUS    CONTENTS
------------------------------ --------- ---------
TEMP                           ONLINE    TEMPORARY


SQL> alter tablespace temp add tempfile '/u01/app/oracle/oradata/temp01.dbf' size 10m ;

Tablespace altered.

SQL> select file#, substr(name, 1, 50), status, enabled from v$tempfile ;

FILE#    SUBSTR(NAME,1,50)                                  STATUS  ENABLED
-------- -------------------------------------------------- ------- ----------
       1 /u01/app/oracle/oradata/temp01.dbf                 ONLINE  READ WRITE


(2) It appears that the tablespace was brought offline using the "ALTER TABLESPACE USERS OFFLINE" command. So, verify whether the missing files still exist under their original names. You may need to consult your peer DBAs, or refer to alert.log, the RMAN backup log, or any other source that may give a clue about the actual file names.

If you find the files, rename them in the controlfile. If not, you can take the datafiles offline or drop the associated tablespace:

SQL> select file#, status, enabled, substr(name, 1, 50) from v$datafile where name like '%MISSING%' ;

FILE#    STATUS  ENABLED    SUBSTR(NAME,1,50)
-------- ------- ---------- --------------------------------------------------
       4 OFFLINE DISABLED   /u01/app/oracle/product/11.1.0/db_1/dbs/MISSING000
       7 OFFLINE DISABLED   /u01/app/oracle/product/11.1.0/db_1/dbs/MISSING000


SQL> alter database datafile 4 online ;
alter database datafile 4 online
*
ERROR at line 1:
ORA-01157: cannot identify/lock data file 4 - see DBWR trace file
ORA-01111: name for data file 4 is unknown - rename to correct file
ORA-01110: data file 4: '/u01/app/oracle/product/11.1.0/db_1/dbs/MISSING00004'


SQL> alter database rename file 'MISSING00004' to '/u01/app/oracle/oradata/prod111/users01.dbf' ;

Database altered.

SQL> alter database rename file 'MISSING00007' to '/u01/app/oracle/oradata/prod111/users02.dbf' ;

Database altered.

SQL> select tablespace_name, status from dba_tablespaces where tablespace_name in (select tablespace_name from dba_data_files where file_id in (4, 7)) ;

TABLESPACE_NAME                STATUS
------------------------------ ---------
USERS                          OFFLINE

SQL> ALTER TABLESPACE USERS ONLINE ;

Tablespace altered.



Before proceeding, let's query the status of the files reported in alert.log:

SQL> select a.file#, substr(a.name, 1, 50) file_name, a.status file_status, a.error, substr(a.tablespace_name, 1, 10) tablespace_name, b.status tablespace_status from v$datafile_header a, dba_tablespaces b
where a.tablespace_name=b.tablespace_name /* and a.file# in (4, 5, 7, 8, 9) */ ;

FILE# FILE_NAME                                     FILE_STATUS ERROR           TABLESPA TABLESPACE_STATUS
----- --------------------------------------------- ----------- --------------- -------- ------------------
    1 /u01/app/oracle/oradata/prod111/system01.dbf  ONLINE                      SYSTEM   ONLINE
    2 /u01/app/oracle/oradata/prod111/sysaux01.dbf  ONLINE                      SYSAUX   ONLINE
    3 /u01/app/oracle/oradata/prod111/undotbs01.dbf ONLINE                      UNDOTBS1 ONLINE
    4 /u01/app/oracle/oradata/prod111/users01.dbf   OFFLINE     OFFLINE NORMAL  USERS    OFFLINE <(== related to (2) in alert.log excerpt above
    5 /u01/app/oracle/oradata/prod111/example01.dbf ONLINE                      EXAMPLE  OFFLINE <(== related to (3) in alert.log excerpt above 
    6 /u01/app/oracle/oradata/prod111/mydata01.dbf  ONLINE                      MYDATA   ONLINE 
    7 /u01/app/oracle/oradata/prod111/users02.dbf   OFFLINE     OFFLINE NORMAL  USERS    OFFLINE <(== related to (2) in alert.log excerpt above 
    8 /u01/app/oracle/oradata/prod111/mydata02.dbf  OFFLINE     WRONG RESETLOGS MYDATA   ONLINE <(=== related to (4) in alert.log excerpt above 
    9 /u01/app/oracle/oradata/prod111/example02.dbf ONLINE                      EXAMPLE  OFFLINE <(== related to (3) in alert.log excerpt above

9 rows selected.


So, we can attempt to correct the "ERROR" as displayed in above query depending on the availability of file / archived logs and other possible factors. 

Let's continue,

(3) It seems that tablespace was brought offline inconsistently ( ALTER TABLESPACE EXAMPLE OFFLINE IMMEDIATE ). If the archived log generated at that time has got applied, the file may be back online : 
 

SQL> alter tablespace example ONLINE ;

Tablespace altered.

 

(4) This tablespace MYDATA has 2 datafiles File# 6 & 8. It appears that File# 8 was brought offline ( using ALTER DATABASE DATAFILE 8 OFFLINE ) and it was OFFLINE before OPEN RESETLOGS. If the archived log generated at that time has got applied during recovery or all the archived logs are available for recovery since that time, the file may be back online :

SQL> alter database datafile 8 online ;
alter database datafile 8 online
*
ERROR at line 1:
ORA-01190: control file or data file 8 is from before the last RESETLOGS
ORA-01110: data file 8: '/u01/app/oracle/oradata/prod111/mydata02.dbf'


SQL> alter tablespace mydata online ;
alter tablespace mydata online
*
ERROR at line 1:
ORA-01190: control file or data file 8 is from before the last RESETLOGS
ORA-01110: data file 8: '/u01/app/oracle/oradata/prod111/mydata02.dbf'


SQL> recover datafile 8 ;
Media recovery complete.
SQL> alter database datafile 8 online ;

Database altered.

SQL> alter tablespace mydata online ;

Tablespace altered.


Please note that it is not always possible to recover and bring the file online which is failing with error " ORA-01190: control file or data file x is from before the last RESETLOGS".


(5) There can be a scenario where the tablespace was in READ ONLY mode before OPEN RESETLOGS. Please check below Article on that:


Note 266991.1 Recovering READONLY tablespace backups made before a RESETLOGS Open

REFERENCES

NOTE:266991.1 - Recovering READONLY tablespace backups made before a RESETLOGS Open
NOTE:238422.1 - RMAN recover database fails RMAN-6025 - v$archived_log.next_change# is 281474976710655


