###########issue 0: db alert 有如下提示, thread 1 cannot allocatete new log, sequenec 1111
通过检查v$log ,发现10组日志,每组1G, 一个小时产生10G 日志,应该是够用的。
怀疑是没有提交的巨量事物导致这个报错:
检查
SELECT a.used_ublk
FROM v$transaction a, v$session b
WHERE a.addr = b.taddr
####issue 1 检查回滚的时间,使用alter system kill session 方法,观察方法如下:
pmon Monitor the Rollback progress ,after alter system kill session "pid,serial", the ospid is still exsist after kill
after kill , the ospid 14001 still exsit;
lsof -p 14001 can check process
Problem Description
-------------------
You did not commit your transactions and the session were accidentally
killed. Your transactions are rolling back and it is taking a long time.
Rollback started hours ago and is still in progress.
You want to know if there is any way to speed up the process such as using
cleanup_rollback_entries in the init.ora and then restarting the database.
You also want to know what will happen if you shutdown the database after 12
hours of rollback. Will the rollback pick up where it left off?
SQL> SELECT a.used_ublk
FROM v$transaction a, v$session b
WHERE a.addr = b.taddr AND b.sid = 206;
For example:
If used_ublk showed 29,900 12 hours ago and is now 22,900, it has
taken 12 hours to rollback 7,000 entries. It will take approximately
another 36 hours to complete depending on the types of transactions
that are rolling back.
SELECT USED_UBLK FROM V$TRANSACTION;
select ses.username,ses.sid,substr(ses.program, 1, 19) command,tra.used_ublk from v$session ses, v$transaction tra where ses.saddr = tra.ses_addr;
select
s.username,
t.xidusn,
t.xidslot,
t.xidsqn,
x.ktuxesiz
from
sys.x$ktuxe x,
sys.v_$transaction t,
sys.v_$session s
where
x.inst_id = userenv(‘Instance‘) and
x.ktuxesta = ‘ACTIVE‘ and
x.ktuxesiz > 1 and
t.xidusn = x.ktuxeusn and
t.xidslot = x.ktuxeslt and
t.xidsqn = x.ktuxesqn and
s.saddr = t.ses_addr;
或者使用自动化脚本(取自网络)
set serveroutput on
declare
cursor tx is
select
s.username,
t.xidusn,
t.xidslot,
t.xidsqn,
x.ktuxesiz
from
sys.x$ktuxe x,
sys.v_$transaction t,
sys.v_$session s
where
x.inst_id = userenv(‘Instance‘) and
x.ktuxesta = ‘ACTIVE‘ and
x.ktuxesiz > 1 and
t.xidusn = x.ktuxeusn and
t.xidslot = x.ktuxeslt and
t.xidsqn = x.ktuxesqn and
s.saddr = t.ses_addr;
user_name varchar2(30);
xid_usn number;
xid_slot number;
xid_sqn number;
used_ublk1 number;
used_ublk2 number;
begin
open tx;
loop
fetch tx into user_name, xid_usn, xid_slot, xid_sqn, used_ublk1;
exit when tx%notfound;
if tx%rowcount = 1
then
sys.dbms_lock.sleep(10);
end if;
select
sum(ktuxesiz)
into
used_ublk2
from
sys.x$ktuxe
where
inst_id = userenv(‘Instance‘) and
ktuxeusn = xid_usn and
ktuxeslt = xid_slot and
ktuxesqn = xid_sqn and
ktuxesta = ‘ACTIVE‘;
if used_ublk2 < used_ublk1
then
sys.dbms_output.put_line(
user_name ||
‘‘‘s transaction ‘ ||
xid_usn || ‘.‘ ||
xid_slot || ‘.‘ ||
xid_sqn ||
‘ will finish rolling back at approximately ‘ ||
to_char(
sysdate + used_ublk2 / (used_ublk1 - used_ublk2) / 6 / 60 / 24,
‘HH24:MI:SS DD-MON-YYYY‘
)
);
end if;
end loop;
if user_name is null
then
sys.dbms_output.put_line(‘No transactions appear to be rolling back.‘);
end if;
end;
/db/aa/oradata对应的是不是这块盘 VxVM27001
Solution Description
--------------------
没有办法加快rollback 的速度。
There is no way to speed up the rollback process and there is no formula for
determining how long it will take to complete. It depends on what type of
undo the application has generated. Some undo may take little space in an
undo block, but may take awhile to apply.
You can look at used_ublk in V$transaction to estimate how long it is going
to take to complete the rollback.
SQL> SELECT a.used_ublk
FROM v$transaction a, v$session b
WHERE a.addr = b.taddr AND b.sid = <SID>;
For example:
If used_ublk showed 29,900 12 hours ago and is now 22,900, it has
taken 12 hours to rollback 7,000 entries. It will take approximately
another 36 hours to complete depending on the types of transactions
that are rolling back.
CLEANUP_ROLLBACK_ENTRIES determines how long SMON will be holding onto one
transaction‘s resources. It only affects recovery of transactions in the
background such as after an instance crash. It doesn‘t affect rollback
by the transaction itself.
Rollback will pick up where it left off if you do shutdown after 12 hours
of rollback.
Solution Explanation
--------------------
You can use V$transaction used_ublk to estimate how long the rollback is
going to take but there is no formula for this. If you shutdown the
database after rollback has started, it will begin where it left off.
For Oracle 9i and onwards ,check :
SQL> SELECT DISTINCT ktuxesiz
FROM x$ktuxe
WHERE ktuxecfl=‘DEAD‘;
########### 检查回滚的时间,kill -9 的方法杀进程,观察方法如下:
issue 2 smon Transaction recovery by SMON , the ospid is not exsist.
###monitor
SMON process takes over the recovery when
->Server process is dead / crashed. the ospid is not exsist.
->Instance itself is crashed
3. Speed up SMON transaction recovery
SMON transaction recovery can be controlled using the FAST_START_PARALLEL_ROLLBACK parameter
a. Parallel Transaction Recovery:
To enable Parallel recovery mode, set the parameter FAST_START_PARALLEL_ROLLBACK to LOW / HIGH.
ALTER SYSTEM SET FAST_START_PARALLEL_ROLLBACK = HIGH
OR
ALTER SYSTEM SET FAST_START_PARALLEL_ROLLBACK = LOW
b. If the parallel recovery is hanging, enable serial recovery:
To enable serial recovery mode, set the parameter FAST_START_PARALLEL_ROLLBACK to FALSE.
PS: The parameter FAST_START_PARALLEL_ROLLBACK will be effective only when SMON does the transaction recovery (generally after a instance crash).
Monitor the transaction recovery by SMON
select usn, state, undoblockstotal "Total", undoblocksdone "Done", undoblockstotal-undoblocksdone "ToDo", decode(cputime,0,‘unknown‘,sysdate+(((undoblockstotal-undoblocksdone) / (undoblocksdone / cputime)) / 86400)) "Estimated time to complete" from v$fast_start_transactions;
USN STATE Total Done ToDo Estimated time to Complete
-------- ---------------- -------- -------- -------- ---------------------------------
->none
####solution:
You may notice that UNDOBLOCKSDONE is not increasing or increases very slowly.
1、停止并行回滚,减少IO请求,快速提升系统响应能力
如果你没时间等待回滚进程完成回滚操作,可根据如下提示进行操作。
最后在google上根据ora_p001, wait for a undo record 的关键字,找到了一些信息,以下信息引起了我的注意:
Oracle工程师首先怀疑是临时表空间空间不足导致,经检查临时表空间没有空间不足的情况,仔细观察日志发现重做日志文件不断切换,分析应该是有较多的事务没有完成提交或者有较多没有提交的事务完成回滚。现在面临的问题是我们没有很多时间去等待所有的事务去完成回滚或提交。解决问题的思路就是如何尽快结束这些事务的回滚或提交。
1) 查看spfile文件中是否有fast_start_parallel_rollback参数的设置,检查结果G网数据库没有设置该参数。如果没有显式设置,则该参数的默认值为low。修改该参数值为false
2) 将数据库启动到nomount状态:startup nomount
3) 修改改参数值:alter system set fast_start_parallel_rollback = FALSE scope=spfile
4) shutdown immediate关闭数据库
5) startup启动
6) 查看该参数是否生效:show parameter fast_start_parallel_rollback
7) 等待一段时间
8) shutdown immediate数据库可以关闭
2、加快回滚速度
提高并行回滚进程的数量,设置为HIGH时回滚进程=4*cpu数。在sql命令行模式下执行
ALTER SYSTEM SET FAST_START_PARALLEL_ROLLBACK = HIGH
refer http://blog.chinaunix.net/xmlrpc.php?r=blog/article&uid=116213&id=147411
#####issue 3:
set lines 200
set pagesize 2000
col name for a65
SELECT NAME,ASYNCH_IO FROM V$DATAFILE F,V$iosTAT_FILE I
WHERE F.FILE#=I.FILE_NO
AND FILETYPE_NAME=‘Data File‘;