> RAC node down - CPU utilization pegged - options?
jamie055
post Mar 26 2012, 02:10 PM
Post #1


Newbie
*

Group: Members
Posts: 8
Joined: 3-January 12
From: Cameron, NC
Member No.: 46,523



I had one of my RAC nodes go down due to a disk failure. I have a 3-node cluster running 10.2.0.4 on Dell 610s with Windows Server 2008.

I have been running AWR reports this afternoon and am seeing CPU time as my top timed event. Here is the excerpt from the report I am looking at:

Cache Sizes
~~~~~~~~~~~                       Begin        End
                             ---------- ----------
               Buffer Cache:    20,032M    20,032M  Std Block Size:         8K
           Shared Pool Size:    12,688M    12,688M      Log Buffer:     6,336K

Load Profile
~~~~~~~~~~~~                            Per Second       Per Transaction
                                   ---------------       ---------------
                  Redo size:                978.95                750.37
              Logical reads:              6,911.13              5,297.46
              Block changes:                  4.98                  3.82
             Physical reads:                  0.13                  0.10
            Physical writes:                  0.59                  0.45
                 User calls:                 13.82                 10.59
                     Parses:                  5.77                  4.42
                Hard parses:                  0.17                  0.13
                      Sorts:                 79.32                 60.80
                     Logons:                  0.09                  0.07
                   Executes:                 29.40                 22.53
               Transactions:                  1.30

  % Blocks changed per Read:    0.07    Recursive Call %:    72.85
 Rollback per transaction %:   77.06       Rows per Sort:    25.06

Instance Efficiency Percentages (Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Buffer Nowait %:  100.00       Redo NoWait %:  100.00
               Buffer Hit %:  100.00    In-memory Sort %:  100.00
              Library Hit %:   98.67        Soft Parse %:   97.01
         Execute to Parse %:   80.38         Latch Hit %:   99.99
Parse CPU to Parse Elapsd %:   85.71     % Non-Parse CPU:   97.88

 Shared Pool Statistics        Begin     End
                              ------  ------
             Memory Usage %:   91.94   81.98
    % SQL with executions>1:   99.90   97.62
  % Memory for SQL w/exec>1:   99.70   94.49

Top 5 Timed Events                                         Avg %Total
~~~~~~~~~~~~~~~~~~                                        wait   Call
Event                                 Waits    Time (s)   (ms)   Time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
CPU time                                         21,183          79.7
control file sequential read        748,953       5,224      7   19.6 System I/O
enq: TX - row lock contention         3,971       2,044    515    7.7 Applicatio
control file parallel write         102,826       1,858     18    7.0 System I/O
log file parallel write             111,596       1,629     15    6.1 System I/O
          -------------------------------------------------------------
RAC Statistics  DB/Inst: FLMS/flms2  Snaps: 7064-7150

                                      Begin    End
                                      -----  -----
           Number of Instances:           3      2


Global Cache Load Profile
~~~~~~~~~~~~~~~~~~~~~~~~~                  Per Second       Per Transaction
                                      ---------------       ---------------
  Global Cache blocks received:                  1.74                  1.34
    Global Cache blocks served:                  1.73                  1.33
     GCS/GES messages received:                  8.57                  6.57
         GCS/GES messages sent:                  8.47                  6.49
            DBWR Fusion writes:                  0.10                  0.07
Estd Interconnect traffic (KB)                  31.16


Global Cache Efficiency Percentages (Target local+remote 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Buffer access - local cache %: 99.97
Buffer access - remote cache %: 0.03
Buffer access - disk %: 0.00


Global Cache and Enqueue Services - Workload Characteristics
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Avg global enqueue get time (ms): 2.2

Avg global cache cr block receive time (ms): 1.5
Avg global cache current block receive time (ms): 1.4

Avg global cache cr block build time (ms): 0.0
Avg global cache cr block send time (ms): 0.1
Global cache log flushes for cr blocks served %: 30.3
Avg global cache cr block flush time (ms): 1.5

Avg global cache current block pin time (ms): 0.0
Avg global cache current block send time (ms): 0.1
Global cache log flushes for current blocks served %: 0.5
Avg global cache current block flush time (ms): 35.5
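
Several of the percentages in the report can be re-derived from the raw figures, which is a useful sanity check when learning to read AWR output. The sketch below is not part of the original report. It recomputes Soft Parse % and Execute to Parse % from the Load Profile rates using the commonly cited formulas, and backs out the total call time implied by the CPU line of the Top 5 Timed Events. The function names are made up for illustration, and it assumes that "% Total Call Time" is simply event time divided by that total.

```python
# Per-second rates copied from the Load Profile section of the report.
parses_per_sec = 5.77
hard_parses_per_sec = 0.17
executes_per_sec = 29.40


def soft_parse_pct(parses: float, hard_parses: float) -> float:
    """Percentage of parse calls that were soft parses."""
    return 100.0 * (1.0 - hard_parses / parses)


def execute_to_parse_pct(executes: float, parses: float) -> float:
    """Percentage of executions that did not need a parse call."""
    return 100.0 * (1.0 - parses / executes)


# Top 5 Timed Events: CPU time is 21,183 s, listed as 79.7% of call time.
# Assumption: "% Total Call Time" = event seconds / total seconds, so the
# total can be backed out from the CPU line.
cpu_seconds = 21183.0
implied_total_seconds = cpu_seconds / 0.797


def pct_of_call_time(event_seconds: float) -> float:
    """Share of the implied total call time taken by one event."""
    return 100.0 * event_seconds / implied_total_seconds


if __name__ == "__main__":
    print(f"Soft Parse %:       {soft_parse_pct(parses_per_sec, hard_parses_per_sec):.2f}")
    print(f"Execute to Parse %: {execute_to_parse_pct(executes_per_sec, parses_per_sec):.2f}")
    # Wait seconds copied from the Top 5 Timed Events section.
    for name, seconds in [
        ("control file sequential read", 5224),
        ("enq: TX - row lock contention", 2044),
        ("control file parallel write", 1858),
        ("log file parallel write", 1629),
    ]:
        print(f"{name}: {pct_of_call_time(seconds):.1f}% of call time")
```

The recomputed values land within a few hundredths of the figures printed in the report; the small gaps come from the report's rates already being rounded to two decimals.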

I am very junior and have just started experimenting with statistics and tuning. I wanted to ask if there was anything that I could be doing to alleviate the workload on the 2 remaining nodes right now? As far as I understand it, there is no way to stop users from hitting the database, and without my third node to help balance the load, the CPU will stay pegged until users log off at the end of the day. What else could I be doing to help until we can rebuild our server?

Thank you for any insight!
Jamie
burleson
post Mar 26 2012, 02:59 PM
Post #2


Advanced Member
***

Group: Members
Posts: 11,490
Joined: 26-January 04
Member No.: 13



Hi Jamie,

>> I had one of my RAC nodes go down due to a disk failure.

You had a server crash from a bad disk?

Call 800-223-1711 and tell them it's a production emergency; they will get the Oracle RAC team on it immediately, while you wait.

************************************

>> I am very junior and have just started experimenting with statistics and tuning.


Take your AWR report and paste it into http://www.statspackanalyzer.com


******************************************
>> I wanted to ask if there was anything that I could be doing to alleviate the workload on the 2 remaining nodes right now?


You could add more CPUs to your servers, if they can be hot-plugged.

It's quite rare to run RAC on Windows, a super-reliable mechanism on a super-bad platform . . .


Your long-term goal will be to get off of Windows into an industrial-strength OS like Linux.


--------------------
Hope this helps. . .

Donald K. Burleson
Oracle Press author
Author of Oracle Tuning: The Definitive Reference
jamie055
post Mar 26 2012, 04:38 PM
Post #3





I have already run the report through the analyzer, and it is very helpful because there is so much in that report to try to discern. It says that I have high logical I/O associated with CPU bottlenecks and that I have excessive rollbacks, and I can guarantee that both are due to our application. I will see whether the developers can assist me with this.

It also points out that the db file sequential read is 19 milliseconds. I did not configure the SAN that the cluster nodes share; could that be the problem? It is a RAID 5, and I don't know what kind of disks it uses, so I don't know whether the architecture or configuration would be an issue there.

I have already gone down the path of requesting a different OS platform, with no success. I would like to discuss support, as I have a project coming up that requires me to build another 3-node cluster. I don't know if my company would consider the cost, but I would like to present them with the option and pricing in order to ensure we accomplish the task within the deadline.

I really appreciate that you take the time to help answer questions on this forum and I enjoy reading through your entire website when I need to research issues.

Thank you,
Jamie


burleson
post Mar 26 2012, 10:38 PM
Post #4





Hi Jamie,

>> It also points out that the db file sequential read is 19 milliseconds.

That's really slow, even for a PC. . . .

Check your tablespace average read times.
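
As a rough illustration of what "average read time" means here: Oracle's V$FILESTAT view exposes a physical read count (PHYRDS) and a cumulative read time (READTIM, in hundredths of a second) per datafile, so the average is read time divided by reads. The sketch below only does that arithmetic on rows already pulled from the database and grouped by tablespace name. The function name and the sample rows are invented for illustration and are not taken from Jamie's system.

```python
from collections import defaultdict


def avg_read_ms_by_tablespace(rows):
    """Return {tablespace: average read time in milliseconds}.

    rows: iterable of (tablespace_name, physical_reads, read_time_cs),
    where read_time_cs is cumulative read time in centiseconds.
    """
    reads = defaultdict(int)
    time_cs = defaultdict(int)
    for tablespace, phyrds, readtim in rows:
        reads[tablespace] += phyrds
        time_cs[tablespace] += readtim
    # 1 centisecond = 10 ms; skip tablespaces that had no reads at all.
    return {
        ts: 10.0 * time_cs[ts] / reads[ts]
        for ts in reads
        if reads[ts] > 0
    }


# Made-up sample rows (two datafiles in USERS, one in SYSTEM).
sample_rows = [
    ("USERS", 1000, 1900),
    ("USERS", 1000, 1700),
    ("SYSTEM", 500, 200),
]

if __name__ == "__main__":
    for ts, ms in sorted(avg_read_ms_by_tablespace(sample_rows).items()):
        print(f"{ts}: {ms:.1f} ms per read")
```

A tablespace whose average sits far above the others is the place to start looking at the underlying disks.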


*****************************************

>> It is a RAID 5

Aha! See here:

http://www.dba-oracle.com/oracle_tips_raid5_bad.htm

Officially, Oracle DOES NOT recommend RAID 5; they say to use RAID 0+1.

Please read this:

http://www.dba-oracle.com/oracle_tips_raid_usage.htm
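
The usual argument against RAID 5 for write-heavy databases is the small-write penalty: a small random write typically costs four back-end I/Os (read data, read parity, write data, write parity), versus two for a mirrored layout such as RAID 0+1. The following is a minimal sketch of that textbook arithmetic, not a measurement of this SAN. The disk count, per-disk IOPS, and write mix are hypothetical, and the model ignores controller cache and full-stripe writes.

```python
RAID5_WRITE_PENALTY = 4   # read data, read parity, write data, write parity
RAID10_WRITE_PENALTY = 2  # write both mirror copies


def effective_iops(raw_iops: float, write_fraction: float, write_penalty: int) -> float:
    """Host-visible random IOPS for a given read/write mix.

    Each read costs one back-end I/O; each write costs `write_penalty`.
    """
    if not 0.0 <= write_fraction <= 1.0:
        raise ValueError("write_fraction must be between 0 and 1")
    read_fraction = 1.0 - write_fraction
    return raw_iops / (read_fraction + write_fraction * write_penalty)


# Hypothetical array: 8 disks at 150 random IOPS each, 30% writes.
raw = 8 * 150
raid5 = effective_iops(raw, 0.30, RAID5_WRITE_PENALTY)
raid10 = effective_iops(raw, 0.30, RAID10_WRITE_PENALTY)

if __name__ == "__main__":
    print(f"RAID 5:   {raid5:.0f} IOPS")
    print(f"RAID 0+1: {raid10:.0f} IOPS")
```

With a read-only workload the two layouts come out the same in this model; the gap opens up as the write fraction grows.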


******************************************

>> I don't know if my company would consider the cost, but I would like to present them with the option and pricing in order to ensure we accomplish the task within the deadline.

Well, my company, BC Remote DBA, assists shops that implement RAC, and we also do database monitoring and off-hours support, as a sort of supplement to the DBA staff:

http://www.remote-dba.net/


--------------------
Hope this helps. . .

Donald K. Burleson
Oracle Press author
Author of Oracle Tuning: The Definitive Reference