How to Apply patch on Oracle RAC

 Prerequisite:-

1: Check Oracle service status

[oracle@rac-node1 ~]$crs_stat -t
Name           Type         Target State    Host 
------------------------------------------------------------
ora....SM1.asm application ONLINE ONLINE dbsr...ode1 
ora....E1.lsnr application ONLINE ONLINE dbsr...ode1 
ora....de1.gsd application ONLINE ONLINE dbsr...ode1 
ora....de1.ons application ONLINE ONLINE dbsr...ode1 
ora....de1.vip application ONLINE ONLINE dbsr...ode1 
ora....SM2.asm application ONLINE ONLINE dbsr...ode2 
ora....E2.lsnr application ONLINE ONLINE dbsr...ode2 
ora....de2.gsd application ONLINE ONLINE dbsr...ode2 
ora....de2.ons application ONLINE ONLINE dbsr...ode2 
ora....de2.vip application ONLINE ONLINE dbsr...ode2 
ora.prod.db application    ONLINE ONLINE dbsr...ode1 
ora....b1.inst application ONLINE ONLINE dbsr...ode1 
ora....b2.inst application ONLINE ONLINE dbsr...ode2 
[oracle@rac-node1 ~]$

2: Stop database instance one by one
 
[oracle@rac-node1 ~]$ srvctl stop instance -i prod1 -d prod
[oracle@rac-node1 ~]$ srvctl stop instance -i prod2 -d prod

3: Stop ASM instance on each node one by one

[oracle@rac-node1 ~]$ srvctl stop asm -n rac-node1
[oracle@rac-node1 ~]$ srvctl stop asm -n rac-node2

4: Stop node applications one by one on each node

[oracle@rac-node1 ~]$ srvctl stop nodeapps -n rac-node1
[oracle@rac-node1 ~]$ srvctl stop nodeapps -n rac-node2

5: Check status again
[oracle@rac-node1 ~]$ crs_stat -t
 
 Name             Type       Target  State Host 
------------------------------------------------------------
ora....SM1.asm application OFFLINE OFFLINE 
ora....E1.lsnr application OFFLINE OFFLINE 
ora....de1.gsd application OFFLINE OFFLINE 
ora....de1.ons application OFFLINE OFFLINE 
ora....de1.vip application OFFLINE OFFLINE 
ora....SM2.asm application OFFLINE OFFLINE 
ora....E2.lsnr application OFFLINE OFFLINE 
ora....de2.gsd application OFFLINE OFFLINE 
ora....de2.ons application OFFLINE OFFLINE 
ora....de2.vip application OFFLINE OFFLINE 
ora.prod.db application    OFFLINE OFFLINE 
ora....b1.inst application OFFLINE OFFLINE 
ora....b2.inst application OFFLINE OFFLINE 
 
 [oracle@rac-node1 ~]
 
Download Patch from Metalink.

Unzip patch folder after that read readme file.

Go to Opatch top.


[oracle@rac-node1 rman]$ cd 9949948
OPatch should be in your path as follows
 
Set Oracle_Home.
 
 [oracle@rac-node1 9949948]$ export PATH=$PATH:/u01/app/oracle/product/10.2.0 /db_1/OPatch

Start apply Opatch

[oracle@rac-node1 9949948]$ opatch apply
Invoking OPatch 10.2.0.4.9
Oracle Interim Patch Installer version 10.2.0.4.9
Copyright (c) 2009, Oracle Corporation. All rights reserved.

Oracle Home : /u01/app/oracle/product/10.2.0/db_1
Central Inventory : /u01/app/oracle/OraInventory
 from : /etc/oraInst.loc
OPatch version : 10.2.0.4.9
OUI version : 10.2.0.5.0
OUI location : /u01/app/oracle/product/10.2.0/db_1/oui
Log file location : /u01/app/oracle/product/10.2.0/db_1/cfgtoollogs/opatch/opatch2012-09-20_23-38-29PM.log
Patch history file: /u01/app/oracle/product/10.2.0/db_1/cfgtoollogs/opatch/opatch_history.txt
ApplySession applying interim patch '9949948' to OH '/u01/app/oracle/product/10.2.0/db_1'
Running prerequisite checks...
OPatch detected the node list and the local node from the inventory. OPatch will patch the local system then propagate the patch to the remote nodes.

This node is part of an Oracle Real Application Cluster.
Remote nodes: 'rac-node2' 
Local node: 'rac-node1'
Please shutdown Oracle instances running out of this ORACLE_HOME on the local system.
(Oracle Home = '/u01/app/oracle/product/10.2.0/db_1')

Is the local system ready for patching? [y|n]
Y  ==> I put yes
User Responded with: Y
Backing up files and inventory (not for auto-rollback) for the Oracle Home
Backing up files affected by the patch '9949948' for restore. This might take a while...
Backing up files affected by the patch '9949948' for rollback. This might take a while...
Patching component oracle.rdbms, 10.2.0.5.0...
Updating archive file "/u01/app/oracle/product/10.2.0/db_1/lib/libserver10.a" with "lib/libserver10.a/ksfd.o"
Running make for target ioracle
ApplySession adding interim patch '9949948' to inventory
Verifying the update...
Inventory check OK: Patch ID 9949948 is registered in Oracle Home inventory with proper meta-data.
Files check OK: Files from Patch ID 9949948 are present in Oracle Home.
The local system has been patched. You can restart Oracle instances on it.

Patching in rolling mode.

The node 'rac-node2' will be patched next.

Please shutdown Oracle instances running out of this ORACLE_HOME on 'rac-node2'.
(Oracle Home = '/u01/app/oracle/product/10.2.0/db_1')
Is the node ready for patching? [y|n]
y  ==> I put yes for node2
User Responded with: Y
Updating nodes 'rac-node2' 
 Apply-related files are:
 FP = "/u01/app/oracle/product/10.2.0/db_1/.patch_storage/9949948_Aug_13_2010_06_35_20/rac/copy_files.txt"
 DP = "/u01/app/oracle/product/10.2.0/db_1/.patch_storage/9949948_Aug_13_2010_06_35_20/rac/copy_dirs.txt"
 MP = "/u01/app/oracle/product/10.2.0/db_1/.patch_storage/9949948_Aug_13_2010_06_35_20/rac/make_cmds.txt"
 RC = "/u01/app/oracle/product/10.2.0/db_1/.patch_storage/9949948_Aug_13_2010_06_35_20/rac/remote_cmds.txt"
Instantiating the file "/u01/app/oracle/product/10.2.0/db_1/.patch_storage/9949948_Aug_13_2010_06_35_20/rac/copy_files.txt.instantiated" by replacing $ORACLE_HOME in "/u01/app/oracle/product/10.2.0/db_1/.patch_storage/9949948_Aug_13_2010_06_35_20/rac/copy_files.txt" with actual path.
Propagating files to remote nodes...
Instantiating the file "/u01/app/oracle/product/10.2.0/db_1/.patch_storage/9949948_Aug_13_2010_06_35_20/rac/copy_dirs.txt.instantiated" by replacing $ORACLE_HOME in "/u01/app/oracle/product/10.2.0/db_1/.patch_storage/9949948_Aug_13_2010_06_35_20/rac/copy_dirs.txt" with actual path.
Propagating directories to remote nodes...
Instantiating the file "/u01/app/oracle/product/10.2.0/db_1/.patch_storage/9949948_Aug_13_2010_06_35_20/rac/make_cmds.txt.instantiated" by replacing $ORACLE_HOME in "/u01/app/oracle/product/10.2.0/db_1/.patch_storage/9949948_Aug_13_2010_06_35_20/rac/make_cmds.txt" with actual path.
Running command on remote node 'rac-node2': 
cd /u01/app/oracle/product/10.2.0/db_1/rdbms/lib; /usr/bin/make -f ins_rdbms.mk ioracle ORACLE_HOME=/u01/app/oracle/product/10.2.0/db_1 || echo REMOTE_MAKE_FAILED::>&2

The node 'rac-node2' has been patched. You can restart Oracle instances on it.
There were relinks on remote nodes. Remember to check the binary size and timestamp on the nodes 'rac-node2' .
The following make commands were invoked on remote nodes:
'cd /u01/app/oracle/product/10.2.0/db_1/rdbms/lib; /usr/bin/make -f ins_rdbms.mk ioracle ORACLE_HOME=/u01/app/oracle/product/10.2.0/db_1
'
OPatch succeeded.
Start Oracle services on both nodes
[oracle@rac-node1 9949948]$ srvctl start nodeapps -n rac-node1
[oracle@rac-node1 9949948]$ srvctl start nodeapps -n rac-node2
[oracle@rac-node1 9949948]$ srvctl start asm -n rac-node1
[oracle@rac-node1 9949948]$ srvctl start asm -n rac-node2
[oracle@rac-node1 9949948]$ crs_stat -t
Name Type Target State Host 
------------------------------------------------------------
ora....SM1.asm application ONLINE ONLINE dbsr...ode1 
ora....E1.lsnr application ONLINE ONLINE dbsr...ode1 
ora....de1.gsd application ONLINE ONLINE dbsr...ode1 
ora....de1.ons application ONLINE ONLINE dbsr...ode1 
ora....de1.vip application ONLINE ONLINE dbsr...ode1 
ora....SM2.asm application ONLINE ONLINE dbsr...ode2 
ora....E2.lsnr application ONLINE ONLINE dbsr...ode2 
ora....de2.gsd application ONLINE ONLINE dbsr...ode2 
ora....de2.ons application ONLINE ONLINE dbsr...ode2 
ora....de2.vip application ONLINE ONLINE dbsr...ode2 
ora.prod.db application OFFLINE OFFLINE 
ora....b1.inst application OFFLINE OFFLINE 
ora....b2.inst application OFFLINE OFFLINE

Start database instances on both nodes:-


[oracle@rac-node1 9949948]$ srvctl start instance -i prod1 -d prod
[oracle@rac-node1 9949948]$ srvctl start instance -i prod1 -d prod
[oracle@rac-node1 9949948]$ crs-stat -t
 
 Name            Type       Target State     Host 
------------------------------------------------------------
ora....SM1.asm application ONLINE ONLINE dbsr...ode1 
ora....E1.lsnr application ONLINE ONLINE dbsr...ode1 
ora....de1.gsd application ONLINE ONLINE dbsr...ode1 
ora....de1.ons application ONLINE ONLINE dbsr...ode1 
ora....de1.vip application ONLINE ONLINE dbsr...ode1 
ora....SM2.asm application ONLINE ONLINE dbsr...ode2 
ora....E2.lsnr application ONLINE ONLINE dbsr...ode2 
ora....de2.gsd application ONLINE ONLINE dbsr...ode2 
ora....de2.ons application ONLINE ONLINE dbsr...ode2 
ora....de2.vip application ONLINE ONLINE dbsr...ode2 
ora.prod.db application    ONLINE ONLINE dbsr...ode2 
ora....b1.inst application ONLINE ONLINE dbsr...ode1 
ora....b2.inst application ONLINE ONLINE dbsr...ode2
 
 
[oracle@rac-node1 script]$ sqlplus /nolog
SQL*Plus: Release 10.2.0.5.0 – Production on Thu Sep 20 23:44:01 2012
Copyright (c) 1982, 2010, Oracle. All Rights Reserved.

SQL>conn /as sysdba
Connected.
SQL>

How to Tuning SQL Advisor in Oracle SQL Developer

SQL Tuning Advisor in Oracle SQL Developer


The SQL Tuning Advisor analyzes high-volume SQL statements and offers tuning recommendations. It takes one or more SQL statements as an input and invokes the Automatic Tuning Optimizer to perform SQL tuning on the statements. It can run against any given SQL statement. The SQL Tuning Advisor provides advice in the form of precise SQL actions for tuning the SQL statements along with their expected performance benefits. The recommendation or advice provided relates to the collection of statistics on objects, creation of new indexes, restructuring of the SQL statement, or creation of a SQL profile. You can choose to accept the recommendation to complete the tuning of the SQL statements.
Oracle Database can automatically tune SQL statements by identifying problematic SQL statements and implementing tuning recommendations using the SQL Tuning Advisor. You can also run the SQL Tuning Advisor selectively on a single or a set of SQL statements that have been identified as problematic.
In this tutorial, you learn how to run and review the recommendations of the SQL Tuning Advisor.
Note: Tuning Advisor is part of the Tuning Pack, one of the Oracle management packs and is available for purchase with Enterprise Edition. 

Software and Hardware Requirements

The following is a list of software requirements:
  • Oracle Database 11g Enterprise Edition with access to the Tuning and Diagnostic management packs and with the sample schema installed.
  • Oracle SQL Developer 3.0.

Prerequisites

Before starting this tutorial, you should:
1 .
Install Oracle SQL Developer 3.0 from OTN. Follow the readme instructions here.
2 .
Install Oracle Database 11g with the Sample schema.

Creating a Database Connection

The first step to managing database objects using Oracle SQL Developer 3.0 is to create a database connection.
Perform the following steps to create a database connection:
Note: If you already have database connections for SCOTT and SYSTEM, you do not need to perform the following steps. You can move to Providing Privileges to the Scott User topic.
1. If you have installed the SQL Developer icon on your desktop, click the icon to start your SQL Developer and move to Step 4. If you do not have the icon located on your desktop, perform the following steps to create a shortcut to launch SQL Developer 3.0 from your desktop.
Open the directory where the SQL Developer 3.0 is located, right-click sqldeveloper.exe (on Windows) or sqldeveloper.sh (on Linux) and select Send to > Desktop (create shortcut).
Screenshot for Step

2. On the desktop, you will find an icon named Shortcut to sqldeveloper.exe. Double-click the icon to open SQL Developer 3.0.
Note: To rename it, select the icon and then press F2 and enter a new name.
Screenshot for Step

3. Your Oracle SQL Developer opens up.
Screenshot for Step
4. In the Connections navigator, right-click Connections and select New Connection.
Screenshot for Step
5. The New / Select Database Connection dialog opens. Enter the connection details as follows and click Test.
Connection Name: system
Username: system
Password: <your_password> (Select Save Password)
Hostname: localhost
SID: <your_own_SID>
Screenshot for Step
6. Check for the status of the connection on the left-bottom side (above the Help button). It should read Success. Click Save and then click Connect.
Screenshot for Step
7. In the Connections navigator, to create a new connection to the scott schema, right-click Connections and select New Connection.
Screenshot for Step
8. The New / Select Database Connection dialog opens. Enter the connection details as follows and click Test.
Connection Name: scott
Username: scott
Password: <your_password> (Select Save Password)
Hostname: localhost
SID: <your_own_SID>
Screenshot for Step
9. Check for the status of the connection on the left-bottom side (above the Help button). It should read Success. Click Save and then click Connect.
Screenshot for Step
10 .
The connection is saved and you can view the two newly created connections in the Connections list.
Screenshot for Step

Providing Privileges and Removing the existing Statistics on the Scott User

A user requires certain privileges to run the SQL Tuning Advisor. Also, in order to collect and manage statistics on the SCOTT schema, the existing statistics have to be cleared. Below are the steps to grant SQL Tuning Advisor privileges and remove the existing statistics on the scott user.

1. Click SQL Worksheet and select system user.
Screenshot for Step

2. To grant privileges to the scott user to run the SQL Tuning Advisor, enter the following lines of code. Click Run Statement .
grant advisor to scott;
grant administer sql tuning set to scott;
Screenshot for Step

3. The output for the statements is displayed.
Screenshot for Step
4. The Oracle database allows you to collect statistics of many different kinds in order to improve performance. To illustrate some of the features the SQL Tuning Advisor offers, clear the existing statistics from the SCOTT schema.
To delete the schema statistics, enter the following line of code.
exec DBMS_STATS.DELETE_SCHEMA_STATS ('scott');
Select the statement and click Run Statement . Screenshot for Step
With the DBMS_STATS package you can view and modify optimizer statistics gathered for database objects.The DELETE_SCHEMA_STATS procedure deletes statistics for an entire schema.
 
5. The output for the statement appears.
Screenshot for Step

Running the SQL Tuning Advisor on a SQL statement

In this topic, you run the SQL Tuning Advisor on a SQL statement. Four types of analysis are performed by the SQL Tuning Advisor on the SQL statement.
All the recommendations are displayed in the Overview. You can also view each recommendation individually.
1. Open the SQL Worksheet for the scott user by clicking SQL Worksheet .
Screenshot for Step
 
2. Enter the following SQL statement in the worksheet.
select sum(e.sal), avg(e.sal), count(1), e.deptno from dept d, emp e group by e.deptno order by e.deptno;
Screenshot for Step
Note that the above SQL statement has an unused reference to the "dept" table.


3. Select the SQL statement and click SQL Tuning Advisor .
Screenshot for Step

4. The SQL Tuning Advisor output appears.
Screenshot for Step

5. In the left navigator, click Statistics. In this analysis, objects with stale or missing statistics are identified and appropriate recommendations are made to remedy the problem.
Screenshot for Step

6. In the left navigator, click SQL Profile. Here, the SQL Tuning Advisor recommends to improve the execution plan by the generation of a SQL Profile.
Screenshot for Step

7. Click the Detail tabbed page to view the SQL Profile Finding.
Screenshot for Step

8. In the left navigator, click Indexes. This recommends whether the SQL statement might benefit from an index. If necessary, new indexes that can significantly enhance query performances are identified and recommended.
Screenshot for Step

9. Click the Overview tabbed page. In this case, there are no index recommendations.
Screenshot for Step

10. In the left navigator, click Restructure SQL. In this analysis, relevant suggestions are made to restructure selected SQL statements for improved performance.
Screenshot for Step

Implementing SQL Tuning Advisor recommendations

You can implement the SQL Tuning Advisor recommendation feature. This will enable you to update the statistics in scott schema. Perform the following steps to implement the SQL Tuning Advisor recommendations:
1. In the Connections navigator, right-click scott and select Gather Schema Statistics....
Screenshot for Step
 
2. In Gather Schema Statistics, select Estimate Percent as 100 from the drop-down list so that all rows in each table are read. This ensures that the statistics are as accurate as possible.
Screenshot for Step

3. Click Apply.
Screenshot for Step

4. A confirmation message appears. Click OK.
Screenshot for Step

5. To run the SQL Tuning Advisor on the SQL statement again, select the SQL statement and click SQL Tuning Advisor .
Screenshot for Step

6. The SQL Tuning Advisor output appears. By gathering statistics, the Statistics and SQL Profile advice is now removed.
Screenshot for Step

7. In the left navigator, click each of the SQL Tuning Advisor Implement Type to check if all the recommendations have been implemented.
Screenshot for Step
Note the issues reported to you:
Screenshot for Step
Note the issues reported to you:

Screenshot for Step

Screenshot for Step
Note that the Restructure SQL recommendation to remove an unused table remains.

8. Remove the "dept" table in the SQL statement and click SQL Advisor .
Screenshot for Step

9. The output appears. All of the advice recommendations have been removed.
Screenshot for Step

Why Node Eviction in RAC??



Whenever, Database Administrator face Node Reboot issue, First thing to look at should be /var/log/message and OS Watcher Log of the Database Node which was rebooted.

var/log/messages will give you an actual picture of reboot:- Exact time of restart status of resource like swap and RAM etc.

1. High Load on Database Server:- Out of 100 Issues, I have seen 70 to 80 time High load on the system was reason for Node Evictions, One common scenario is due to high load RAM and SWAP space of DB node got exhaust and system stops working and finally reboot.

So, Every time you see a node eviction start investigation with /var/log/messages and Analyze OS Watcher Log
. Below is a situation when a Database Node was reboot due to high load.

/var/log/messages output from a Database Node just before Node eviction:

Apr 23 08:15:04 remotedb06 kernel: Node 0 DMA: 2*4kB 1*8kB 0*16kB 1*32kB 2*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15792kB
Apr 23 08:15:04 remotedb06 kernel: Node 0 DMA32: 150*4kB 277*8kB 229*16kB 186*32kB 51*64kB 65*128kB 82*256kB 13*512kB 3*1024kB 3*2048kB 78*4096kB = 380368kB
Apr 23 08:15:04 remotedb06 kernel: Node 0 Normal: 12362*4kB 58*8kB 0*16kB 0*32kB 0*64kB 0*128kB 2*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB = 52984kB
Apr 23 08:15:09 remotedb06 kernel: 83907 total pagecache pages
Apr 23 08:15:11 remotedb06 kernel: 39826 pages in swap cache
Apr 23 08:15:11 remotedb06 kernel: Swap cache stats: add 30820220, delete 30780387, find 18044378/19694662
Apr 23 08:15:14 remotedb06 kernel: Free swap  = 4kB
Apr 23 08:15:15 remotedb06 kernel: Total swap = 25165816kB
Apr 23 08:15:16 remotedb06 kernel: 25165808 pages RAM
Apr 23 08:15:28 remotedb06 kernel: 400673 pages reserved
Apr 23 08:15:30 remotedb06 kernel: 77691135 pages shared
Apr 23 08:15:31 remotedb06 kernel: 9226743 pages non-shared
Apr 23 08:15:33 remotedb06 kernel: osysmond.bin: page allocation failure. order:4, mode:0xd0
From above message, we can see that this system has only 4kB free swap out of 24G swap space. This means system neither has RAM not SWAP for processing, which case a reboot. This picture is also clear from OS Watcher of system.
08:15:29     CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s

08:15:29     all    0.67    0.00   82.60   16.60    0.00    0.06    0.00    0.07  42521.98
08:15:35     all    0.84    0.00   23.40   73.19    0.00    0.11    0.00    2.47  35859.59
08:15:39     all    1.22    0.00   85.13   13.47    0.00    0.13    0.00    0.04  40569.31
08:15:45     all    1.57    0.00   98.22    0.13    0.00    0.08    0.00    0.00  36584.31
08:15:50     all    1.41    0.00   98.48    0.04    0.00    0.07    0.00    0.00  36643.10
08:15:54     all    0.84    0.00   99.09    0.03    0.00    0.05    0.00    0.00  36257.02
08:16:06     all    0.95    0.00   98.88    0.09    0.00    0.08    0.00    0.00  39113.15
08:16:11     all    0.87    0.00   99.00    0.07    0.00    0.06    0.00    0.00  37490.22
08:16:16     all    0.89    0.00   98.97    0.07    0.00    0.07    0.00    0.00  37681.04
08:16:22     all    0.78    0.00   99.12    0.05    0.00    0.05    0.00    0.00  36963.75
08:16:38     all    0.79    0.00   98.86    0.28    0.00    0.08    0.00    0.00  36639.21
08:16:43     all    0.78    0.00   98.79    0.34    0.00    0.08    0.00    0.01  37405.99
08:16:54     all    1.06    0.00   98.71    0.12    0.00    0.11    0.00    0.00  38102.37
08:17:08     all    1.69    0.00   67.02   30.93    0.00    0.06    0.00    0.29  37316.55

How to avoid Node Reboot due to High Load ?


The simple and best way to avoid this is use Oracle Database Resource Manager(DBRM) help to resolve this by allowing the database to have more control over how hardware resources and their allocation. DBA should setup Resource consumer group and Resource plan and should use them as per requirements. In Extra data system Exadata DBA can use IORM to setup resource allocation among multiple database instances.


2. voting disk not reachable:- One of the another reason for Node Reboot is clusterware is not able to access a minimum number of the voting files.When the node aborts for this reason, the node alert log will show CRS-1606 error.

Here is a scenario for voting disk not reachable:

2013-01-26 10:15:47.177
[cssd(3743)]CRS-1606:The number of voting files available, 1, is less than the minimum number of voting files required, 2, resulting in CSSD 
termination to ensure data integrity; details at (:CSSNM00018:) in /u01/app/11.2.0/grid/log/apdbc76n1/cssd/ocssd.log
There could be two reasons for this issue:

    A. Connection to the voting disk is interrupted.
    B. if only one voting disk is in use and version is less than 11.2.0.3.4, hitting known bug  13869978.

How to Solve Voting Disk Outage ?

There could be many reasons for voting disk is not reachable, Here are few general approach for DBA to follow.


1. Use command "crsctl query css votedisk" on a node where clusterware is up to get a list of all the voting files.
2. Check that each node can access the devices underlying each voting file.
3. Check for permissions to each voting file/disk have not been changed.
4. Check OS, SAN, and storage logs for any errors from the time of the incident.
5. Apply fix for 13869978 if only one voting disk is in use. This is fixed in 11.2.0.3.4 patch set and above, and 11.2.0.4 and above


If any voting files or underlying devices are not currently accessible from any node, work with storage administrator and/or system administrator to resolve it at storage and/or OS level.

3. Missed Network Connection between Nodes: In technical term this is called as Missed Network Heartbeat (NHB). Whenever there is communication gap or no communication between nodes on private network (interconnect) due to network outage or some other reason. A node abort itself to avoid "split brain" situation. The most common (but not exclusive) cause of missed NHB is network problems communicating over the private interconnect.

Suggestion to troubleshoot Missed Network Heartbeat.


1. Check OS statistics from the evicted node from the time of the eviction. DBA can use OS Watcher to look at OS Stats at time of issue, check oswnetstat and oswprvtnet for network related issues.
2. Validate the interconnect network setup with the Help of Network administrator.
3. Check communication over the private network.
4. Check that the OS network settings are correct by running the RACcheck tool.

4. Database Or ASM Instance Hang: Sometimes Database or ASM instance hang can cause Node reboot. In these case Database instance is hang and is terminated afterwards, which cause either reboot cluster or Node eviction. DBA should check alert log of Database and ASM instance for any hang situation which might cause this issue.

Database Alert log file entry for Database Hang Situation:

Wed Jun 05 03:25:01 2013
Sweep [inc][122970]: completed
System State dumped to trace file /u01/app/oracle/diag/rdbms/remotedb/remotedb2/incident/incdir_122970/remotedb2_m005_118200_i122970_a.trc
Wed Jun 05 03:25:33 2013
DSKM process appears to be hung. Initiating system state dump.
Wed Jun 05 03:25:33 2013
System state dump requested by (instance=2, osid=9757 (GEN0)), summary=[system state dump request (ksz_check_ds)].
System State dumped to trace file /u01/app/oracle/diag/rdbms/remotedb/remotedb2/trace/remotedb2_diag_9759.trc
Wed Jun 05 03:27:04 2013
Errors in file /u01/app/oracle/diag/rdbms/remotedb/remotedb2/trace/remotedb2_pmon_9707.trc  (incident=122897):
ORA-00600: internal error code, arguments: [ksz_cln_proc1], [0x143642090], [3], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/remotedb/remotedb2/incident/incdir_122897/remotedb2_pmon_9707_i122897.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /u01/app/oracle/diag/rdbms/remotedb/remotedb2/trace/remotedb2_pmon_9707.trc:
ORA-00600: internal error code, arguments: [ksz_cln_proc1], [0x143642090], [3], [], [], [], [], [], [], [], [], []
PMON (ospid: 9707): terminating the instance due to error 472
Wed Jun 05 03:27:05 2013
System state dump requested by (instance=2, osid=9707 (PMON)), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/oracle/diag/rdbms/remotedb/remotedb2/trace/remotedb2_diag_9759.trc
Instance terminated by PMON, pid = 9707
Wed Jun 05 03:45:58 2013
Starting ORACLE instance (normal)
At the same time resources at Cluster level start failing and node was evicted by itself. Real Application Cluster Log files
[/u01/app/11.2.0.3/grid/bin/oraagent.bin(96437)]CRS-5011:Check of resource "PCAB1DE" failed: details at "(:CLSN00007:)" in "/u01/app/11.2.0.3/grid/log/exs2db04/agent/crsd/oraagent_oracle/oraagent_oracle.log"
2013-06-05 03:27:06.649
[/u01/app/11.2.0.3/grid/bin/oraagent.bin(96437)]CRS-5011:Check of resource "P1PANIE" failed: details at "(:CLSN00007:)" in "/u01/app/11.2.0.3/grid/log/exs2db04/agent/crsd/oraagent_oracle/oraagent_oracle.log"
2013-06-05 03:32:41.360
[/u01/app/11.2.0.3/grid/bin/oraagent.bin(96437)]CRS-5011:Check of resource "P1ESXPE" failed: details at "(:CLSN00007:)" in "/u01/app/11.2.0.3/grid/log/exs2db04/agent/crsd/oraagent_oracle/oraagent_oracle.log"
2013-06-05 03:36:25.675
[cssd(67762)]CRS-1662:Member kill requested by node exs2db03 for member number 1, group DBPCAB1DE
2013-06-05 03:36:58.775
[/u01/app/11.2.0.3/grid/bin/oraagent.bin(96437)]CRS-5818:Aborted command 'clean' for resource 'ora.pcab1de.db'. Details at (:CRSAGF00113:) {0:6:28} in /u01/app/11.2.0.3/grid/log/exs2db04/agent/crsd/oraagent_oracle/oraagent_oracle.log.
2013-06-05 03:37:06.784
[/u01/app/11.2.0.3/grid/bin/oraagent.bin(96437)]CRS-5818:Aborted command 'clean' for resource 'ora.p1panie.db'. Details at (:CRSAGF00113:) {0:6:29} in /u01/app/11.2.0.3/grid/log/exs2db04/agent/crsd/oraagent_oracle/oraagent_oracle.log.
2013-06-05 03:37:26.307
[cssd(67762)]CRS-1608:This node was evicted by node 3, exs2db03; details at (:CSSNM00005:) in /u01/app/11.2.0.3/grid/log/exs2db04/cssd/ocssd.log.
2013-06-05 03:37:26.318
[cssd(67762)]CRS-1656:The CSS daemon is terminating due to a fatal error; Details at (:CSSSC00012:) in /u01/app/11.2.0.3/grid/log/exs2db04/cssd/ocssd.log
2013-06-05 03:37:26.318
[cssd(67762)]CRS-1652:Starting clean up of CRSD resources.
2013-06-05 03:37:26.340
[cssd(67762)]CRS-1608:This node was evicted by node 3, exs2db03; details at (:CSSNM00005:) in /u01/app/11.2.0.3/grid/log/exs2db04/cssd/ocssd.log.
2013-06-05 03:37:26.450
[cssd(67762)]CRS-1608:This node was evicted by node 3, exs2db03; details at (:CSSNM00005:) in /u01/app/11.2.0.3/grid/log/exs2db04/cssd/ocssd.log.
2013-06-05 03:37:26.647
[cssd(67762)]CRS-1608:This node was evicted by node 3, exs2db03; details at (:CSSNM00005:) in /u01/app/11.2.0.3/grid/log/exs2db04/cssd/ocssd.log.
2013-06-05 03:37:26.869
[cssd(67762)]CRS-1608:This node was evicted by node 3, exs2db03; details at (:CSSNM00005:) in /u01/app/11.2.0.3/grid/log/exs2db04/cssd/ocssd.log.
2013-06-05 03:37:26.921The main reason for this issue was "DSKM process appears to be hung.", but definately this was not because of high load Since OS Watcher at this time doesn't show any load.
So, I believe this could be due to some bug in database.


In few of the cases, bugs could be the reason for node reboot, bug may be at Database level, ASM level or at Real Application Cluster level. Here, after initial investigation from Database Administrator side, DBA should open an SR with Oracle Support.

Error 404 /OA_HTML/AppsLogin was not found on the server

facing some problem in apps R12 login page (12.1.3)
Suddenly the login page was throwing an error
404 /OA_HTML/AppsLogin was not found on the
server
Error check on $LOG_FILE.
The access_log shows no errors. error.log from
$LOG_HOME/ora/10.1.3/Apache shows
[Tue Jan 26 11:27:51 2014] [error] [client
123.238.255.152] [ecid:
1295330271:192.168.0.20:10897:0:2,0] File does not
exist: /oracle/inst/apps/ORCL_oracle/
portal/OA_HTML/AppsLogin


Solutions:-

 
I recreated the jar files using adadmin
and compiled the jsp using
cd $FND_TOP/patch/115/bin
perl ojspCompile.pl --compile --flush -p 20 -log /tmp/ojspc_error2.log

Nothing worked out.
Fix:
This problem occurred because wsrp_service.wsdl was owned by root but the services were being started by applmgr.

Ensure that the file
$ORA_CONFIG_HOME/10.1.3/j2ee/oacore/application-deployments/oacore/html/server-wsdl/wsrp_service.wsdl

it shows it is owned by root. Somebody started the services of Apps as root user.
Changed the owner of the file to Apps Owner.
Restarted the service.

RAC Interview Questions and Answers



Oracle RAC DBA Interview Questions and Answers




1. What is RAC?

RAC stands for Real Application cluster.
It is a clustering solution from Oracle Corporation that ensures high availability of databases by providing instance failover, media failover features.
Oracle RAC is a cluster database with a shared cache architecture that overcomes the limitations of traditional shared-nothing and shared-disk approaches to provide a highly scalable and available database solution for all the business applications.
Oracle RAC provides the foundation for enterprise grid computing.

2. What is Oracle RAC One Node?
Oracle RAC one Node is a single instance running on one node of the cluster while the 2nd node is in cold standby mode. If the instance fails for some reason then RAC one node detect it and restart the instance on the same node or the instance is relocate to the 2nd node incase there is failure or fault in 1st node. The benefit of this feature is that it provides a cold failover solution and it automates the instance relocation without any downtime and does not need a manual intervention. Oracle introduced this feature with the release of 11gR2 (available with Enterprise Edition).

3. What is RAC and how is it different from non RAC databases?
Oracle Real Application clusters allows multiple instances to access a single database; the instances will be running on multiple nodes. 
In Real Application Clusters environments, all nodes concurrently execute transactions against the same database.
Real Application Clusters coordinates each node's access to the shared data to provide consistency and integrity.

4. What are the advantages of RAC (Real Application Clusters)?

Reliability - if one node fails, the database won't fail
Availability - nodes can be added or replaced without having to shutdown the database
Scalability - more nodes can be added to the cluster as the workload increases

5. What is Oracle RAC One Node?

Oracle RAC one Node is a single instance running on one node of the cluster while the 2nd node is in cold standby mode. If the instance fails for some reason then RAC one node detect it and restart the instance on the same node or the instance is relocate to the 2nd node incase there is failure or fault in 1st node. The benefit of this feature is that it provides a cold failover solution and it automates the instance relocation without any downtime and does not need a manual intervention. Oracle introduced this feature with the release of 11gR2 (available with Enterprise Edition).

6. What is Cache Fusion?

Oracle RAC is composed of two or more instances. When a block of data is read from data-file by an instance within the cluster and another instance is in need of the same block, it is easy to get the block image from the instance which has the block in its SGA rather than reading from the disk. To enable inter instance communication Oracle RAC makes use of interconnects. The Global Enqueue Service (GES) monitors and Instance enqueue process manages the cache fusion.

7. What command would you use to check the availability of the RAC system?
crs_stat -t -v (-t -v are optional)

8. How do we verify that RAC instances are running?
SQL>select * from V$ACTIVE_INSTANCES;
The query gives the instance number under INST_NUMBER column,host_:instancename under INST_NAME column.

9. How can you connect to a specific node in a RAC environment?

tnsnames.ora ensure that you have INSTANCE_NAME specified in it.

10. Which is the "MASTER NODE" in RAC?

The node with the lowest node number will become master node and dynamic premastering of the resources will take place.
To find out the master node for particular resource, you can query v$ges_resource for MASTER_NODE column.
To find out which is the master node, you can see ocssd.log file and search for "master node number".
When the first master node fails in the cluster the lowest node number will become master node.

11. What components in RAC must reside in shared storage?

All data-files, control-files, SP-Files, redo log files must reside on cluster-aware shred storage.

12. Give few examples for solutions that support cluster storage?
·ASM (automatic storage management),
·Raw disk devices,
·Network file system (NFS),
·OCFS2 and
·OCFS (Oracle Cluster Fie systems).

13. What are Oracle Cluster Components?
1. Cluster Interconnect (HAIP)
2.Shared Storage (OCR/Voting Disk)
3.Clusterware software
4.Oracle Kernel Components

14. What are Oracle RAC Components?

       VIP, Node apps etc.

15. What are Oracle Kernel Components?
Basically Oracle kernel need to switch on with RAC On option when you convert to RAC, that is the difference as it facilitates few RAC big process like LMON, LCK, LMD, LMS etc.

16. How to turn on RAC?

# link the oracle libraries
$ cd $ORACLE_HOME/rdbms/lib
$ make -f ins_rdbms.mk rac_on
# rebuild oracle
$ cd $ORACLE_HOME/bin
$ relink oracle

17. Disk architecture in RAC?

SAN (Storage Area Networks) - generally using fibre to connect to the SAN
NAS (Network Attached Storage) - generally using a network to connect to the NAS using either NFS, ISCSI

18. What is Oracle Cluster-ware?

The Cluster-ware software allows nodes to communicate with each other and forms the cluster that makes the nodes work as a single logical server.
The software is run by the Cluster Ready Services (CRS) using the Oracle Cluster Registry (OCR) that records and maintains the cluster and node membership information and the voting disk which acts as a tiebreaker during communication failures. Consistent heartbeat information travels across the interconnect to the voting disk when the cluster is running.

19. Real Application Clusters?
Oracle RAC is a cluster database with a shared cache architecture that overcomes the limitations of traditional shared-nothing and shared-disk approaches to provide a highly scalable and available database solution for all your business applications. Oracle RAC provides the foundation for enterprise grid computing.

Oracle’s Real Application Clusters (RAC) option supports the transparent deployment of a single database across a cluster of servers, providing fault tolerance from hardware failures or planned outages. Oracle RAC running on clusters provides Oracle’s highest level of capability in terms of availability, scalability, and low-cost computing.

One DB opened by multiple instances so the the db ll be Highly Available if an instance crashes.
Cluster Software. Oracles Cluster-ware or products like Veritas Volume Manager are required to provide the cluster support and allow each node to know which nodes belong to the cluster and are available and with Oracle Cluster-ware to know which nodes have failed and to eject then from the cluster, so that errors on that node can be cleared.

Oracle Cluster-ware has two key components Cluster Registry OCR and Voting Disk.

The cluster registry holds all information about nodes, instances, services and ASM storage if used, it also contains state information ie they are available and up or similar.

The voting disk is used to determine if a node has failed, i.e. become separated from the majority. If a node is deemed to no longer belong to the majority then it is forcibly rebooted and will after the reboot add itself again the the surviving cluster nodes.

20. What are the Oracle Cluster-ware key components?

Oracle Cluster-ware has two key components Cluster Registry OCR and Voting Disk.

21. What is Voting Disk and OCR?
Voting Disk
Oracle RAC uses the voting disk to manage cluster membership by way of a health check and arbitrates cluster ownership among the instances in case of network failures. The voting disk must reside on shared disk.
A node must be able to access more than half of the voting disks at any time.
For example, if you have 3 voting disks configured, then a node must be able to access at least two of the voting disks at any time. If a node cannot access the minimum required number of voting disks it is evicted, or removed, from the cluster.

Oracle Cluster Registry (OCR)
The cluster registry holds all information about nodes, instances, services and ASM storage if used, it also contains state information ie they are available and up or similar.
The OCR must reside on shared disk that is accessible by all of the nodes in your cluster.

22. What are the administrative tasks involved with voting disk?
Following administrative tasks are performed with the voting disk :
1) Backing up voting disks
2) Recovering Voting disks
3) Adding voting disks
4) Deleting voting disks
5) Moving voting disks

23. Can you add voting disk online? Do you need voting disk backup?

Yes,  as per documentation, if you have multiple voting disk you can add online, but if you have only one voting disk , by that cluster will be down as its lost you just need to start crs in exclusive mode and add the vote disk using
crsctl add votedisk <path>

24. What is the Oracle Recommendation for backing up voting disk?
Oracle recommends us to use the dd command to backup the voting disk with a minimum block size of 4KB.

25. How do we backup voting disks?
1) Oracle recommends that you back up your voting disk after the initial cluster creation and after we complete any node addition or deletion procedures.
2) First, as root user, stop Oracle Cluster-ware (with the crsctl stop crs command) on all nodes. Then, determine the current voting disk by issuing the following command:
crsctl query vote disk css
3) Then, issue the dd or o copy command to back up a voting disk, as appropriate.
Give the syntax of backing up voting disks:-
On Linux or UNIX systems:
dd if=voting_disk_name of=backup_file_name
where,
voting_disk_name is the name of the active voting disk
backup_file_name is the name of the file to which we want to back up the voting disk contents
On Windows systems, use the ocopy command:
copy voting_disk_name backup_file_name

26. How do we verify an existing current backup of OCR?
We can verify the current backup of OCR using the following command: ocrconfig –show backup.

27. You have lost OCR disk, what is your next step?
The cluster stack will be down due to the fact that cssd is unable to maintain the integrity, this is true in 10g, from 11gR2 onwards, and the crsd stack will be down, the hasd still up and running. You can add the ocr back by restoring the automatic backup or import the manual backup,

28. What are the major RAC wait events?
In a RAC environment the buffer cache is global across all instances in the cluster and hence the processing differs. The most common wait events related to this are gc cr request and gc buffer busy

GC CR request: the time it takes to retrieve the data from the remote cache
Reason: RAC Traffic Using Slow Connection or Inefficient queries (poorly tuned queries will increase the amount of data blocks requested by an Oracle session. The more blocks requested typically means the more often a block will need to be read from a remote instance via the interconnect.)

GC BUFFER BUSY:
It is the time the remote instance locally spends accessing the requested data block.

29. What do you do if you see GC CR BLOCK LOST in top 5 Timed Events in AWR Report?

This is most likely due to a fault in interconnecting network.
Check netstat -s
if you see "fragments dropped" or "packet reassemblies failed" , Work with your system administrator find the fault with network.

30. How do OCSSD starts first if voting disk & OCR resides in ASM Disk groups?
You might wonder how CSSD, which is required to start the clustered ASM instance,
31. Can be started if voting disks are stored in ASM?
Without access to the voting disks there is no CSS, hence the node cannot join the cluster.
But without being part of the cluster, CSSD cannot start the ASM instance.
To solve this problem the ASM disk headers have new metadata in 11.2:
you can use kfed to read the header of an ASM disk containing a voting disk.
The kfdhdb.vfstart and kfdhdb.vfend fields tell CSS where to find the voting file. This does not require the ASM instance to be up.
Once the voting disks are located, CSS can access them and joins the cluster.

32. What is gsdctl in RAC? List gsdctl commands in Oracle RAC?
GSDCTL stands for Global Service Daemon Control, we can use gsdctl commands to start, stop, and obtain the status of the GSD service on any platform.

The options for gsdctl are:-
$ gsdctl start -- To start the GSD service
$ gsdctl stop  -- To stop the GSD service
$ gsdctl stat  -- To obtain the status of the GSD service

Log file location for gsdctl:
$ ORACLE_HOME/srvm/log/gsdaemon_node_name.log
33. How do you troubleshoot node reboot?
Please check metalink ...
Note 265769.1 Troubleshooting CRS Reboots
Note.559365.1 Using Diagwait as a diagnostic to get more information for diagnosing Oracle Clusterware Node evictions.

Srvctl cannot start instance, I get the following error PRKP-1001 CRS-0215,
34. However sql-plus can start it on both nodes? Or, how do you identify the problem?
Set the environmental variable SRVM_TRACE to true... And start the instance with srvctl. Now you will get detailed error stack.

35. What are Oracle Cluster-ware processes for 10g on UNIX and Linux?
Cluster Synchronization Services (ocssd) — Manages cluster node membership and runs as the oracle user; failure of this process results in cluster restart.

Cluster Ready Services (crsd) — The crs process manages cluster resources (which could be a database, an instance, a service, a Listener, a virtual IP (VIP) address, an application process, and so on) based on the resource's configuration information that is stored in the OCR. This includes start, stop, monitor and failover operations. This process runs as the root user

Event manager daemon (evmd)
— a background process that publishes events that crs creates.

Process Monitor Daemon (OPROCD) —this process monitor the cluster and provide I/O fencing. OPROCD performs its check, stops running, and if the wake up is beyond the expected time, then OPROCD resets the processor and reboots the node. An OPROCD failure results in Oracle Cluster-ware restarting the node. OPROCD uses the hangcheck timer on Linux platforms.

RACG (racgmain, racgimon) —Extends cluster-ware to support Oracle-specific requirements and complex resources. Runs server callout scripts when FAN events occur.

36. What are Oracle database background processes specific to RAC?

Oracle RAC is composed of two or more database instances. They are composed of Memory structures and background processes same as the single instance database. Oracle RAC instances use two processes GES (Global Enqueue Service), GCS (Global Cache Service) that enable cache fusion. Oracle RAC instances are composed of following background processes:
ACMS—Atomic Control-file to Memory Service (ACMS)
GTX0-j—Global Transaction Process
LMON—Global Enqueue Service Monitor
LMD—Global Enqueue Service Daemon
LMS—Global Cache Service Process
LCK0—Instance Enqueue Process
RMSn—Oracle RAC Management Processes (RMSn)
RSMN—Remote Slave Monitor
To ensure that each Oracle RAC database instance obtains the block that it needs to satisfy a query or transaction, Oracle RAC instances use two processes, the Global Cache Service (GCS) and the Global Enqueue Service (GES). The GCS and GES maintain records of the statuses of each data file and each cached block using a Global Resource Directory (GRD). The GRD contents are distributed across all of the active instances.

37. What is GRD?
GRD stands for Global Resource Directory. The GES and GCS maintain records of the statuses of each data-file and each cached block using global resource directory. This process is referred to as cache fusion and helps in data integrity.

38. What is ACMS?

ACMS stands for Atomic Control-file Memory Service. In an Oracle RAC environment ACMS is an agent that ensures a distributed SGA memory up date (ie) SGA updates are globally committed on success or globally aborted in event of a failure.

39. What is SCAN listener?
A scan listener is something that additional to node listener which listens the incoming db connection requests from the client which got through the scan IP, it got end points configured to node listener where it routes the db connection requests to particular node listener.

SCAN IP can be disabled if not required. However SCAN IP is mandatory during the RAC installation. Enabling/disabling SCAN IP is mostly used in oracle apps environment by the concurrent manager (kind of job scheduler in oracle apps).

Steps to disable the SCAN IP,
i.  Do not use SCAN IP at the client end.
ii. Stop scan listener
    srvctl stop scan_listener
iii.Stop scan
    srvctl stop scan (this will stop the scan vip's)
iv. Disable scan and disable scan listener
    srvctl disable scan

40. What are the different networks components are in 10g RAC?
 Public, private, and vip components
Private interfaces is for intra node communication.
VIP is all about availability of application. When a node fails then the VIP component fail over to some other node, this is the reason that all applications should based on vip components means tns entries should have vip entry in the host list

41. What is an interconnect network?
An interconnect network is a private network that connects all of the servers in a cluster. The interconnect network uses a switch/multiple switches that only the nodes in the cluster can access.

42. What is the use of cluster interconnecting?
Cluster interconnect is used by the Cache fusion for inter instance communication.

43. How can we configure the cluster interconnect?
· Configure User Datagram Protocol (UDP) on Gigabit Ethernet for cluster interconnects.
· On UNIX and Linux systems we use UDP and RDS (Reliable data socket) protocols to be used by Oracle Cluster-ware.
· Windows clusters use the TCP protocol.

44. What is the purpose of Private Interconnect?
Cluster-ware uses the private interconnect for cluster synchronization (network heartbeat) and daemon communication between the clustered nodes. This communication is based on the TCP protocol.
RAC uses the interconnect for cache fusion (UDP) and inter-process communication (TCP). Cache Fusion is the remote memory mapping of Oracle buffers, shared between the caches of participating nodes in the cluster.

45. What is a virtual IP address or VIP?
A virtual IP address or VIP is an alternate IP address that the client connections use instead of the standard public IP address. To configure VIP address, we need to reserve a spare IP address for each node, and the IP addresses must use the same subnet as the public network.

 46. What is the use of VIP?
If a node fails, then the node's VIP address fails over to another node on which the VIP address can accept TCP connections but it cannot accept Oracle connections.

47. Why do we have a Virtual IP (VIP) in Oracle RAC?
Without using VIPs or FAN, clients connected to a node that died will often wait for a TCP timeout period (which can be up to 10 min) before getting an error. As a result, you don't really have a good HA solution without using VIPs.
When a node fails, the VIP associated with it is automatically failed over to some other node and new node re-arps the world indicating a new MAC address for the IP. Subsequent packets sent to the VIP go to the new node, which will send error RST packets back to the clients. This results in the clients getting errors immediately.

48. Give situations under which VIP address failover happens?
VIP addresses failover happens when the node on which the VIP address runs fails; all interfaces for the VIP address fails, all interfaces for the VIP address are disconnected from the network.

49. What is the significance of VIP address failover?
When a VIP address failover happens, Clients that attempt to connect to the VIP address receive a rapid connection refused error .They don't have to wait for TCP connection timeout messages.

50. What is the use of a service in Oracle RAC environment?
Applications should use the services feature to connect to the Oracle database. Services enable us to define rules and characteristics to control how users and applications connect to database instances.

51. What are the characteristics controlled by Oracle services feature?
The characteristics include a unique name, workload balancing, failover options, and high availability.

52. What enables the load balancing of applications in RAC?

Oracle Net Services enable the load balancing of application connections across all of the instances in an Oracle RAC database.

53. What are the types of connection load-balancing?
Connection Workload management is one of the key aspects when you have RAC instances as you want to distribute the connections to specific nodes/instance or those have less load.
There are two types of connection load-balancing:
1.Client Side load balancing (also called as connect time load balancing)
2.Server side load balancing (also called as Listener connection load balancing)

54. What is the difference between server-side and client-side connection load balancing?

Client-side balancing happens at client side where load balancing is done using listener. In case of server-side load balancing listener uses a load-balancing advisory to redirect connections to the instance providing best service.

 Client Side load balancing: -
Oracle client side load balancing feature enables clients to randomize the connection requests among all the available listeners based on their load.

An tns entry that contains all nodes entries and use load balance=on (default its on) will use the connect time load balancing or client side load balancing.

Sample Client Side TNS Entry:-

    finance =
    (DESCRIPTION =
         (ADDRESS = (PROTOCOL = TCP)(HOST = myrac2-vip)(PORT = 2042))
         (ADDRESS = (PROTOCOL = TCP)(HOST = myrac1-vip)(PORT = 2042))
         (ADDRESS = (PROTOCOL = TCP)(HOST = myrac3-vip)(PORT = 2042))
    (LOAD_BALANCE = yes)
    (CONNECT_DATA =
         (SERVER = DEDICATED)
         (SERVICE_NAME = FINANCE) (FAILOVER=ON)
    (FAILOVER_MODE =  (TYPE = SELECT) (METHOD = BASIC) (RETRIES = 180) (DELAY = 5))
    )
    )

Server side load balancing:- This improves the connection performance by balancing the number of active connections among multiple instances and dispatchers. In a single instance environment (shared servers), the listener selects the least dispatcher to handle the incoming client requests. In rac environments, PMON is aware of all instances load and dispatchers, and depending on the load information PMON redirects the connection to the least loaded node.

In a RAC environment, *.remote_listener parameter which is a tns entry containing all nodes addresses need to set to enable the load balance advisory updates to PMON.

Sample Tns entry should be in an instances of RAC cluster,

    local_listener=LISTENER_MYRAC1
    remote_listener = LISTENERS_MYRACDB

55. What are the administrative tools used for Oracle RAC environments?

Oracle RAC cluster can be administered as a single image using the below
·       OEM (Enterprise Manager),
·       SQL*PLUS,
·       Server control (SRVCTL),
·       Cluster Verification Utility (CLUVFY),
·       DBCA,
·       NETCA

56. Name some Oracle Cluster-ware tools and their uses?
·OIFCFG - allocating and de-allocating network interfaces.
·OCRCONFIG - Command-line tool for managing Oracle Cluster Registry.
·OCRDUMP - Identify the interconnect being used.
·CVU - Cluster verification utility to get status of CRS resources.

57. What is the difference between CRSCTL and SRVCTL?
crsctl manages cluster-ware-related operations:
    Starting and stopping Oracle Cluster-ware
    Enabling and disabling Oracle Cluster-ware daemons
    Registering cluster resources

srvctl manages Oracle resource–related operations:
    Starting and stopping database instances and services
    Also from 11gR2 manages the cluster resources like network,vip,disks etc

58. How do we remove ASM from a Oracle RAC environment?

We need to stop and delete the instance in the node first in interactive or silent mode.After that asm can be removed using srvctl tool as follows:
srvctl stop asm -n node_name
srvctl remove asm -n node_name
We can verify if ASM has been removed by issuing the following command:
srvctl config asm -n node_name

59. How do we verify that an instance has been removed from OCR after deleting an instance?
Issue the following srvctl command:
srvctl config database -d database_name
cd CRS_HOME/bin
./crs_stat

60. What are the modes of deleting instances from Oracle Real Application cluster Databases?

We can delete instances using silent mode or interactive mode using DBCA (Database Configuration Assistant).

61. What are the background process that exists in 11gr2 and functionality?

Process Name     Functionality
crsd     •The CRS daemon (crsd) manages cluster resources based on configuration information that is stored in Oracle Cluster Registry (OCR) for each resource. This includes start, stop, monitor, and failover operations. The crsd process generates events when the status of a resource changes.
cssd     •Cluster Synchronization Service (CSS): Manages the cluster configuration by controlling which nodes are members of the cluster and by notifying members when a node joins or leaves the cluster. If you are using certified third-party cluster-ware, then CSS processes interfaces with your cluster-ware to manage node membership information. CSS has three separate processes: the CSS daemon (ocssd), the CSS Agent (cssdagent), and the CSS Monitor (cssdmonitor). The cssdagent process monitors the cluster and provides input/output fencing. This service formerly was provided by Oracle Process Monitor daemon (oprocd), also known as Ora Fence Service on Windows. A cssdagent failure results in Oracle Clusterware restarting the node.
diskmon     •Disk Monitor daemon (diskmon): Monitors and performs input/output fencing for Oracle Exadata Storage Server. As Exadata storage can be added to any Oracle RAC node at any point in time, the diskmon daemon is always started when ocssd is started.
evmd     •Event Manager (EVM): Is a background process that publishes Oracle Cluster-ware events
mdnsd     •Multicast domain name service (mDNS): Allows DNS requests. The mDNS process is a background process on Linux and UNIX, and a service on Windows.
gnsd     •Oracle Grid Naming Service (GNS): Is a gateway between the cluster mDNS and external DNS servers. The GNS process performs name resolution within the cluster.
ons     •Oracle Notification Service (ONS): Is a publish-and-subscribe service for communicating Fast Application Notification (FAN) events
oraagent     •oraagent: Extends cluster-ware to support Oracle-specific requirements and complex resources. It runs server callout scripts when FAN events occur. This process was known as RACG in Oracle Clusterware 11g Release 1 (11.1).
orarootagent     •Oracle root agent (orarootagent): Is a specialized oraagent process that helps CRSD manage resources owned by root, such as the network, and the Grid virtual IP address
oclskd     •Cluster kill daemon (oclskd): Handles instance/node evictions requests that have been escalated to CSS
gipcd     •Grid IPC daemon (gipcd): Is a helper daemon for the communications infrastructure
ctssd     •Cluster time synchronization daemon(ctssd) to manage the time synchronization between nodes, rather depending on NTP

62. Under which user or owner the process will start?
Component                     Name of the Process         Owner
Oracle High Availability Service         ohasd                 init, root
Cluster Ready Service (CRS)             Cluster Ready Services         root
Cluster Synchronization Service (CSS)         ocssd,cssd monitor, cssdagent     grid owner
Event Manager (EVM)                 evmd, evmlogger         grid owner
Cluster Time Synchronization Service (CTSS)     octssd                 root
Oracle Notification Service (ONS)         ons, eons             grid owner
Oracle Agent                     oragent             grid owner
Oracle Root Agent                 orarootagent             root
Grid Naming Service (GNS)             gnsd                 root
Grid Plug and Play (GPnP)             gpnpd                 grid owner
Multicast domain name service (mDNS)         mdnsd                 grid owner

63. What is the major difference between 10g and 11g RAC?

There is not much difference between 10g and 11gR (1) RAC. But there is a significant difference in 11gR2.

Prior to 11gR1(10g) RAC, the following were managed by Oracle CRS
    Databases
    Instances
    Applications
    Node Monitoring
    Event Services
    High Availability

From 11gR2(onwards) its completed HA stack managing and providing the following resources as like the other cluster software like VCS etc.
    Databases
    Instances
    Applications
    Cluster Management
    Node Management
    Event Services
    High Availability
    Network Management (provides DNS/GNS/MDNSD services on behalf of other traditional services) and SCAN – Single Access Client Naming method, HAIP
    Storage Management (with help of ASM and other new ACFS filesystem)
    Time synchronization (rather depending upon traditional NTP)
    Removed OS dependent hang checker etc, manages with own additional monitor process

64. What is hang check timer?
The hang-check timer checks regularly the health of the system. If the system hangs or stop the node will be restarted automatically.
There are 2 key parameters for this module:
-> hang-check-tick: this parameter defines the period of time between checks of system health. The default value is 60 seconds; Oracle recommends setting it to 30seconds.
-> hang-check-margin: this defines the maximum hang delay that should be tolerated before hang-check-timer resets the RAC node.

65. State the initialization parameters that must have same value for every instance in an Oracle RAC database?

Some initialization parameters are critical at the database creation time and must have same values. Their value must be specified in SPFILE or PFILE for every instance. The list of parameters that must be identical on every instance are given below:
ACTIVE_INSTANCE_COUNT
ARCHIVE_LAG_TARGET
COMPATIBLE
CLUSTER_DATABASE
CLUSTER_DATABASE_INSTANCE
CONTROL_FILES
DB_BLOCK_SIZE
DB_DOMAIN
DB_FILES
DB_NAME
DB_RECOVERY_FILE_DEST
DB_RECOVERY_FILE_DEST_SIZE
DB_UNIQUE_NAME
INSTANCE_TYPE (RDBMS or ASM)
PARALLEL_MAX_SERVERS
REMOTE_LOGIN_passWORD_FILE
UNDO_MANAGEMENT

----------------------------------------------------------------------------------------------------------

66. What is RAC? What is the benefit of RAC over single instance database?
In Real Application Clusters environments, all nodes concurrently execute transactions against the same database. Real Application Clusters coordinates each node's access to the shared data to provide consistency and integrity.
Benefits:
Improve response time
Improve throughput
High availability
Transparency


Advantages of RAC (Real Application Clusters)

Reliability - if one node fails, the database won't fail
Availability - nodes can be added or replaced without having to shutdown the database
Scalability - more nodes can be added to the cluster as the workload increases




67. What is a virtual IP address or VIP?

A virtual IP address or VIP is an alternate IP address that the client connections use instead of the standard public IP address. To configure VIP address, we need to reserve a spare IP address for each node, and the IP addresses must use the same subnet as the public network.

68. What is the use of VIP?

If a node fails, then the node's VIP address fails over to another node on which the VIP address can accept TCP connections but it cannot accept Oracle connections.
Give situations under which VIP address failover happens:-
VIP addresses failover happens when the node on which the VIP address runs fails, all interfaces for the VIP address fails, all interfaces for the VIP address are disconnected from the network.
Using virtual IP we can save our TCP/IP timeout problem because Oracle notification service maintains communication between each nodes and listeners.

69. What is the significance of VIP address failover?
When a VIP address failover happens, Clients that attempt to connect to the VIP address receive a rapid connection refused error .They don't have to wait for TCP connection timeout messages.

70. What is voting disk?
Voting Disk is a file that sits in the shared storage area and must be accessible by all nodes in the cluster. All nodes in the cluster register their heart-beat information in the voting disk, so as to confirm that they are all operational. If heart-beat information of any node in the voting disk is not available that node will be evicted from the cluster. The CSS (Cluster Synchronization Service) daemon in the cluster-ware maintains the heart beat of all nodes to the voting disk. When any node is not able to send heartbeat to voting disk, then it will reboot itself, thus help avoiding the split-brain syndrome.

For high availability, Oracle recommends that you have a minimum of three or odd number (3 or greater) of voting disks.

Voting Disk - is file that resides on shared storage and Manages cluster members.  Voting disk reassigns cluster ownership between the nodes in case of failure.

The Voting Disk Files are used by Oracle Cluster-ware to determine which nodes are currently members of the cluster. The voting disk files are also used in concert with other Cluster components such as CRS to maintain the clusters integrity.

Oracle Database 11g Release 2 provides the ability to store the voting disks in ASM along with the OCR. Oracle Cluster-ware can access the OCR and the voting disks present in ASM even if the ASM instance is down. As a result CSS can continue to maintain the Oracle cluster even if the ASM instance has failed.

71. How many voting disks are you maintaining?

By default Oracle will create 3 voting disk files in ASM.

Oracle expects that you will configure at least 3 voting disks for redundancy purposes. You should always configure an odd number of voting disks >= 3. This is because loss of more than half your voting disks will cause the entire cluster to fail.

You should plan on allocating 280MB for each voting disk file. For example, if you are using ASM and external redundancy then you will need to allocate 280MB of disk for the voting disk. If you are using ASM and normal redundancy you will need 560MB.

72. Why we need to keep odd number of voting disks?

Oracle expects that you will configure at least 3 voting disks for redundancy purposes. You should always configure an odd number of voting disks >= 3. This is because loss of more than half your voting disks will cause the entire cluster to fail.


73. What are Oracle RAC software components?
Oracle RAC is composed of two or more database instances. They are composed of Memory structures and background processes same as the single instance database. Oracle RAC instances use two processes GES (Global Enqueue Service), GCS(Global Cache Service) that enable cache fusion. Oracle RAC instances are composed of following background processes:
ACMS—Atomic Controlfile to Memory Service (ACMS)
GTX0-j—Global Transaction Process
LMON—Global Enqueue Service Monitor
LMD—Global Enqueue Service Daemon
LMS—Global Cache Service Process
LCK0—Instance Enqueue Process
RMSn—Oracle RAC Management Processes (RMSn)
RSMN—Remote Slave Monitor

74. What are Oracle Cluster-ware processes for 10g?
Cluster Synchronization Services (ocssd) — Manages cluster node membership and runs as the oracle user; failure of this process results in cluster restart.
Cluster Ready Services (crsd) — The crs process manages cluster resources (which could be a database, an instance, a service, a Listener, a virtual IP (VIP) address, an application process, and so on) based on the resource's configuration information that is stored in the OCR. This includes start, stop, monitor and failover operations. This process runs as the root user
Event manager daemon (evmd) —a background process that publishes events that crs creates.
Process Monitor Daemon (OPROCD) —this process monitor the cluster and provide I/O fencing. OPROCD performs its check, stops running, and if the wake up is beyond the expected time, then OPROCD resets the processor and reboots the node. An OPROCD failure results in Oracle Cluster-ware restarting the node. OPROCD uses the hang-check timer on Linux platforms.
RACG (racgmain, racgimon) —Extends cluster-ware to support Oracle-specific requirements and complex resources. Runs server callout scripts when FAN events occur.

75. What are Oracle database background processes specific to RAC?

LMS—Global Cache Service Process
LMD—Global Enqueue Service Daemon
LMON—Global Enqueue Service Monitor
LCK0—Instance Enqueue Process
Oracle RAC instances use two processes, the Global Cache Service (GCS) and the Global Enqueue Service (GES). The GCS and GES maintain records of the statuses of each data file and each cached block using a Global Resource Directory (GRD). The GRD contents are distributed across all of the active instances.

76. What is Cache Fusion?
Transform of data across instances through private interconnect is called cache fusion. Oracle RAC is composed of two or more instances. When a block of data is read from data file by an instance within the cluster and another instance is in need of the same block, it is easy to get the block image from the instance which has the block in its SGA rather than reading from the disk. To enable inter instance communication Oracle RAC makes use of interconnects. The Global Enquire Service (GES) monitors and Instance enquires process manages the cache fusion

77. What is SCAN? (11gR2 feature)
Single Client Access Name (SCAN) is s a new Oracle Real Application Clusters (RAC) 11g Release 2 feature that provides a single name for clients to access an Oracle Database running in a cluster. The benefit is clients using SCAN do not need to change if you add or remove nodes in the cluster.

SCAN provides a single domain name via (DNS), allowing and-users to address a RAC cluster as-if it were a single IP address. SCAN works by replacing a hostname or IP list with virtual IP addresses (VIP).

Single client access name (SCAN) is meant to facilitate single name for all Oracle clients to connect to the cluster database, irrespective of number of nodes and node location. Until now, we have to keep adding multiple address records in all clients tnsnames.ora, when a new node gets added to or deleted from the cluster.

Single Client Access Name (SCAN) eliminates the need to change TNSNAMES entry when nodes are added to or removed from the Cluster. RAC instances register to SCAN listeners as remote listeners. Oracle recommends assigning 3 addresses to SCAN, which will create 3 SCAN listeners, though the cluster has got dozens of nodes.. SCAN is a domain name registered to at least one and up to three IP addresses, either in DNS (Domain Name Service) or GNS (Grid Naming Service). The SCAN must resolve to at least one address on the public network. For high availability and scalability, Oracle recommends configuring the SCAN to resolve to three addresses.

78. What are SCAN components in a cluster?
1. SCAN Name
2.SCAN IPs (3)
3.SCAN Listeners (3)

79. What is FAN?
Fast application Notification as it abbreviates to FAN relates to the events related to instances, services and nodes. This is a notification mechanism that Oracle RAC uses to notify other processes about the configuration and service level information that includes service status changes such as, UP or DOWN events. Applications can respond to FAN events and take immediate action.

80. What is TAF?
TAF (Transparent Application Failover) is a configuration that allows session fail-over between different nodes of a RAC database cluster.
Transparent Application Failover (TAF). If a communication link failure occurs after a connection is established, the connection fails over to another active node. Any disrupted transactions are rolled back, and session properties and server-side program variables are lost. In some cases, if the statement executing at the time of the failover is a Select statement, that statement may be automatically re-executed on the new connection with the cursor positioned on the row on which it was positioned prior to the failover.

After an Oracle RAC node crashes—usually from a hardware failure—all new application transactions are automatically rerouted to a specified backup node. The challenge in rerouting is to not lose transactions that were "in flight" at the exact moment of the crash. One of the requirements of continuous availability is the ability to restart in-flight application transactions, allowing a failed node to resume processing on another server without interruption. Oracle's answer to application failover is a new Oracle Net mechanism dubbed Transparent Application Failover. TAF allows the DBA to configure the type and method of failover for each Oracle Net client.
TAF architecture offers the ability to restart transactions at either the transaction (SELECT) or session level.
81. What are the requirements for Oracle Cluster-ware?
1. External Shared Disk to store Oracle Cluster ware file (Voting Disk and Oracle Cluster Registry - OCR) 
2. Two network cards on each cluster ware node (and three set of IP address) -
Network Card 1 (with IP address set 1) for public network
Network Card 2 (with IP address set 2) for private network (for inter node communication between rac nodes used by cluster-ware and rac database)
IP address set 3 for Virtual IP (VIP) (used as Virtual IP address for client connection and for connection failover)
3. Storage Option for OCR and Voting Disk - RAW, OCFS2 (Oracle Cluster File System), NFS,
Which enable the load balancing of applications in RAC?
Oracle Net Services enable the load balancing of application connections across all of the instances in an Oracle RAC database.

82. How to find location of OCR file when CRS is down?
If you need to find the location of OCR (Oracle Cluster Registry) but your CRS is down.
When the CRS is down:
Look into “ocr.loc” file, location of this file changes depending on the OS:
On Linux: /etc/oracle/ocr.loc
On Solaris: /var/opt/oracle/ocr.loc

When CRS is UP:
Set ASM environment or CRS environment then run the below command:
ocr-check

83. In 2 node RAC, how many NIC’s are using ?
2 network cards on each clusterware node 
Network Card 1 (with IP address set 1) for public network
Network Card 2 (with IP address set 2) for private network (for inter node communication between rac nodes used by clusterware and rac database)

84. in 2 nodes RAC, how many IP’s are using?
6 - 3 set of IP address
## eth1-Public:  2
## eth0-Private: 2
## VIP: 2

85. How to find IP’s information in RAC?
Edit the /etc/hosts file as shown below:
# does not remove the following line, or various programs
# That requires network functionality will fail.
127.0.0.1               localhost.localdomain localhost
## Public Node names
 192.168.10.11          node1-pub.hingu.net     node1-pub
192.168.10.22          node2-pub.hingu.net     node2-pub
## Private Network (Interconnect)
 192.168.0.11            node1-prv               node1-prv
192.168.0.22            node2-prv               node2-prv
## Private Network (Network Area storage)
 192.168.1.11            node1-nas               node1-nas
192.168.1.22            node2-nas               node2-nas
192.168.1.33            nas-server              nas-server
## Virtual IPs
 192.168.10.111          node1-vip.hingu.net     node1-vip
192.168.10.222          node2-vip.hingu.net     node2-vip

86. What is difference between RAC ip addresses?
Public IP address is the normal IP address typically used by DBA and SA to manage storage, system and database. Public IP addresses are reserved for the Internet.
Private IP address is used only for internal clustering processing (Cache Fusion) (aka as interconnect). Private IP addresses are reserved for private networks.
VIP is used by database applications to enable fail over when one cluster node fails. The purpose for having VIP is so client connection can be failover to surviving nodes in case there is failure


87. Can application developer access the private ip?
No. private IP address is used only for internal clustering processing (Cache Fusion) (aka as interconnect)

Oracle 11g RAC Interview Questions

1. What is the major difference between 10g and 11g RAC?
Well, there is not much difference between 10g and 11gR (1) RAC.
But there is a significant difference in 11gR2.
Prior to 11gR1 (10g) RAC, the following were managed by Oracle CRS
From 11gR2 (onwards) it’s completed HA stack managing and providing the following resources as like the other cluster software like VCS etc.
  • Databases
  • Instances
  • Applications
  • Cluster Management
  • Node Management
  • Event Services
  • High Availability
  • Network Management (provides DNS/GNS/MDNSD services on behalf of other traditional services) and SCAN – Single Access Client Naming method, HAIP
  • Storage Management (with help of ASM and other new ACFS filesystem)
  • Time synchronization (rather depending upon traditional NTP)
  • Removed OS dependent hang checker etc, manages with own additional monitor process
2.  What are Oracle Cluster Components?
Cluster Interconnect (HAIP)
Shared Storage (OCR/Voting Disk)
Cluster-ware software
3. What are Oracle RAC Components?
VIP, Node apps etc.
4. What are Oracle Kernel Components (nothing but how does Oracle RAC database differs than Normal single instance database in terms of Binaries and process)?
Basically Oracle kernel need to switched on with RAC On option when you convert to RAC, that is the difference as it facilitates few RAC bg process like LMON,LCK,LMD,LMS etc.
To turn on RAC
# link the oracle libraries
$ cd $ORACLE_HOME/rdbms/lib
$ make -f ins_rdbms.mk rac_on
# rebuild oracle
$ cd $ORACLE_HOME/bin
$ relink oracle
Oracle RAC is composed of two or more database instances. They are composed of Memory structures and background processes same as the single instance database.Oracle RAC instances use two processes GES(Global Enqueue Service), GCS(Global Cache Service) that enable cache fusion.Oracle RAC instances are composed of following background processes:
ACMS—Atomic Controlfile to Memory Service (ACMS)
GTX0-j—Global Transaction Process
LMON—Global Enqueue Service Monitor
LMD—Global Enqueue Service Daemon
LMS—Global Cache Service Process
LCK0—Instance Enqueue Process
RMSn—Oracle RAC Management Processes (RMSn)
RSMN—Remote Slave Monitor
5. What is Clusterware?
Software that provides various interfaces and services for a cluster. Typically, this includes capabilities that:
  • Allow the cluster to be managed as a whole
  • Protect the integrity of the cluster
  • Maintain a registry of resources across the cluster
  • Deal with changes to the cluster
  • Provide a common view of resources
6. What are the background process that exists in 11gr2 and functionality?
Process Name
Functionality
crsd
•The CRS daemon (crsd) manages cluster resources based on configuration information that is stored in Oracle Cluster Registry (OCR) for each resource. This includes start, stop, monitor, and failover operations. The crsd process generates events when the status of a resource changes.
cssd
•Cluster Synchronization Service (CSS): Manages the cluster configuration by controlling which nodes are members of the cluster and by notifying members when a node joins or leaves the cluster. If you are using certified third-party clusterware, then CSS processes interfaces with your clusterware to manage node membership information. CSS has three separate processes: the CSS daemon (ocssd), the CSS Agent (cssdagent), and the CSS Monitor (cssdmonitor). The cssdagent process monitors the cluster and provides input/output fencing. This service formerly was provided by Oracle Process Monitor daemon (oprocd), also known as OraFenceService on Windows. A cssdagent failure results in Oracle Clusterware restarting the node.
diskmon
•Disk Monitor daemon (diskmon): Monitors and performs input/output fencing for Oracle Exadata Storage Server. As Exadata storage can be added to any Oracle RAC node at any point in time, the diskmon daemon is always started when ocssd is started.
evmd
•Event Manager (EVM): Is a background process that publishes Oracle Clusterware events
mdnsd
•Multicast domain name service (mDNS): Allows DNS requests. The mDNS process is a background process on Linux and UNIX, and a service on Windows.
gnsd
•Oracle Grid Naming Service (GNS): Is a gateway between the cluster mDNS and external DNS servers. The GNS process performs name resolution within the cluster.
ons
•Oracle Notification Service (ONS): Is a publish-and-subscribe service for communicating Fast Application Notification (FAN) events
oraagent
•oraagent: Extends clusterware to support Oracle-specific requirements and complex resources. It runs server callout scripts when FAN events occur. This process was known as RACG in Oracle Clusterware 11g Release 1 (11.1).
orarootagent
•Oracle root agent (orarootagent): Is a specialized oraagent process that helps CRSD manage resources owned by root, such as the network, and the Grid virtual IP address
oclskd
•Cluster kill daemon (oclskd): Handles instance/node evictions requests that have been escalated to CSS
gipcd
•Grid IPC daemon (gipcd): Is a helper daemon for the communications infrastructure
ctssd
•Cluster time synchronisation daemon(ctssd) to manage the time syncrhonization between nodes, rather depending on NTP
7. Under which user or owner the process will start?
Component
Name of the Process
Owner
Oracle High Availability Service
ohasd
init, root
Cluster Ready Service (CRS)
Cluster Ready Services
root
Cluster Synchronization Service (CSS)
ocssd,cssd monitor, cssdagent
grid owner
Event Manager (EVM)
evmd, evmlogger
grid owner
Cluster Time Synchronization Service (CTSS)
octssd
root
Oracle Notification Service (ONS)
ons, eons
grid owner
Oracle Agent
oragent
grid owner
Oracle Root Agent
orarootagent
root
Grid Naming Service (GNS)
gnsd
root
Grid Plug and Play (GPnP)
gpnpd
grid owner
Multicast domain name service (mDNS)
mdnsd
grid owner
8. What is startup sequence in Oracle 11g RAC? 11g RAC startup sequence?
9. as you said Voting & OCR Disk resides in ASM Diskgroups, but as per startup sequence OCSSD starts first before than ASM, how is it possible?
How OCSSD does starts if voting disk & OCR resides in ASM Disk-groups?
You might wonder how CSSD, which is required to start the clustered ASM instance, can be started if voting disks are stored in ASM? This sound like a chicken-and-egg problem: without access to the voting disks there is no CSS, hence the node cannot join the cluster. But without being part of the cluster, CSSD cannot start the ASM instance. To solve this problem the ASM disk headers have new metadata in 11.2: you can use kfed to read the header of an ASM disk containing a voting disk. The kfdhdb.vfstart and kfdhdb.vfend fields tell CSS where to find the voting file. This does not require the ASM instance to be up. Once the voting disks are located, CSS can access them and joins the cluster.
10. How does SCAN work?
clip_image002
  1. Client Connected through SCAN name of the cluster (remember all three IP addresses round robin resolves to same Host name (SCAN Name), here in this case our scan name is cluster01-scan.cluster01.example.com
  2. The request reaches to DNS server in your corp and then resolves to one of the node out of three.  a. If GNS (Grid Naming service or domain is configured) that is a sub domain configured in  the DNS entry for to resolve cluster address the request will be handover to GNS (gnsd)
  3. Here in our case assume there is no GNS, now the with the help of SCAN listeners where end points are configured to database listener.
  4. Database Listeners the request and then process further.
  5. In case of node addition, Listener 4, client need not to know or need not change any thing from their tns entry (address of 4th node/instance) as they just using scan IP.
  6. Same case even in the node deletion.
11. What is GNS?
Grid Naming service is alternative service to DNS , which will act as a sub domain in your DNS but managed by Oracle, with GNS the connection is routed to the cluster IP and manages internally.
12. What is GPNP?
Grid Plug and Play along with GNS provide dynamic
In previous releases, adding or removing servers in a cluster required extensive manual preparation.
In Oracle Database 11g Release 2, GPnP allows each node to perform the following tasks dynamically:
    • Negotiating appropriate network identities for itself
    • Acquiring additional information from a configuration profile
    • Configuring or reconfiguring itself using profile data, making host names and addresses resolvable on the network
For example a domain should contain
  • –Cluster name: cluster01
  • –Network domain: example.com
  • –GPnP domain: cluster01.example.com
To add a node, simply connect the server to the cluster and allow the cluster to configure the node.
To make it happen, Oracle uses the profile located in $GI_HOME/gpnp/profiles/peer/profile.xml which contains the cluster resources, for example disk locations of ASM. etc.
So this profile will be read local or from the remote machine when plugged into cluster and dynamically added to cluster.
13. What are the file types that ASM support and keep in disk groups?
Control files
Flashback logs
Data Pump dump sets
Data files
DB SPFILE
Data Guard configuration
Temporary data files
RMAN backup sets
Change tracking bitmaps
Online redo logs
RMAN data file copies
OCR files
Archive logs
Transport data files
ASM SPFILE
14. List Key benefits of ASM?
  • Stripes files rather than logical volumes
  • Provides redundancy on a file basis
  • Enables online disk reconfiguration and dynamic rebalancing
  • Reduces the time significantly to resynchronize a transient failure by tracking changes while disk is offline
  • Provides adjustable rebalancing speed
  • Is cluster-aware
  • Supports reading from mirrored copy instead of primary copy for extended clusters
  • Is automatically installed as part of the Grid Infrastructure
15. List key benefits of Oracle Grid Infrastructure?
16. List some of the background process that used in ASM?
Process
Description
RBAL
Opens all device files as part of discovery and coordinates the rebalance activity
ARBn
One or more slave processes that do the rebalance activity
GMON
Responsible for managing the disk-level activities such as drop or offline and advancing the ASM disk group compatibility
MARK
Marks ASM allocation units as stale when needed
Onnn
One or more ASM slave processes forming a pool of connections to the ASM instance for exchanging messages
PZ9n
One or more parallel slave processes used in fetching data on clustered ASM installation from GV$ views
13. What is node listener?
In 11gr2 the listeners will run from Grid Infrastructure software home
  • The node listener is a process that helps establish network connections from ASM clients to the ASM instance.
  • Runs by default from the Grid $ORACLE_HOME/bin directory
  • Listens on port 1521 by default
  • Is the same as a database instance listener
  • Is capable of listening for all database instances on the same machine in addition to the ASM instance
  • Can run concurrently with separate database listeners or be replaced by a separate database listener
  • Is named tnslsnr on the Linux platform
15. What is SCAN listener?
A scan listener is something that additional to node listener which listens the incoming db connection requests from the client which got through the scan IP, it got end points configured to node listener where it routes the db connection requests to particular node listener.
16. What is the difference between CRSCTL and SRVCTL?
crsctl manages clusterware-related operations:
  • Starting and stopping Oracle Clusterware
  • Enabling and disabling Oracle Clusterware daemons
  • Registering cluster resources
srvctl manages Oracle resource–related operations:
  • Starting and stopping database instances and services
  • Also from 11gR2 manages the cluster resources like network,vip,disks etc
17. How to control Oracle Cluster-ware?
To start or stop Oracle Cluster-ware on a specific node:
# crsctl stop crs
# crsctl start crs
To enable or disable Oracle Cluster-ware on a specific node:
# crsctl enable crs
# crsctl disable crs
19. How to check the cluster (all nodes) status?
To check the viability of Cluster Synchronization Services (CSS) across nodes:
$ crsctl check cluster
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
20. How to check the cluster (one node) status?
$ crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
21. How to find Voting Disk location?
•To determine the location of the voting disk:
# crsctl query css votedisk
## STATE File Universal Id File Name Disk group
– —– —————– ———- ———-
1. ONLINE 8c2e45d734c64f8abf9f136990f3daf8 (ASMDISK01) [DATA]
2. ONLINE 99bc153df3b84fb4bf071d916089fd4a (ASMDISK02) [DATA]
3. ONLINE 0b090b6b19154fc1bf5913bc70340921 (ASMDISK03) [DATA]
Located 3 voting disk(s).
22. How to find Location of OCR?
  • cat /etc/oracle/ocr.loc
ocrconfig_loc=+DATA
local_only=FALSE
  • #OCRCHECK (also about OCR integrity)
23. List some background process that used in ASM Instances?
Process
Description
RBAL
Opens all device files as part of discovery and coordinates the rebalance activity
ARBn
One or more slave processes that do the rebalance activity
GMON
Responsible for managing the disk-level activities such as drop or offline and advancing the ASM disk group compatibility
MARK
Marks ASM allocation units as stale when needed
Onnn
One or more ASM slave processes forming a pool of connections to the ASM instance for exchanging messages
PZ9n
One or more parallel slave processes used in fetching data on clustered ASM installation from GV$ views
24. What are types of ASM Mirroring?
Disk Group Type
Supported Mirroring Levels
Default Mirroring Level
External redundancy
Unprotected (None)
Unprotected (None)
Normal redundancy
Two-wayThree-wayUnprotected (None)
Two-way
High redundancy
Three-way
Three-way
25. What is ASM Striping?
ASM can use variable size data extents to support larger files, reduce memory requirements, and improve performance.
Each data extent resides on an individual disk.
Data extents consist of one or more allocation units.
The data extent size is:
  • Equal to AU for the first 20,000 extents (0–19999)
  • Equal to 4 × AU for the next 20,000 extents (20000–39999)
  • Equal to 16 × AU for extents above 40,000
ASM stripes files using extents with a coarse method for load balancing or a fine method to reduce latency.
  • Coarse-grained striping is always equal to the effective AU size.
  • Fine-grained striping is always equal to 128 KB.
26. How many ASM Disk-groups can be created under one ASM Instance?
ASM imposes the following limits:
  • 63 disk groups in a storage system
  • 10,000 ASM disks in a storage system
  • Two-terabyte maximum storage for each ASM disk (non-Exadata)
  • Four-petabyte maximum storage for each ASM disk (Exadata)
  • 40-exabyte maximum storage for each storage system
  • 1 million files for each disk group
  • ASM file size limits (database limit is 128 TB):
  1. External redundancy maximum file size is 140 PB.
  2. Normal redundancy maximum file size is 42 PB.
  3. High redundancy maximum file size is 15 PB.
27. How to find the cluster network settings?
To determine the list of interfaces available to the cluster:
$ oifcfg iflist –p -n
To determine the public and private interfaces that have been configured:
$ oifcfg getif
eth0 192.0.2.0 global public
eth1 192.168.1.0 global cluster_interconnect
To determine the Virtual IP (VIP) host name, VIP address, VIP subnet mask, and VIP interface name:
$ srvctl config nodeapps -a
VIP exists.:host01
VIP exists.: /192.0.2.247/192.0.2.247/255.255.255.0/eth0

28. How to change Public or VIP Address in RAC Cluster?
29. How to change Cluster interconnect in RAC?
On a single node in the cluster, add the new global interface specification:
$ oifcfg setif -global eth2/192.0.2.0:cluster_interconnect
Verify the changes with oifcfg getif and then stop Clusterware on all nodes by running the following command as root on each node:
# oifcfg getif
# crsctl stop crs
Assign the network address to the new network adapters on all nodes using ifconfig:
#ifconfig eth2 192.0.2.15 netmask 255.255.255.0 broadcast 192.0.2.255
Remove the former adapter/subnet specification and restart Clusterware:
$ oifcfgdelif -global eth1/192.168.1.0
# crsctl start crs
30. Managing or Modifying SCAN in Oracle RAC?
To add a SCAN VIP resource:
$ srvctl add scan -n cluster01-scan
To remove Clusterware resources from SCAN VIPs:
$ srvctl remove scan [-f]
To add a SCAN listener resource:
$ srvctl add scan_listener
$ srvctl add scan_listener -p 1521
To remove Clusterware resources from all SCAN listeners:
$ srvctl remove scan_listener [-f]
31. How to check the node connectivity in Oracle Grid Infrastructure?
$ cluvfy comp nodecon -n all –verbose
32. Can I stop all nodes in one command? Meaning that stopping whole cluster?
In 10g its not possible, where in 11g it is possible
[root@pic1]# crsctl start cluster -all
[root@pic2]# crsctl stop cluster –all
33. What is OLR? Which of the following statements regarding the Oracle Local Registry (OLR) is true?
1. Each cluster node has a local registry for node-specific resources.
2. The OLR should be manually created after installing Grid Infrastructure on each node in the cluster.
3. One of its functions is to facilitate Cluster-ware startup in situations where the ASM stores the OCR and voting disks.
4. You can check the status of the OLR using ocr-check.
34. What is runfixup.sh script in Oracle Cluster-ware 11g release 2 installations?
With Oracle Cluster-ware 11g release 2, Oracle Universal Installer (OUI) detects when the minimum requirements for an installation are not met, and creates shell scripts, called fixup scripts, to finish incomplete system configuration steps. If OUI detects an incomplete task, then it generates fixup scripts (runfixup.sh). You can run the fixup script after you click the Fix and Check Again Button.
The Fixup script does the following:
If necessary sets kernel parameters to values required for successful installation, including:
  • Shared memory parameters.
  • Open file descriptor and UDP send/receive parameters.
Sets permissions on the Oracle Inventory (central inventory) directory. Reconfigures primary and secondary group memberships for the installation owner, if necessary, for the Oracle Inventory directory and the operating system privileges groups.
  • Sets shell limits if necessary to required values.
35. How to stop whole cluster with single command
crsctl stop cluster (possible only from 11gr2), please note crsctl commands becomes global now, if you do not specify node specifically the command executed globally for example
crsctl stop crs (stops in all crs resource in all nodes)
crsctl stop crs –n <ndeoname) (stops only in specified node)
36. CRS is not starting automatically after a node reboot, what you do to make it happen?
crsctl enable crs (as root)
to disable
crsctl disable crs (as root)
37. What are server pools in 11gr2?
38. What is policy managed databases in RAC?
39. What is Load balancing & how does it work?
40. Describe high level Steps to convert single instance to RAC?
41. What is the difference between TAF and FAN & FCF? at what conditions you use them?
1) TAF with tnsnames
a feature of Oracle Net Services for OCI8 clients. TAF is transparent application failover which will move a session to a backup connection if the session fails. With Oracle 10g Release 2, you can define the TAF policy on the service using dbms_service package. It will only work with OCI clients. It will only move the session and if the parameter is set, it will failover the select statement. For insert, update or delete transactions, the application must be TAF aware and roll back the transaction. YES, you should enable FCF on your OCI client when you use TAF, it will make the failover faster.
Note: TAF will not work with JDBC thin.

2) FAN with tnsnames with aq notifications true
FAN is a feature of Oracle RAC which stands for Fast Application Notification. This allows the database to notify the client of any change (Node up/down, instance up/down, database up/down). For integrated clients, inflight transactions are interrupted and an error message is returned. Inactive connections are terminated.
FCF is the client feature for Oracle Clients that have integrated with FAN to provide fast failover for connections. Oracle JDBC Implicit Connection Cache, Oracle Data Provider for .NET (ODP.NET) and Oracle Call Interface are all integrated clients which provide the Fast Connection Failover feature.
3) FCF, along with FAN when using connection pools
FCF is a feature of Oracle clients that are integrated to receive FAN events and abort inflight transactions, clean up connections when a down event is received as well as create new connections when a up event is received. Tomcat or JBOSS can take advantage of FCF if the Oracle connection pool is used underneath. This can be either UCP (Universal Connection Pool for JAVA) or ICC (JDBC Implicit Connection Cache). UCP is recommended as ICC will be deprecated in a future release.
4) ONS, with clusterware either FAN/FCF
ONS is part of the clusterware and is used to propagate messages both between nodes and to application-tiers
ONS is the foundation for FAN upon which is built FCF.
RAC uses FAN to publish configuration changes and LBA events. Applications can react as those published events in two way :
– by using ONS api (you need to program it)
– by using FCF (automatic by using JDBC implicit connection cache on the application server)
you can also respond to FAN event by using server-side callout but this on the server side (as their name suggests it
)
Relationship between FAN/FCF/ONS
ONS –> FAN –> FCF
ONS -> send/receive messages on local and remote nodes.
FAN -> uses ONS to notify other processes about changes in configuration of service level
FCF -> uses FAN information working with conection pools JAVA and others.

42. Can you add voting disk online? Do you need voting disk backup?
Yes,  as per documentation, if you have multiple voting disk you can add online, but if you have only one voting disk , by that cluster will be down as its lost you just need to start crs in exclusive mode and add the vote-disk using
crsctl add votedisk <path>
43. You have lost OCR disk, what is your next step?
The cluster stack will be down due to the fact that cssd is unable to maintain the integrity, this is true in 10g, From 11gR2 onwards, the crsd stack will be down, the hasd still up and running. You can add the ocr back by restoring the automatic backup or import the manual backup,
44. What happens when ocssd fails, what is node eviction? How node eviction does happen? For all answer will be same.
45. What is virtual IP and how does it works?
46. Describe some rac wait events you experienced?
Oracle RAC Wait events and this table,
image
47. Can you modify VIP address after your cluster installation?
Yes,
48. How do you interpret AWR report in RAC instances, what sections in awr report for rac instances are most important?
1. Viewing Contents in OCR/Voting disks
        There are three possible ways to view the OCR contents.
        a.      OCRDUMP (or)
        b.      crs_stat -p  (or)
        c.      By using strings.
        Voting disk contents are not persistent and are not required to view the contents, because the voting disk contents will be overwritten. if still need to view, strings are used.
2. Server pools – Read in my blog
3. Verifying Cluster Interconnect
               Cluster interconnects can be verified by:
        i.      oifcfg getif
        ii.     From AWR Report.
        iii.    show parameter cluster_interconnect
        iv.     srvctl config network
4. Does scan IP required or we can disable it
         SCAN IP can be disabled if not required. However SCAN IP is mandatory during the RAC installation. Enabling/disabling SCAN IP is mostly used in oracle apps environment by the concurrent manager (kind of job scheduler in oracle apps).
        To disable the SCAN IP,
        i.      Do not use SCAN IP at the client end.
        ii.     Stop scan listener
              srvctl stop scan_listener
        iii.    Stop scan
              srvctl stop scan (this will stop the scan vip's)
        iv.     Disable scan and disable scan listener
             srvctl disable scan
5. Migrating to new Disk-group scenarious
a.      Case 1: Migrating disk group from one storage to other with same name
        1. Consider the disk group is DATA,
        2. Create new disks in DATA pointing towards the new storage (EMC),
                  a) Partioning provisioning done by storage and they give you the device name or mapper like /dev/mapper/asakljdlas
        3. Add the new disk to diskgroup DATA
                 a) Alter diskgroup data add disk '/dev/mapper/asakljdlas'
        3. drop the old disks from DATA with which rebalancing is done automatically.
        If you want you can the rebalance by alter system set asm_power_limit =12 for full throttle.
            alter diskgroup data drop disk 'path to hitachi storage'
            Note: you can get the device name in v$asm_disk in path column.
        4. Request SAN team to detach the old Storage (HITACHI).
 
b.      Case 2: Migrating disk group from one to another with different diskgroup name.
        1) Create the Disk group with new name in the new storage.
        2) Create the spfile in new diskgroup and change the parameter scope = spfile for control files etc.
        3) Take a control file backup in format +newdiskgroup
        4) Shutdown the db, startup nomount the database
        5) restore the control file from backup (now the control will restore to new diskgroup)
        6) Take the RMAN backup as copy of all the databases with new format.
               RMAN&gt; backup database as copy format '+newdiskgroup name' ;
        3) RMAN&gt; Switch database to copy.
        4) Verify dba_data_files,dba_temp_files, v$log that all files are pointing to new diskgroup name.
 
c.      Case 3: Migrating disk group to new storage but no additional diskgroup given
         1) Take the RMAN backup as copy of all the databases with new format and place it in the disk.
         2) Prepare rename commands from v$log ,v$datafile etc (dynamic queries)
         3) Take a backup of pfile and modify the following referring to new diskgroup name
                  .control_files
               .db_create_file_dest
               .db_create_online_log_dest_1
               .db_create_online_log_dest_2
               .db_recovery_file_des
                          4) stop the database
                          5) Unmount the diskgroup
                               asmcmd umount ORA_DATA
                          6) use asmcmd renamedg (11gr2 only) command to rename to new diskgroup
                               renamedg phase=both dgname=ORA_DATA newdgname=NEW_DATA verbose=true
                 7)  mount the diskgroup
                      asmcmd mount NEW_DATA
                 8) start the database in mount with new pfile taken backup in step 3
                 9) Run the rename file scripts generated at step2
                 9) Add the diskgroup to cluster the cluster (if using rac)
               srvctl modify database -d orcl -p +NEW_FRA/orcl/spfileorcl.ora
                       srvctl modify database -d orcl -a "NEW_DATA"
                       srvctl config database -d orcl
                       srvctl start database -d orcl
                                        10) Delete the old diskgroup from cluster
                        crsctl delete resource ora.ORA_DATA.dg
                11) Open the database.
7. Database rename in RAC, what could be the checklist for you?
        a.      Take the outputs of all the services that are running on the databases.
        b.      set cluster_database=FALSE
        c.      Drop all the services associated with the database.
        d.      Stop the database
        e.      Startup mount
        f.      Use nid to change the DB Name.
                 Generic question, If using ASM the usual location for the datafile would be +DATA/datafile/OLDDBNAME/system01.dbf'
                 Does NID changes this path too? to reflect the new db name?
                 Yes it will, by using proper directory structure it will create a links to original directory structure. +DATA/datafile/NEWDBNAME/system01.dbf'
                 this has to be tested,  We dont have test bed, but thanks to Anji who confirmed it will
 
        g.      Change the parameters according to the new database name
        h.      Change the password file.
        i.      Stop the database.
        j.      Mount the database
        k.      Open database with Reset logs
        l.      Create spfile from pfile.
        m.      Add database to the cluster.
        n.      Create the services that are dropped in prior to rename.
        o.      Bounce the database.
8.How to find the database in which particular service is attached to when you have a large number of databases running in the server, you cannot check one by one manually
Write a shell script to read the database name from oratab and iterate the loop taking inpt as DB name in srvctl to get the result.
#!/bin/ksh
ORACLE_HOME=<crs_home>
PATH=$ORACLE_HOME/bin:$PATH
LD_LIBRARY_PATH=${SAVE_LLP}:${ORACLE_HOME}/lib
export TNS_ADMIN ORACLE_HOME PATH LD_LIBRARY_PATH
for INSTANCE in `cat /etc/oratab|grep -v “^#”|cut -f1 -d: -s`
do
export ORACLE_SID=$INSTANCE
echo `srvctl status service -d $INSTANCE -s $1| grep -i “is running”`
done
9. Difference between OHAS and CRS?

OHAS is complete cluster stack which includes some kernel level tasks like managing network, time synchronization, disks etc, where the CRS has the ability to manage the resources like database,listeners,applications, etc With both of this Oracle provides the high availability clustering services rather only affinity to databases.

ORA-08004: sequence IEX_DEL_BUFFERS_S.NEXTVAL exceeds MAXVALUE

 Error:- IEX: Scoring Engine Harness Error - ORA-08004: sequence IEX_DEL_BUFFERS_S.NEXTVAL exceeds MAXVALUE (Doc ID 2056754.1) SYMPTOMS:- Yo...