Backup and Recovery in 11gr2 RAC Database

To protect your Oracle Real Application Clusters (Oracle RAC) database from hardware failures or disasters, you must have a physical copy of the database files. The files protected by the backup and recovery facilities built into Oracle Enterprise Manager include data files, control files, server parameter files (SPFILEs), and archived redo log files. Using these files, your database can be reconstructed. The backup mechanisms that work at the physical level protect against damage at the file level, such as the accidental deletion of a data file or the failure of a disk drive. Database recovery involves restoring, or copying, the damaged files from backup and performing media recovery on the restored files. Media recovery is the application of redo logs or incremental backups to a restored data file to update it to the current time or some other specified time.
The Oracle Database flashback features, such as Oracle Flashback Drop and Oracle Flashback Table, provide a range of physical and logical data recovery tools as efficient, easy-to-use alternatives to physical and logical backup operations. The flashback features enable you to reverse the effects of unwanted database changes without restoring data files from backup or performing media recovery.
The Enterprise Manager physical backup and recovery features are built on the Recovery Manager (RMAN) command-line client. Enterprise Manager makes available many of the RMAN features, and provides wizards and automatic strategies to simplify and further automate RMAN-based backup and recovery.
Note:
For the RMAN utility to work properly on Linux platforms, 
the $ORACLE_HOME/bin directory must appear in the PATH variable before the/usr/X11R6/bin directory.
The Enterprise Manager Guided Recovery capability provides a Recovery Wizard that encapsulates the logic required for a wide range of file restoration and recovery scenarios, including the following:
  • Complete restoration and recovery of the database
  • Point-in-time recovery of the database or selected tablespaces
  • Flashback Database
  • Other flashback features of Oracle Database for logical-level repair of unwanted changes to database objects
  • Media recovery at the block level for data files with corrupt blocks
If the database files are damaged or need recovery, then Enterprise Manager can determine which parts of the database must be restored from a backup and recovered, including early detection of situations such as corrupted database files. Enterprise Manager guides you through the recovery process, prompting for needed information and performing the required recovery actions.

Fast Recovery Area in Oracle RAC

Using a fast recovery area minimizes the need to manually manage disk space for your backup-related files and balance the use of space among the different types of files. Oracle recommends that you enable a fast recovery area to simplify your backup management.
The larger the fast recovery area is, the more useful it becomes. Ideally, the fast recovery area should be large enough to contain all the following files:
  • A copy of all data files
  • Incremental backups
  • Online redo logs
  • Archived redo log files that have not yet been backed up
  • Control files and control file copies
  • Autobackups of the control file and database initialization parameter file
The fast recovery area for an Oracle RAC database must be placed on an Oracle ASM disk group, a cluster file system, or on a shared directory that is configured through a network file system file for each Oracle RAC instance. In other words, the fast recovery area must be shared among all of the instances of an Oracle RAC database. The preferred configuration for Oracle RAC is to use Oracle Automatic Storage Management (Oracle ASM) for storing the fast recovery area, using a different disk group for your recovery set than for your data files.
The location and disk quota must be the same on all instances. Oracle recommends that you place the fast recovery area on the shared Oracle ASM disks. In addition, you must set the DB_RECOVERY_FILE_DEST and DB_RECOVERY_FILE_DEST_SIZE parameters to the same values on all instances.
To use the fast recovery feature, you must first configure the fast recovery area for each instance in your Oracle RAC database.

Archiving the Oracle Real Application Clusters Database Redo Logs

To make your data highly available, it is important to configure the database so you can recover your data after a system failure. Redo logs contain a record of changes that were made to datafiles. Redo logs are stored in redo log groups, and you must have at least two redo log groups for your database.
After the redo log files in a group have filled up, the log writer process (LGWR) switches the writing of redo records to a new redo log group. Oracle Database can automatically save the inactive group of redo log files to one or more offline destinations, known collectively as the archived redo log (also called the archive log). The process of turning redo log files into archived redo log files is called archiving.
When you archive your redo log, you write redo log files to another location before they are overwritten. This location is called the archived redo log. These copies of redo log files extend the amount of redo data that can be saved and used for recovery. Archiving can be either enabled or disabled for the database, but Oracle recommends that you enable archiving.

 Archived Redo Log Files for an Oracle RAC Database

When you use Oracle Database Configuration Assistant (DBCA) to create your Oracle Real Application Clusters (Oracle RAC) database, each instance is configured with at least two redo log files that are stored in the shared storage area. If you have a two-node Oracle RAC database, then at least four redo logs are created for the database, two for each instance.
If you use a cluster file system to store the archived redo log files for your Oracle RAC database, then the redo log files are shared file system files. If you use Oracle ASM to store the archived redo log files for your Oracle RAC database, then each instance automatically has access to all the archived redo log files generated by the database. If you use shared storage or raw devices to store the archived redo log files on each node, then you must configure the operating system to grant access to those directories for each instance of the cluster database that needs access to them.
The primary consideration when configuring archiving is to ensure that all archived redo logs can be read from every node during recovery, and if possible during backups. During recovery, because the archived log destinations are visible from the node that performs the recovery, Oracle RAC can successfully recover the archived redo log data. For creating backups of your Oracle RAC database, the strategy that you choose depends on how you configure the archiving destinations for each node. Whether only one node or all nodes perform archived redo log backups, you must ensure that the archived redo logs for every instance are backed up.
To backup the archived redo logs from a single node, that node must have access to the archived log files of the other instances. The archived redo log naming scheme that you use is important because when a node writes to a log with a specific filename on its file system, the file must be readable by any node that must access this archived redo log. For example, if node1 archives a log to /oracle/arc_dest/log_1_100_23452345.arc, then node2 can back up this archived redo log only if it can read/oracle/arc_dest/log_1_100_23452345.arc on its own file system.

Parallelism and Backups Across Multiple RMAN Channels

Recovery Manager (RMAN) depends on server sessions, processes that run on the database server, to perform backup and recovery tasks. Each server session in turn corresponds to an RMAN channel, representing one stream of data to or from a backup device. RMAN supports parallelism, which is the use of multiple channels and server sessions to perform the work of a single backup job or file restoration task.
Because the control file, SPFILE, and data files are accessible by any instance, the backup operation of these files is distributed across all the allocated channels. For backups of the archived redo log, the actions performed by RMAN depend on the type of archiving scheme used by your Oracle RAC database.
If you use a local archiving scheme, then each instance writes the archived redo log files to a local directory. When multiple channels are allocated that have access to the archived redo log, for each archived redo log file, RMAN determines which channels have access to that archived redo log file. Then, RMAN groups the archived redo log files that can be accessed by a channel and schedules a backup job using that channel.
If each node in the cluster writes the archived redo log files to Oracle ASM, a clustered file system, or other type of shared storage, then each instance has access to all the archived redo log files. In this case, the backup of the archived redo log is distributed across all the allocated channels.

Configuring Archiving for Your Oracle RAC Database

For Oracle RAC, each instance has its own thread of redo. The preferred configuration for Oracle RAC is to configure the fast recovery area using an Oracle ASM disk group that is separate from the Oracle ASM disk group used for your data files. Alternatively, you can use a cluster file system archiving scheme.
To configure archiving for your Oracle RAC database:
  1. On the Database Home page of Enterprise Manager Database Control, while logged in as a SYSDBA user, select Availability.
    The Availability subpage appears.
  2. In the Backup/Recovery section, under the heading Setup, select Recovery Settings.
    The Recovery Settings page appears.
  3. In the Media Recovery section, select the ARCHIVELOG mode option.
  4. In the Log Archive Filename Format field, accept the default value, or enter the desired format.
    For clustered databases, the format for the archive log file name should contain the %t modifier, to indicate which redo log thread the archived redo log file belongs to. As a best practice, the format for the archive log file name should also include the %s (log sequence number) and %r (resetlogs identifier) modifiers.
  5. If the archive log destination is the same for all instances, then in the Archive Log Destination field, change the value to the location of the archive log destination for the cluster database.
    For example, you might set it to +DATA if using Oracle ASM, or to /u01/oradata/arch if you want local archiving on each node.
    If you must configure a different archive log destination for any instance, then you must go to the Initialization Parameters page and modify the LOG_ARCHIVE_DEST_1 parameter that corresponds to the instance for which you want to configure the archive log destination. The Instance column should display the name of the instance, for example sales1. Change the Value field to contain the location of the archive log destination for that instance.
  6. If you want to configure multiple archive log destinations for the database, then on the Recovery Settings page, click Add Another Rowunder the Archive Log Destination field.
  7. After you have finished configuring archiving, click Apply.
    When prompted to restart the database, click Yes.
  8. Enter the host and SYSDBA user credentials, then click Continue.
  9. Wait a couple of minutes, then click Refresh.
    If the database has been restarted, then you are prompted to enter the login credentials.

Configure RMAN to use all RAC nodes (Parallel backups)

Consider a two-node rac with 1 cpu per node.

CONFIGURE DEVICE TYPE DISK BACKUP TYPE TO COMPRESSED BACKUPSET PARALLELISM 2;

CONFIGURE CHANNEL 1 DEVICE TYPE DISK CONNECT 'sys/oracle@rt1';
CONFIGURE CHANNEL 2 DEVICE TYPE DISK CONNECT 'sys/oracle@rt2';

RMAN> show all;
RMAN configuration parameters are:
CONFIGURE RETENTION POLICY TO REDUNDANCY 1; # default
CONFIGURE BACKUP OPTIMIZATION OFF; # default
CONFIGURE DEFAULT DEVICE TYPE TO DISK; # default
CONFIGURE CONTROLFILE AUTOBACKUP OFF; # default
CONFIGURE CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE DISK TO '%F'; # default
CONFIGURE DEVICE TYPE DISK BACKUP TYPE TO COMPRESSED BACKUPSET PARALLELISM 2;
CONFIGURE DATAFILE BACKUP COPIES FOR DEVICE TYPE DISK TO 1; # default
CONFIGURE ARCHIVELOG BACKUP COPIES FOR DEVICE TYPE DISK TO 1; # default
CONFIGURE CHANNEL 1 DEVICE TYPE DISK CONNECT '*';
CONFIGURE CHANNEL 2 DEVICE TYPE DISK CONNECT '*';
CONFIGURE MAXSETSIZE TO UNLIMITED; # default
CONFIGURE ENCRYPTION FOR DATABASE OFF; # default
CONFIGURE ENCRYPTION ALGORITHM 'AES128'; # default
CONFIGURE ARCHIVELOG DELETION POLICY TO NONE; # default
CONFIGURE SNAPSHOT CONTROLFILE NAME TO '/oracle/app/oracle/product/10.2.0/db_1/dbs/snapcf_RT2.f'; # default

RMAN> backup database plus archivelog;
Starting backup at 04-JUL-11
current log archived
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=143 instance=RT1 devtype=DISK
allocated channel: ORA_DISK_2
channel ORA_DISK_2: sid=132 instance=RT2 devtype=DISK
channel ORA_DISK_1: starting compressed archive log backupset
channel ORA_DISK_1: specifying archive log(s) in backup set
input archive log thread=1 sequence=10 recid=6 stamp=755647649
input archive log thread=1 sequence=11 recid=10 stamp=755648045
input archive log thread=1 sequence=12 recid=14 stamp=755648207
input archive log thread=1 sequence=13 recid=16 stamp=755648344
input archive log thread=1 sequence=14 recid=22 stamp=755648490
input archive log thread=1 sequence=15 recid=26 stamp=755648538
input archive log thread=1 sequence=16 recid=30 stamp=755648596
input archive log thread=2 sequence=12 recid=2 stamp=755647601
channel ORA_DISK_1: starting piece 1 at 04-JUL-11
channel ORA_DISK_2: starting compressed archive log backupset
channel ORA_DISK_2: specifying archive log(s) in backup set
input archive log thread=2 sequence=13 recid=4 stamp=755647644
input archive log thread=2 sequence=14 recid=8 stamp=755648043
input archive log thread=2 sequence=15 recid=12 stamp=755648206
input archive log thread=2 sequence=16 recid=18 stamp=755648344
input archive log thread=2 sequence=17 recid=20 stamp=755648488
input archive log thread=2 sequence=18 recid=24 stamp=755648536
input archive log thread=2 sequence=19 recid=28 stamp=755648594
channel ORA_DISK_2: starting piece 1 at 04-JUL-11
channel ORA_DISK_1: finished piece 1 at 04-JUL-11
piece handle=+DISKGROUP01/rt/backupset/2011_07_04/annnf0_tag20110704t222317_0.314.755648601 tag=TAG20110704T222317 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:07
channel ORA_DISK_2: finished piece 1 at 04-JUL-11
piece handle=+DISKGROUP01/rt/backupset/2011_07_04/annnf0_tag20110704t222317_0.315.755648603 tag=TAG20110704T222317 comment=NONE
channel ORA_DISK_2: backup set complete, elapsed time: 00:00:07
Finished backup at 04-JUL-11
Starting backup at 04-JUL-11
using channel ORA_DISK_2
using channel ORA_DISK_1
....

RMAN Utility

Whether you are a single node Oracle DBA or a multi-node RAC farm DBA, you will be, as I mentioned above, confronted with all kinds of alternatives. Sticking to RMAN is the best choice. Why? Whether you use raw partitions or cooked partitions (OCFS, NFS), RMAN does all of the backup by accessing the datafiles via the memory buffer. This ensures that every block is read and written, so no data is lost and there is no chance of error.
Although RMAN is a backup solution and it does not matter if you are a RAC or a single node database, still there might be some little things to watch out for in a RAC database.

Adequate RMAN configuration for RAC

Every backup plan has a goal, which is to provide a speedy recovery. In addition, depending on what your strategy is, you need to configure your RAC accordingly. RMAN needs to make a dedicated connection, unlike a typical client, through the regular ONS (Oracle Net Services), which means all it needs is one node of the cluster. Moreover, you can allocate channels at each node of your RAC by specifying a command as such:
allocate channel x1 type sbt connect sys/password@nickrac01;
allocate channel x2 type sbt connect sys/password@nickrac02;
allocate channel x3 type sbt connect sys/password@nickrac03;
allocate channel x4 type sbt connect sys/password@nickrac04;
That way you can distribute the workload across nodes without doing all of the I/O intensive job via that connected node. Again, it depends on your architectural setup and backup policy. It could very well be that you want to do it via an nth node, which has an additional 10 Gbps card connected to your SAN and this node happens to be supporting a typical DSS system, which is not under stress during the schedule job or on an OLTP environment where that particular node purely services backup. Just an example scenario, but as I mentioned, discuss it thoroughly with your system Admins and SAN admin (should you have a SAN that is) before working on your backup strategy. While some environments may respond well to a distributed RMAN activity (a typical non-24x7 environment comes to mind), a single node might be best for a heavy 24/7 OLTP environment.

What about snapshot controlfile?

Having the controlfile on a shared system, like the 1GB OCFS files we created for the OCR, votingdisks or spfileasm, would make more sense, if you need to run backup from multiple nodes. Alternatively, you could have the same local file destination on all nodes. How to see the snapshot file location:
rman> show snapshot controlfile name;

Backing up datafiles

You may back up your datafiles from any node since they are on the ASM or OCFS system anyway, which means they are on a shared storage. So this is expected to go smoothly, regardless of the node from where the RMAN. Be aware of the tape scenario if you still use a tape back. Check the configurations of your 3rd party SBT interface and the media management layer, or, if backing up to a disk, which is sharable by all nodes or a zone on SAN where all HBAs from all nodes are connected in case of a single node failure. There are several ways of providing backup space.

Backing up Archivelog files

There are several ways of backing up archivelog files as well. Archive logs are better placed on a shared cluster such as an OCFS volume dedicated to archive log files of all nodes, which makes it easier for the RMAN to connect to the shared volume and backup those files from there. Then depending upon your architectural decision, you can either allocate a channel per node or use one node. The same thing can be done on NFS volumes and have all of the archive logs backed up with RMAN from there. Alternatively, you can also archive the files on DAS (Direct Attached Storage) or local disks but as I said, it is the recovery we are talking about and not just merely the backup. This DAS archive log setup might be extremely cumbersome during recovery. Setting it up might sound similar to the above strategies, like allocating channels at individual nodes to perform a backup of the archivelog file.

What about FRA (Flashback Recovery Area)

In a typical single node environment it might sound a little overdone to have the flashback reco area, but it is important, no doubt about it. In RAC, it is even more important. When we go about installing our database (we will come to that as well in the next articles, as many of our readers have asked us for VMware with Oracle Linux installation) we are given the option of choosing an area. It is very wise to choose an OCFS formatted drive for your flashback recovery area; you can also choose ASM. With FRA, you can have your backs reachable to all nodes should the node where you are operating from, fails. It is this simple design that will help you with a seamless recovery.

 RMAN Full Database backup in RAC-11gR2 environment.
RMAN backup in Oracle 11gR2 RAC is exactly same like RMAN backup in Oracle 11gR2 single node.
The only difference is: Typically, in case of Oracle single node database, we will schedule RMAN scripts with the help of CRON job and it will run according to our convenience, but in case of Oracle RAC if we schedule RMAN script and if unfortunately that RAC node goes down ( where we configured RMAN scripts ), then RMAN backup won’t run obviously.
So, Same strategy will not be work in Oracle RAC node. For RMAN consistent backups use dbms_scheduler & we need to place RMAN scripts in shared directory. ( Or in my case, I have created identical scripts on both cluster node’s )
Note:
  • In my case I am using FRA for backup as well as snapshot controlfile.
  • ACFS mounted directory will also do for the same configuration.
Before starting configuration you need to ready your RMAN full database backup script.
In my case: I am calling “Weekly_full.rcv” RMAN script through “Weekly_full.sh” shell executable file after setting Oracle environment variables.
Note: Same script has been copied to all possible Oracle Cluster Node with same name & path.
[oracle@database1 Scripts]$ cat Weekly_full.sh
#!/bin/bash
#Script to run weekly full backup.
#Created 17 June 2014
#Author : Jignesh Jethwa
#Version 1.0
#Set Environment Variable
export ORACLE_BASE=/u01/app/oracle;
export ORACLE_SID=kucb1
export ORACLE_HOME=/u01/app/oracle/product/11.2.0/db_1;
export PATH=$PATH:$ORACLE_HOME/bin
rman target / @/home/oracle/RMAN/Scripts/Weekly_full.rcv
exit
RMAN Full database script: ( .rcv )
[oracle@database1 Scripts]$ cat Weekly_full.rcv
#Script to run weekly full backup
#Created 17 June 2014
#Author : Jignesh Jethwa
#Version 1.0
run
{
#Database Full Backup
backup as compressed backupset full database plus archivelog;
delete noprompt archivelog until time 'sysdate - 1' backed up 1 times to device type disk;
#Delete Obsolte Backups
delete noprompt force obsolete;
}
Note:
Don’t forget to allow executable permissions to RMAN & Shell scripts ( i.e. .sh & .rcv files )
Lets start with the configuration part:
Logged in to RMAN prompt and issue the following commands in order to optimize RMAN backup & automatic controlfile backup after every backup.
RMAN> CONFIGURE BACKUP OPTIMIZATION ON;
new RMAN configuration parameters:
CONFIGURE BACKUP OPTIMIZATION ON;
new RMAN configuration parameters are successfully stored
RMAN> CONFIGURE CONTROLFILE AUTOBACKUP ON;
old RMAN configuration parameters:
CONFIGURE CONTROLFILE AUTOBACKUP OFF;
new RMAN configuration parameters:
CONFIGURE CONTROLFILE AUTOBACKUP ON;
new RMAN configuration parameters are successfully stored
DBMS_SCHEDULER:
Here we are using DBMS_SCHEDULER instead of DBMS_JOB, because DBMS_SCHEDULER is RAC aware.
Before jump into real DBMS_SCHEDULER configuration, we need to focus on an important thing, That:
Both RAC nodes local time zone must be identical with DBMS_SCHEDULER default time.
On all RAC node, Ensure local time zone and set it accordingly.
ln -s /usr/share/zoneinfo/Asia/Calcutta localtime
On SQL prompt as sysdba, configure default time zone for DBMS_SCHEDULER.
SQL> exec dbms_scheduler.set_scheduler_attribute ('DEFAULT_TIMEZONE', 'Asia/Calcutta');
PL/SQL procedure successfully completed.
Ensure default DBMS_SCHEDULER time zone value with following query:
SQL> select value from dba_scheduler_global_attribute where attribute_name = 'DEFAULT_TIMEZONE';
VALUE
--------------------------------------------------------------------------------
Asia/Calcutta
Now we need to create credential so that are assigned to DBMS_SCHEDULER jobs so that they can authenticate with a local/remote host operating system or a remote Oracle database.
SQL> exec dbms_scheduler.create_credential(credential_name => 'Oracle', username => 'Oracle', password => '********');
PL/SQL procedure successfully completed.
Now its time to create DBMS_SCHEDULER job for RMAN Full backup, Here in this procedure I am going to create “RMAN_FULL_BACKUP” job with required attributes.

begin
dbms_scheduler.create_job(
job_name            => 'RMAN_FULL_BACKUP',
job_type            => 'EXECUTABLE',
job_action          => '/bin/sh',
number_of_arguments => 1,
start_date          => SYSTIMESTAMP,
credential_name     => 'Oracle',
auto_drop           => FALSE,
enabled             => FALSE);
end;
/
PL/SQL procedure successfully completed.
Set argument_position & argument_value ( i.e. Path of the RMAN script ) for the same job:

begin
dbms_scheduler.set_job_argument_value(
job_name            => 'RMAN_FULL_BACKUP',
argument_position   =>  1,
argument_value      => '/home/oracle/RMAN/Scripts/Weekly_full.sh');
end;
/
PL/SQL procedure successfully completed.
Set start_date for the same job, In my case “RMAN_FULL_BACKUP” job will execute every week on sunday @11pm, so job start date and its first run timing would  according to my convenience.
begin
dbms_scheduler.set_attribute(
name      => 'RMAN_FULL_BACKUP',
attribute => 'start_date',
value     => trunc(sysdate)+23/24);
end;
/
PL/SQL procedure successfully completed.
Test your backup job manually in SQL prompt by instantiating “RMAN_FULL_BACKUP” job.
SQL> exec dbms_scheduler.run_job('RMAN_FULL_BACKUP');
PL/SQL procedure successfully completed.
Verify running RMAN backup status by issuing following SQL query, It will show you RMAN backup details with start time & end time.
select SESSION_KEY, INPUT_TYPE, STATUS,
to_char(START_TIME,'mm/dd/yy hh24:mi') start_time,
to_char(END_TIME,'mm/dd/yy hh24:mi') end_time,
elapsed_seconds/3600 hrs
from V$RMAN_BACKUP_JOB_DETAILS
order by session_key;
In case of any error while test run, you can make sure details of error by issuing the following query, OR You can also query to “dba_scheduler_job_run_details” dictionary view for more details.
select JOB_NAME,STATUS,STATE,ERROR#,CREDENTIAL_NAME from dba_scheduler_job_run_details where CREDENTIAL_NAME like 'RMAN%';
After successfully completion of test run, Enable & schedule it by following procedure by setting value to ‘repeat_interval’ parameter, In my case “RMAN_FULL_BACKUP” job will execute every week on Sunday @11pm.
begin
dbms_scheduler.set_attribute(
name      => 'RMAN_FULL_BACKUP',
attribute => 'repeat_interval',
value     => 'freq=daily;byday=sun;byhour=23');
dbms_scheduler.enable( 'RMAN_FULL_BACKUP' );
end;
/
PL/SQL procedure successfully completed.
Ensure dbms_scheduler job details by issuing the following query OR you can also query to “dba_scheduler_jobs”.
SQL> select job_name,enabled,owner, state from dba_scheduler_jobs where job_name in ('RMAN_FULL_BACKUP');
Keep your eye on behavior of dbms_scheduler job by issuing the following query:
SQL> select job_name,RUN_COUNT,LAST_START_DATE,NEXT_RUN_DATE from dba_scheduler_jobs where job_name in ('RMAN_FULL_BACKUP');
Important Note:
DBMS_SCHEDULER is smart enough to start backup on the node where the last backup was successfully executed.

No comments:

ORA-00059: Maximum Number Of DB_FILES Exceeded in 19C database

When I am adding datafile to my 19C database facing the below error. SQL> alter tablespace DATA  add datafile '/u01/data/data15.dbf...