Applies to:
Oracle Database - Enterprise Edition - Version 11.2.0.3 and later
Oracle Solaris on SPARC (64-bit)
Symptoms
Database instance crashed due to:
ORA-00445: background process "J000" did not start after 120 seconds
opiodr aborting process unknown ospid (2139) as a result of ORA-609
Cause
The system appears to be short on memory resources, based on a check of GUDS data.
1. GUDS - A Script for Gathering Solaris Performance Data (Doc ID 1285485.1)
Upload the GUDS output file via FTP in binary mode to supportfiles.sun.com and place it in the /cores directory, with the case number at the beginning of the filename.
*************************************
2. Collect Guds using these options
*************************************
# ./guds_3_1 -q -X3 -c15 -i30 -n5 -w0 -T -H0 -L10 -r -s<SR#> -D/var/tmp -d "Change this comment to something useful about the performance at the time this collection was made."
Make sure 'guds' is run during the time that the problem is occurring.
GUDS may be obtained from this document: GUDS - A Solaris Performance Data Gathering Script (Doc ID 1285485.1)
total_kmem_inuse is extraordinarily high: 195427MB / 262144MB (74.5%)
Check buffers for high values. The highest buffer is zfs_file_data_131072, with a value of 194871558144.
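The figures above can be cross-checked with a little arithmetic. This is only a minimal sketch using the numbers quoted from the GUDS output. The helper names `pct` and `bytes_to_mb` are illustrative and not part of GUDS, and the buffer value is assumed to be in bytes.

```shell
#!/bin/sh
# Illustrative helpers; the names are hypothetical and not part of GUDS.

# pct PART TOTAL -> PART as a percentage of TOTAL, one decimal place
pct() {
    awk -v p="$1" -v t="$2" 'BEGIN { printf "%.1f\n", p * 100 / t }'
}

# bytes_to_mb BYTES -> whole megabytes (1 MB = 1048576 bytes)
bytes_to_mb() {
    awk -v b="$1" 'BEGIN { printf "%d\n", b / 1048576 }'
}

# Values quoted in the GUDS output above.
kmem_inuse_mb=195427
kmem_total_mb=262144
zfs_buf_bytes=194871558144   # assumed to be a byte count

zfs_buf_mb=$(bytes_to_mb "$zfs_buf_bytes")
echo "kernel memory in use: $(pct "$kmem_inuse_mb" "$kmem_total_mb")% of total"
echo "zfs_file_data_131072: ${zfs_buf_mb} MB"
echo "share of kernel memory in use: $(pct "$zfs_buf_mb" "$kmem_inuse_mb")%"
```

Under that assumption, the ZFS file data buffer accounts for nearly all of the kernel memory in use, which is consistent with the diagnosis that the ZFS cache is consuming the RAM.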
Solution
total_kmem_inuse is extraordinarily high: 195427MB / 262144MB (74.5%)
Check buffers for high values. The highest buffer is zfs_file_data_131072, with a value of 194871558144.
It seems that your system is short on memory resources. You may want to consider placing a limit on the ZFS ARC cache to release some of your RAM back to the system.
Please review and implement the solution listed in the document below.
A limit of around 4-8GB would be a good starting point. A reboot is needed after making this change.
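The referenced document is not reproduced here. On Solaris releases the ARC ceiling is commonly set through the `zfs_arc_max` tunable in /etc/system; that this is the method the note intends is an assumption. The sketch below only prints the line for a chosen size in GB and does not modify any file. The function name `arc_max_line` is hypothetical.

```shell
#!/bin/sh
# arc_max_line GB -> prints an /etc/system entry capping the ZFS ARC.
# Assumption: the limit is applied via the zfs_arc_max tunable (in bytes).
arc_max_line() {
    gb=$1
    bytes=$((gb * 1024 * 1024 * 1024))
    echo "set zfs:zfs_arc_max = ${bytes}"
}

# Candidate values at both ends of the suggested 4-8GB range.
arc_max_line 4
arc_max_line 8
```

The chosen line would be added to /etc/system by an administrator, followed by the reboot mentioned above.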
********************OR*******************
Applies to:
Oracle Database - Enterprise Edition - Version 11.2.0.1 to 12.1.0.1 [Release 11.2 to 12.1]
CRM On Demand - Version N/A to N/A
IBM: Linux on System z
Linux x86-64
Linux x86
Symptoms
Errors are seen in the alert log relating to spawning of processes such as:
@ Checked for relevance on 17th Jan 2012
ORA-00445: background process "m001" did not start after 120 seconds
Incident details in: /opt/u01/app/oracle/diag/rdbms/incident/incdir_3721/db1_mmon_7417_i3721.trc
ERROR: Unable to normalize symbol name for the following short stack (at offset 2):
Tue Jun 21 03:03:06 2011
ORA-00445: background process "J003" did not start after 120 seconds
or
Waited for process W002 to initialize for 60 seconds
The system appears to be running very slowly and defunct processes can appear.
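To see how often each background process failed to start, the alert log can be summarized. This is a rough sketch: the function name `count_spawn_failures` is illustrative, and the sample lines are copied from the messages above rather than taken from a real log.

```shell
#!/bin/sh
# count_spawn_failures FILE
# Prints "<count> <process>" for each process named in an ORA-00445 message.
count_spawn_failures() {
    sed -n 's/.*ORA-00445: background process "\([^"]*\)" did not start.*/\1/p' "$1" \
        | LC_ALL=C sort | uniq -c | awk '{ print $1, $2 }'
}

# Demonstration on a scratch file holding the sample messages.
sample=$(mktemp)
cat > "$sample" <<'EOF'
ORA-00445: background process "m001" did not start after 120 seconds
Tue Jun 21 03:03:06 2011
ORA-00445: background process "J003" did not start after 120 seconds
EOF
count_spawn_failures "$sample"
rm -f "$sample"
```

On a real system the function would be pointed at the instance's alert log; its location depends on the installation.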
Changes
REDHAT 5 kernel 2.6.18-194.el5 #1 SMP Tue Mar 16
Oracle 11.2.0.2 Single Instance
IBM: Linux on System z
Cause
Recent Linux kernels have a feature called Address Space Layout Randomization (ASLR), which is activated by default on some of the newer Linux distributions.
It is designed to load shared memory objects at random addresses.
In Oracle, multiple processes map a shared memory object at the same address across the processes.
With ASLR turned on Oracle cannot guarantee the availability of this shared memory address.
This conflict in the address space means that a process trying to attach a shared memory object at a specific address may not be able to do so, resulting in a failure in the shmat subroutine.
However, on subsequent retry (using a new process) the shared memory attachment may work.
The result is a "random" set of failures in the alert log.
Solution
It should be noted that this problem has only been positively diagnosed on Redhat 5 with Oracle 11.2.0.2. It is also likely, as per unpublished BUG:8527473, that this issue will reproduce on Generic Linux platforms running any Oracle 11.2.0.x or 12.1.0.x release on Redhat/OEL kernels that have ASLR.
This issue has been seen in both Single Instance and RAC environments.
ASLR also exists in SLES 10 and SLES 11 kernels and is turned on by default. To date no problem has been seen on SuSE servers running Oracle, but Novell confirms that ASLR may cause problems.
You can verify whether ASLR is being used as follows:
# /sbin/sysctl -a | grep randomize
kernel.randomize_va_space = 1
If the parameter is set to any value other than 0 then ASLR is in use.
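The same check can be scripted. This is a minimal sketch that applies the rule stated above (any value other than 0 means ASLR is in use). The function name `aslr_state` is illustrative, and the script reads the value from /proc/sys/kernel/randomize_va_space, the procfs entry behind this sysctl key, when it is present.

```shell
#!/bin/sh
# aslr_state VALUE -> interprets a kernel.randomize_va_space value
aslr_state() {
    case "$1" in
        0) echo "disabled" ;;
        *) echo "enabled" ;;
    esac
}

# Report the state of the running kernel if the procfs entry is readable.
if [ -r /proc/sys/kernel/randomize_va_space ]; then
    echo "ASLR is $(aslr_state "$(cat /proc/sys/kernel/randomize_va_space)")"
fi
```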
On Redhat 5, to permanently disable ASLR, add or modify these parameters in /etc/sysctl.conf:
kernel.randomize_va_space=0
kernel.exec-shield=0
You need to reboot for kernel.exec-shield parameter to take effect.
Note that both kernel parameters are required for ASLR to be switched off.
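Editing the file by hand is enough, but the add-or-modify step can also be scripted so that repeated runs do not leave duplicate entries. The sketch below is one possible approach, not the method prescribed by the note. The function name `set_sysctl_param` is hypothetical, and the demonstration runs against a scratch file rather than /etc/sysctl.conf.

```shell
#!/bin/sh
# set_sysctl_param FILE KEY VALUE
# Removes any existing assignment of KEY in FILE, then appends KEY=VALUE.
set_sysctl_param() {
    file=$1 key=$2 value=$3
    tmp=$(mktemp) || return 1
    awk -v k="$key" -v v="$value" '
        BEGIN { FS = "=" }
        { name = $1; gsub(/[ \t]/, "", name) }
        name == k { next }          # drop the old assignment
        { print }                   # keep every other line, comments included
        END { print k "=" v }       # append the new assignment
    ' "$file" > "$tmp"
    cat "$tmp" > "$file"            # overwrite in place, keeping permissions
    rm -f "$tmp"
}

# Demonstration on a scratch copy.
conf=$(mktemp)
printf 'kernel.randomize_va_space = 1\nkernel.sysrq=0\n' > "$conf"
set_sysctl_param "$conf" kernel.randomize_va_space 0
set_sysctl_param "$conf" kernel.exec-shield 0
cat "$conf"
rm -f "$conf"
```

Running it against the real /etc/sysctl.conf would require root privileges, and the reboot noted above still applies.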
There may be other reasons for a process failing to start; however, switching ASLR off lets you quickly rule it out as the cause. More and more issues are being identified when ASLR is in operation.