Oracle NUMA

Original post by 手机用户02, 2022-09-06

In this Document

Symptoms
Changes
Cause
Solution


APPLIES TO:

Oracle Database - Enterprise Edition - Version 11.2.0.4 and later
Information in this document applies to any platform.
There is no other change at storage or database level.


SYMPTOMS

  • Log file sync waits become visible in the AWR report after an OS upgrade from OL6 to OL7
  • No change in DB load or in the call/commit ratio
  • The log writer trace shows warnings such as:

    *** 2019-04-09 10:15:03.943
    Warning: log write elapsed time 1442ms, size 1KB

    *** 2019-04-09 10:15:09.443
    Warning: log write elapsed time 3236ms, size 80KB

    *** 2019-04-09 10:15:22.146
    Warning: log write elapsed time 1373ms, size 115KB
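
To pull such warnings out of a longer trace file, a small filter can help. The following is a minimal sketch, assuming the trace uses the two-line layout shown above (a "*** date time" line followed by the warning line). The function name slow_log_writes and the 500 ms default threshold are illustrative choices, not part of the Oracle note.

```shell
# Print "date time elapsed_ms size" for every log write warning whose
# elapsed time is at or above the threshold (in milliseconds).
# Usage: slow_log_writes [THRESHOLD_MS] < lgwr_trace_file
slow_log_writes() {
    local threshold_ms="${1:-500}"
    awk -v min="$threshold_ms" '
        # Remember the most recent "*** <date> <time>" line.
        /^[ \t]*\*\*\* [0-9]/ { ts = $2 " " $3; next }
        # Field 6 is the elapsed time such as "1442ms," and field 8 the size.
        /Warning: log write elapsed time/ {
            ms = $6
            sub(/ms,?$/, "", ms)
            if (ms + 0 >= min) print ts, ms, $8
        }
    '
}
```

For example, "slow_log_writes 2000 < trace_file" lists only the writes that took two seconds or longer, which makes it easier to line the stalls up against OS-level data for the same timestamps.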

      

     



CHANGES

 OS was upgraded from OL 6.7 to OL 7.5

CAUSE


The cause is the Linux Automatic NUMA Balancing feature, which is not present in OEL 6 but is enabled by default in OEL 7. The same applies to RHEL 7, where the feature is also enabled by default.

To determine if automatic NUMA memory balancing feature is enabled, run the following command on the database server:

(root) # sysctl -e kernel.numa_balancing
kernel.numa_balancing = 1


A value of "1" means the feature is enabled; a value of "0" means it is disabled.
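
The same check can be scripted, for instance to run it across several database servers. This is a minimal sketch that reads the procfs entry behind the sysctl shown above. The function name numa_balancing_status and its optional path argument are assumptions made for illustration (the argument only exists so the function can be tried against a sample file).

```shell
# Report the automatic NUMA balancing state from a procfs-style file.
# With no argument it reads the live kernel setting.
numa_balancing_status() {
    local path="${1:-/proc/sys/kernel/numa_balancing}"
    local value
    if [ ! -r "$path" ]; then
        # The entry is missing on kernels without the feature.
        echo "unavailable"
        return 0
    fi
    value=$(cat "$path")
    case "$value" in
        0) echo "disabled" ;;
        1) echo "enabled" ;;
        *) echo "unknown ($value)" ;;
    esac
}
```

Calling numa_balancing_status with no argument prints "enabled", "disabled", or "unavailable" for the current host.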
 
The high I/O waits resemble the situation described in "(EX39) NUMA-enabled database servers experience continuously high load or reduced performance after updating to Exadata 12.2.1.1.0 or higher" (Doc ID 2319324.1), although this system is not Exadata.
 

SOLUTION

Disable the automatic NUMA memory balancing feature by setting the following kernel parameter:

kernel.numa_balancing = 0

 


---------------------------------------------------------


In this Document

Description
Occurrence
Symptoms
Workaround
 Recommendation
 Caution
 Steps to disable NUMA
Patches
 Bugs caused due to NUMA being enabled

History
References


APPLIES TO:

Oracle Database - Standard Edition - Version 10.2.0.1 and later
Oracle Database Cloud Schema Service - Version N/A and later
Gen 1 Exadata Cloud at Customer (Oracle Exadata Database Cloud Machine) - Version N/A and later
Oracle Cloud Infrastructure - Database Service - Version N/A and later
Oracle Database Cloud Exadata Service - Version N/A and later
Information in this document applies to any platform.
Currency checked on Oct-23-2014

DESCRIPTION

Oracle NUMA (Non Uniform Memory Architecture) support can be used with large SMP multiprocessor environments with NUMA hardware. When enabled, Oracle NUMA support facilitates efficient use of the underlying NUMA hardware and may improve database performance.

Oracle NUMA support needs the right combination of hardware, operating system and Oracle version.

With 10.2.0.4 and 11.1.0.7 patchsets, Oracle NUMA support can be enabled on common Operating Systems like AIX, HP-UX, Solaris, Linux and Windows if the underlying hardware characteristic is NUMA.

When running an Oracle database with NUMA support in a NUMA capable environment, Oracle will by default detect if the hardware and operating system are NUMA capable and enable Oracle NUMA support. From 11gR2, Oracle NUMA support is disabled by default. Refer to Note 864633.1, "Enable Oracle NUMA support with Oracle Server Version 11gR2", for more information.

Some OS upgrades/patches may enable NUMA (for example, on Linux NUMA is enabled with kernel release 2.6.9-67). Care should be taken before enabling NUMA support or leaving it on by default. Please see the "Recommendation" section below. Contact your hardware vendor for recommendations and information on your system and operating system NUMA capabilities.

OCCURRENCE

The symptoms described in the following section generally occur when:

  • Oracle database NUMA support is enabled
  • Both operating system and hardware are NUMA capable.

And :

  • Database workload is memory constrained (or applies too much memory pressure on a given NUMA memory pool)
  • Dynamic reconfiguration events change the characteristics of the hardware or partition (e.g. number of CPUs, memory available).

Some issues are OS/hardware specific. See the "Bugs caused due to NUMA being enabled" section below.

Dynamic reconfiguration events that remove resources from a NUMA system, such as an entire cell and all its processors, are not supported (please review Note:761065.1).

SYMPTOMS

The problems usually manifest as crashes with errors including:

  • ORA-4031
  • ORA-600 with argument KSKRECONFIGNUMA2
  • ORA-600 with argument KSBASEND_INTERNAL
  • ORA-600 with argument KSMHEAP_ALLOC1
  • ORA-27302: FAILURE OCCURRED AT: SSKGXPCRE3


WORKAROUND

Recommendation

  • Customers whose SLAs are unaffected with NUMA enabled can continue to run with no changes.

  • Customers who want to enable NUMA are strongly recommended to do sufficient testing before going into production.

  • Apply all the bug fixes or patchsets required for your Oracle database version. Fixes for all known NUMA issues in the Oracle database are available for download. Please review the known bugs section.

To disable NUMA consult the section "Steps to disable NUMA" covering the instructions to disable NUMA at the Oracle database level. To disable NUMA at the operating system or hardware level contact your hardware vendor.

Please review the "Caution" section below when disabling NUMA.

Caution

  • Disabling or enabling NUMA can change application performance.

  • It is strongly recommended to evaluate the performance after disabling or before enabling NUMA in a test environment.

  • Operating system and/or hardware configuration may need to be tuned or reconfigured when disabling Oracle NUMA support. Consult your hardware vendor for more information or recommendations.

Steps to disable NUMA

  • Customers can download and apply patch for Bug 8199533 to disable NUMA support. This is a database patch and should be applied to the Database home. This patch is available for common platforms on 10.2.0.4 and 11.1.0.7 releases.

  • If you apply the patch for Bug 8199533, Oracle will no longer enable NUMA support by default, even if it detects a NUMA capable environment.

  • Oracle Support does not recommend using the init.ora parameter "_enable_NUMA_optimization" to disable NUMA. Customers should apply Patch 8199533 to disable NUMA. The patch is rolling upgradeable.

  • This patch does not need to be applied to the ASM home. However if the same Oracle home is used for both RDBMS and ASM instances then this patch can be applied to the Oracle home.

To enable NUMA optimization after applying patch 8199533, set init.ora parameter _enable_NUMA_optimization=TRUE

PATCHES

Bugs caused due to NUMA being enabled

Below is the list of known bugs caused by Oracle NUMA support being enabled. It is important to note that customers may encounter these bugs **ONLY** if NUMA support is enabled.

If customers apply the corresponding bug fix, or disable NUMA using Patch 8199533 or at the hardware/OS layer, they will not encounter these issues.

Bug | Bug Description | Fixed release
Bug 5173642 | Data not read from cache on second execution with NUMA optimization enabled | 10.2.0.4 & 11.1.0.6
Bug 6868080 | ORA-4031 with NUMA | 10.2.0.5 & 11.1.0.7
Bug 6689903 | ORA-27302: FAILURE OCCURRED AT: SSKGXPCRE3 | 10.2.0.5
Bug 4414666 | ORA-600[KSMHEAP_ALLOC1] WHEN STARTING UP WITH NUMA_ON | 10.2.0.2 & 11.1.0.6
Bug 4173484 | OERI[17131] / instance crash with NUMA using SGA_TARGET | 10.1.0.5 & 10.2.0.1
Bug 3802438 | OERI[17148] / SGA heap corruption with NUMA pools | 10.1.0.4 & 10.2.0.1
Bug 3202031 -P | Tru64 OERI [KSMHEAP_ALLOC1] on STARTUP with NUMA_ON | 10.2.0.1
Bug 6730567 | ORA-4031 / OERI [17137] with NUMA optimization enabled | 11.1.0.7
Bug 6689903 | Errors such as ORA-27504 using RAC on NUMA | 10.2.0.5
Bug 7232946 | ORA-600[KSKRECONFIGNUMA2] CAUSES INSTANCE CRASH |
Bug 6086099 | ORA-7445[KSBASEND_INTERNAL] OR ORA-7445[KCBZWW] DURING STARTUP | 10.2.0.3
Bug 7346564 | ORA-00600 [17112] LEADING ALL 3 NODES CRASH | 10.2.0.5 & 11.1.0.6
Bug 8244734 | NUMA POOL UNDERCONFIGURED DUE TO KSS TRUNCATION | 10.2.0.5
Bug 8856696 | INSTANCES HANG WITH 'FREE BUFFER WAITS' OR HIT ORA-00379 | 11.1.0.7.1 & 11.2.0.2


  • P ==> Port specific bug
  • An empty fixed release means that the bug is either currently being fixed or has been closed because the customer used the workaround (_enable_numa_optimization=FALSE and _db_block_numa=1)



 

HISTORY

Publication date 22-DEC-2008
Updated with some other bugs 17-Feb-2009
Updated the note to change the underscore parameter to regular init.ora parameter 14-Apr-2009
18-Jun-2009, Implemented changes recommended by Ravi

REFERENCES

BUG:8244734 - NUMA POOL UNDERCONFIGURED DUE TO KSS TRUNCATION
NOTE:736173.1 - NUMA FAQ
NOTE:761065.1 - Oracle Database ccNUMA support and dynamic partitioning on HP-UX
BUG:5173642 - DATA NOT READ FROM CACHE ON SECOND EXECUTION ON WIN2003 ON AMD64
BUG:6086099 - ORA-7445[KSBASEND_INTERNAL] OR ORA-7445[KCBZWW] DURING STARTUP
BUG:6868080 - NUMA POOLS GENERATED ORA-4031
BUG:7346564 - ORA-00600 [17112] LEADING ALL 3 NODES CRASH
BUG:7357301 - DB HANG WHEN START 1 MORE INSTANCE DUE TO NUMA
BUG:7232946 - ORA-600[KSKRECONFIGNUMA2] CAUSES INSTANCE CRASH


---------------------------------------------------


In this Document

Details
 Recommendation:
Actions
 Steps to enable NUMA with Oracle Server Version 11.2:
  Caution
Contacts
References


APPLIES TO:

Oracle Database Cloud Schema Service - Version N/A and later
Oracle Database Exadata Cloud Machine - Version N/A and later
Oracle Cloud Infrastructure - Database Service - Version N/A and later
Oracle Database Cloud Exadata Service - Version N/A and later
Oracle Database Backup Service - Version N/A and later
Information in this document applies to any platform.
Oracle on NUMA capable hardware

DETAILS

Oracle NUMA (Non Uniform Memory Architecture) support can be used with large SMP multiprocessor environments with NUMA hardware. When enabled, Oracle NUMA support facilitates efficient use of the underlying NUMA hardware and may improve database performance.

Oracle NUMA support needs the right combination of hardware, operating system and Oracle version.

Starting with 11.2.0.1, Oracle NUMA support is disabled by default.

This note covers how Oracle NUMA support can be enabled via configuration for Oracle Server Version 11.2.

For more information about Oracle NUMA usage recommendation and Oracle NUMA support in previous releases please consult Note:759565.1

Care should be taken before enabling NUMA support. Contact your hardware vendor for recommendations and information on your system and operating system NUMA capabilities.

Recommendation:

  • Customers who have tuned their Database specifically for NUMA can continue to run with NUMA enabled with Oracle Server Version 11.2.0.1. 
  • Customers who want to enable NUMA are strongly recommended to do sufficient testing before going into production.

To enable NUMA support with Oracle Server Version 11gR2, consult the following section, "Steps to enable NUMA with Oracle Server Version 11.2". To disable NUMA on previous versions, review Note 759565.1.

ACTIONS

Steps to enable NUMA with Oracle Server Version 11.2:

When running an Oracle database Version 11.2 in a NUMA capable environment, Oracle does not, by default, detect whether the hardware and operating system are NUMA capable, and so does not enable Oracle NUMA support.

To enable this capability, the following underscore init.ora parameter needs to be set:

_enable_NUMA_support=TRUE

 No additional steps are required.

This parameter replaces and deprecates the init.ora parameter _enable_NUMA_optimization.
If _enable_NUMA_optimization is used instead of _enable_NUMA_support, a warning will be displayed in the alert log:

..._enable_NUMA_optimization is deprecated please use _enable_NUMA_support instead....

 Once the init.ora parameter _enable_NUMA_support is set to TRUE, and if the Oracle database version 11.2 runs in a NUMA capable environment, the alert log of the database instance should reflect that NUMA support has been enabled and what NUMA configuration was detected. For example, on a Linux system with 8 NUMA domains and 48 cores:

...NUMA system found and support enabled (8 domains - 6,6,6,6,6,6,6,6)...
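
To confirm this after a restart, the alert log can be scanned for that message. The following is a minimal sketch, assuming the message appears on a single line with exactly the wording shown above. The function name numa_domains_from_alert is hypothetical, and the alert log location is left to the caller.

```shell
# Read an alert log on stdin and summarise each "NUMA system found" message
# as "domains=<count> cores=<sum of the per-domain values>".
numa_domains_from_alert() {
    sed -n 's/.*NUMA system found and support enabled (\([0-9]*\) domains - \([0-9,]*\)).*/\1 \2/p' |
    awk '{
        # Field 2 is the comma-separated per-domain list, e.g. 6,6,6,6.
        n = split($2, per_domain, ",")
        sum = 0
        for (i = 1; i <= n; i++) sum += per_domain[i]
        print "domains=" $1, "cores=" sum
    }'
}
```

Feeding the alert log to the function (numa_domains_from_alert < alert_log_file) prints nothing when the message is absent, which would suggest that NUMA support was not enabled for that startup.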

 Caution

  • Disabling or enabling NUMA can change application performance.
  • It is strongly recommended to evaluate the performance before and after enabling NUMA in a test environment before going into production.
  • Operating system and/or hardware configuration may need to be tuned or reconfigured when disabling Oracle NUMA support. Consult your hardware vendor for more information or recommendations.

--------------------------------------------------


In this Document

Symptoms
Cause
Solution
References


APPLIES TO:

Linux OS - Version Oracle Linux 7.4 and later
Linux x86-64

SYMPTOMS

The RHCK kernel, with Automatic NUMA Balancing enabled by default, induces high I/O wait times because of NUMA hinting page faults during page migration.
For NUMA hardware, the access speed to main memory is determined by the location of the memory relative to the CPU. The performance of a workload depends on the application threads accessing data that is local to the CPU the thread is executing on. Automatic NUMA Balancing migrates data on demand to memory nodes that are local to the CPU accessing that data.

The command "cat /proc/sys/kernel/numa_balancing" returns 1.

NUMA balancing is achieved through the following steps:

1. A task scanner periodically scans a portion of a task's address space and marks the memory to force a page fault when the data is next accessed.

2. The next access to the data will result in a NUMA Hinting Fault. Based on this fault, the data can be migrated to a memory node associated with the task accessing the memory.

3. To keep a task, the CPU it is using and the memory it is accessing together, the scheduler groups tasks that share data.

This induced page fault for page migration is what causes the high %iowait.

Whether this overhead is offset by the benefit of tasks accessing local node memory depends on the workload.

Performance data shows tasks blocked with high %iowait, but no disk I/O problem attributable to system reads and writes.

vmstat shows blocking on wa

zzz ***Mon Dec 28 11:25:22 CST 2020
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
24 426 0 502696000 506280 296023456 0 0 208 48 0 0 2 1 97 0 0
36 543 0 502876320 506280 296249664 0 0 10923 591 161479 128670 9 7 19 64 0
14 1023 0 502625120 506280 296354304 0 0 6363 490 167416 111990 7 7 8 78 0
zzz ***Mon Dec 28 11:26:33 CST 2020
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
25 133 0 563774592 506280 294375328 0 0 208 48 0 0 2 1 97 0 0
50 115 0 563888192 506280 294376992 0 0 38491 3054 190037 169273 12 8 72 8 0
40 5 0 561866688 506280 294369888 0 0 46107 1024 196046 181620 13 7 72 8 0
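
When many such snapshots have been collected, the samples with high wa can be filtered out rather than read by eye. This is a minimal sketch, assuming the capture format shown above: a "zzz ***" timestamp line per snapshot and 17-column vmstat rows in which b is column 2 and wa is column 16. The function name high_iowait_samples and the default threshold of 50 are illustrative assumptions.

```shell
# Print the snapshot label plus the blocked-task count (b) and wa value
# for every vmstat row whose wa is at or above the threshold.
# Usage: high_iowait_samples [MIN_WA] < vmstat_capture
high_iowait_samples() {
    local min_wa="${1:-50}"
    awk -v min="$min_wa" '
        # Snapshot marker: keep it as the label for the rows that follow.
        /^zzz/ { stamp = $0; next }
        # Data rows start with a number; header rows do not.
        NF == 17 && $1 ~ /^[0-9]+$/ && $16 + 0 >= min {
            print stamp " | b=" $2 " wa=" $16
        }
    '
}
```

Note that the first row after each header reports averages since boot, so it rarely crosses the threshold and the rows that follow are the ones of interest.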

ps shows D-state tasks

poracle 406661 1 19 1.4 1.0 87820580 11342140 wait_o D 05:00:03 00:05:34 oraclecums2 (LOCAL=NO)
poracle 379722 1 19 4.9 1.0 87814416 10938380 wait_o D 08:56:48 00:07:24 oraclecums2 (LOCAL=NO)
poracle 375667 1 19 2.4 1.0 87817524 11148184 wait_o D 08:55:43 00:03:44 oraclecums2 (LOCAL=NO)
poracle 374255 1 19 1.3 1.0 87830900 11263564 wait_o D 08:55:00 00:02:03 oraclecums2 (LOCAL=NO)

mpstat shows high %iowait

Linux 3.10.0-693.11.6.el7.x86_64 (hostname) 12/28/2020 _x86_64_ (176 CPU)

11:35:20 AM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
11:35:21 AM all 9.23 0.00 7.35 77.18 0.00 0.77 0.00 0.00 0.00 5.47
11:35:21 AM 0 14.14 0.00 41.41 43.43 0.00 1.01 0.00 0.00 0.00 0.00
11:35:21 AM 1 4.55 0.00 2.27 93.18 0.00 0.00 0.00 0.00 0.00 0.00
11:35:21 AM 2 5.32 0.00 3.19 91.49 0.00 0.00 0.00 0.00 0.00 0.00
11:35:21 AM 3 5.15 0.00 5.15 89.69 0.00 0.00 0.00 0.00 0.00 0.00
11:35:21 AM 4 33.00 0.00 27.00 22.00 0.00 18.00 0.00 0.00 0.00 0.00
11:35:21 AM 5 4.08 0.00 5.10 88.78 0.00 2.04 0.00 0.00 0.00 0.00
11:35:21 AM 6 6.19 0.00 7.22 85.57 0.00 1.03 0.00 0.00 0.00 0.00
11:35:21 AM 7 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
11:35:21 AM 8 39.60 0.00 12.87 32.67 0.00 14.85 0.00 0.00 0.00 0.00
11:35:21 AM 9 4.08 0.00 7.14 83.67 0.00 0.00 0.00 0.00 0.00 5.10
11:35:21 AM 10 6.19 0.00 4.12 89.69 0.00 0.00 0.00 0.00 0.00 0.00
11:35:21 AM 11 4.08 0.00 2.04 93.88 0.00 0.00 0.00 0.00 0.00 0.00
11:35:21 AM 12 36.00 0.00 11.00 50.00 0.00 3.00 0.00 0.00 0.00 0.00
11:35:21 AM 13 4.08 0.00 16.33 79.59 0.00 0.00 0.00 0.00 0.00 0.00
11:35:21 AM 14 10.42 0.00 4.17 85.42 0.00 0.00 0.00 0.00 0.00 0.00
11:35:21 AM 15 5.21 0.00 4.17 89.58 0.00 1.04 0.00 0.00 0.00 0.00
11:35:21 AM 16 32.35 0.00 11.76 53.92 0.00 1.96 0.00 0.00 0.00 0.00
11:35:21 AM 17 5.05 0.00 14.14 80.81 0.00 0.00 0.00 0.00 0.00 0.00
11:35:21 AM 18 7.14 0.00 8.16 84.69 0.00 0.00 0.00 0.00 0.00 0.00
11:35:21 AM 19 5.15 0.00 6.19 88.66 0.00 0.00 0.00 0.00 0.00 0.00
11:35:21 AM 20 29.29 0.00 17.17 52.53 0.00 1.01 0.00 0.00 0.00 0.00

NUMA statistics can be found in /proc/vmstat:

numa_pte_updates
The number of base pages that were marked for NUMA hinting faults.

numa_huge_pte_updates
The number of transparent huge pages that were marked for NUMA hinting faults. In combination with numa_pte_updates, the total address space that was marked can be calculated.

numa_hint_faults
Records how many NUMA hinting faults were trapped.

numa_hint_faults_local
Shows how many of the hinting faults were to local nodes. In combination with numa_hint_faults, the percentage of local versus remote faults can be calculated. A high percentage of local hinting faults indicates that the workload is closer to being converged.

numa_pages_migrated
Records how many pages were migrated because they were misplaced. As migration is a copying operation, it contributes the largest part of the overhead created by NUMA balancing.
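
The local versus remote percentage described above can be computed directly from these counters. The following is a minimal sketch that reads only the counter names listed above. The function name numa_hint_summary and its optional path argument are assumptions for illustration (the argument allows a saved copy of /proc/vmstat to be analysed).

```shell
# Summarise NUMA hinting fault locality and page migrations.
# With no argument it reads the live /proc/vmstat.
numa_hint_summary() {
    local path="${1:-/proc/vmstat}"
    awk '
        $1 == "numa_hint_faults"       { total = $2 }
        $1 == "numa_hint_faults_local" { local_faults = $2 }
        $1 == "numa_pages_migrated"    { migrated = $2 }
        END {
            if (total > 0)
                printf "local_pct=%.1f migrated=%.0f\n", 100 * local_faults / total, migrated
            else
                print "no NUMA hinting faults recorded"
        }
    ' "$path"
}
```

The counters are cumulative since boot, so comparing two runs taken some minutes apart gives a better picture of current migration activity than a single reading.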

Note that the Oracle UEK4 kernel has numa_balancing turned off starting from kernel version 4.1.12-124.20.5.

Bug 28814880 - Enabling numa balancing causes high I/O wait on numa systems

CAUSE

numa_balancing is on by default in the Oracle Linux 7 RHCK kernel, which induces high %iowait because of hinting page faults for page migration.

SOLUTION

Turn off numa_balancing:

echo 0 > /proc/sys/kernel/numa_balancing
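
The echo above changes only the running kernel, so the value is lost at the next reboot. The following is a minimal sketch of making the change persistent through a sysctl drop-in file. The function names and the file name 99-numa-balancing.conf are hypothetical choices, and both functions need root when used against the real system paths.

```shell
# Turn automatic NUMA balancing off for the running kernel
# (equivalent to the echo into /proc/sys/kernel/numa_balancing).
disable_numa_balancing_now() {
    sysctl -w kernel.numa_balancing=0
}

# Write a drop-in file so the setting is applied again at boot.
# The directory defaults to /etc/sysctl.d; the file name is an assumption.
persist_numa_balancing_off() {
    local conf_dir="${1:-/etc/sysctl.d}"
    printf 'kernel.numa_balancing = 0\n' > "$conf_dir/99-numa-balancing.conf"
}
```

Running disable_numa_balancing_now followed by persist_numa_balancing_off as root covers both the current boot and later ones. As the Caution sections above advise, evaluate the effect in a test environment first.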


