Configuring Memory for Oracle on Linux

In recent blog posts I have talked about optimizing Oracle on Linux for both CPU and I/O with SSDs. In this post I will complete the picture and focus on configuring memory.

In my post on how to Maximise CPU Performance for the Oracle Database on Linux I stated that the best place to start for CPU information is the processor specifications. Not surprisingly there is a similar tool to help you configure memory, and this can help if you are currently sizing a system or wish to find out the capabilities of the system you already have. To find out about your hardware as described by your BIOS you can use the command dmidecode. This will tell you information such as the processors, BIOS version and system board, as well as the memory that is currently populated, for example:

# dmidecode 2.11

SMBIOS 2.6 present.

Memory Device

      Size: 8192 MB

      Form Factor: DIMM

      Set: None

      Locator: DIMM_A1

      Bank Locator: NODE 0 CHANNEL 0 DIMM 0

      Type: DDR3

      Type Detail: Synchronous

      Speed: 1600 MHz
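Since the full dmidecode output describes every device reported by the BIOS, you can restrict it to memory information alone with the --type option (dmidecode also accepts -t 17, the SMBIOS Memory Device type), for example:

[root@sandep1 ~]# dmidecode --type memory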

Referring to the dmidecode output and the memory configuration tool, and cross-referencing the capabilities of my processor and system board, I can see that my balanced configuration of 128GB RDIMM 1600 Dual Rank (16 x 8GB) 1.5v can run at 1600 MT/s, which is PC3-12800, or a peak data rate of 12.8GB/sec per module. As the memory controller is on the CPU I can also reference the Memory Specifications section of the processor specifications to see that the E5-2680 has a maximum of 4 memory channels per socket, making it clear why the Max Memory Bandwidth per socket is given as 51.2GB/s, that is 4 channels x 12.8GB/sec (you can run the STREAM benchmark to test what your own system achieves in practice). The specifications also tell us that in a 2 socket configuration with a system board with the maximum number of slots the maximum memory configuration is 768GB. Rather than having to size this manually, the memory configuration tool gives you the option to determine both the correct memory and the possible levels of capacity and bandwidth based on the processor and system board information you have given.

Checking the specifications for the E7 processor shows a current memory capacity of 2TB for a 4 socket configuration and 4TB for 8 sockets; however, there are no entries for E7 systems in the memory configuration tool. In contrast to the four channel integrated memory controller on the E5 processor, the E7 features two on-die memory controllers managing Scalable Memory Interconnect (SMI) channels connecting to Scalable Memory Buffers, enabling up to 16 DIMMs per socket. This architecture enables the maximum capacity to run at the maximum rated frequency, which the specifications tell us is 1066 MHz. Consequently, for E7 the memory configuration tool is not required.
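If you want to see how close your own system gets to these theoretical figures, a minimal sketch of building and running STREAM (assuming gcc with OpenMP support and the stream.c source file downloaded from the STREAM website) is:

[root@sandep1 ~]# gcc -O3 -fopenmp stream.c -o stream

[root@sandep1 ~]# ./stream

The reported Copy, Scale, Add and Triad rates give a practical measure of memory bandwidth to compare against the peak figures above.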

I’ve mentioned integrated memory controllers and memory capacity being dependent on the number of sockets, and this indicates that Xeon systems have a NUMA (Non-Uniform Memory Access) configuration, meaning that memory access times, or latency, are lower for a CPU accessing its local memory than for memory controlled by the remote CPU. LMbench is an example of a tool that can be used to measure the different access times.
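As a sketch of how such a measurement could be made, assuming the lat_mem_rd tool from LMbench is built and the numactl package is installed, you can bind the benchmark to the CPUs of one node and then to local and remote memory in turn and compare the reported latencies (here 512 is the array size in MB and 128 the stride in bytes):

[root@sandep1 ~]# numactl --cpunodebind=0 --membind=0 ./lat_mem_rd 512 128

[root@sandep1 ~]# numactl --cpunodebind=0 --membind=1 ./lat_mem_rd 512 128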


The CPUs communicate via the QPI (QuickPath Interconnect) links (on a 2 socket E5 system there are 2 links connecting the CPUs), and it is important to distinguish between the QPI links and the memory channels, which can often be a source of confusion. Given the NUMA configuration of the platform, anyone running Oracle on Linux on Xeon needs to have some awareness of NUMA in both their Linux and Oracle configurations.

Taking the Linux configuration first, in my post on configuring redo for ASM I mentioned running the Oracle Validated/Pre-Install RPM with yum to configure the operating system. One of the settings this makes is to add the boot parameter “numa=off”. However, as we have already seen, the hardware itself is NUMA, so how is it possible to turn it off? As well as this option at the Linux level there may also be an option in the BIOS to turn NUMA off. Whether set in the BIOS or the operating system, what this option does is interleave memory between the memory controllers in the system. The system gives the impression of having a single memory node, and memory accesses are evenly distributed across all of the controllers without any awareness being required at the application level. The reason for setting this parameter is that, with the difference in latency between local and remote memory access measured in nanoseconds, on 2 and 4 socket systems the gains from enabling (and correctly configuring) NUMA awareness rather than disabling it are typically within 1%, even when measured in a high performance environment. Going to 8 sockets and beyond may see gains of approximately 5%, and enabling and correctly configuring NUMA awareness typically delivers the best performance on all systems, so it is well worth knowing what this involves even if you decide to remain with the default configuration.
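For illustration only, this is roughly how the kernel line in /etc/grub.conf looks with the parameter in place; the kernel version and root device shown here are placeholders rather than values taken from my system:

kernel /vmlinuz-2.6.39-400.el6uek.x86_64 ro root=/dev/mapper/vg_root-lv_root numa=off

It is this trailing numa=off that is removed in the next step to enable NUMA.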

Having decided to enable NUMA, I edited grub.conf to remove “numa=off” and rebooted, and I can now see that my 2 socket system is running in NUMA mode:

[root@sandep1 ~]# dmesg | grep -i numa

NUMA: Initialized distance table, cnt=2

NUMA: Node 0 [0,c0000000) + [100000000,1040000000) -> [0,1040000000)

pci_bus 0000:00: on NUMA node 0 (pxm 0)

pci_bus 0000:80: on NUMA node 1 (pxm 1)

Using the numactl command I can also see how the memory is now allocated between the 2 memory nodes and which CPUs are associated with each node.

[root@sandep1 ~]# numactl --hardware

available: 2 nodes (0-1)

node 0 cpus: 0 1 2 3 4 5 6 7 16 17 18 19 20 21 22 23

node 0 size: 65459 MB

node 0 free: 8176 MB

node 1 cpus: 8 9 10 11 12 13 14 15 24 25 26 27 28 29 30 31

node 1 size: 65536 MB

node 1 free: 8447 MB

node distances:

node   0   1

  0: 10  21

  1: 21  10

We can now proceed to configure Oracle for NUMA; however, it is first worth looking into a feature called Huge Pages. The default memory page size is determined by the CPU architecture, and on x86 systems this page size is 4KB. On Linux the page size is defined by 1 << PAGE_SHIFT, and as PAGE_SHIFT is 12 by default, 2^12 gives us the expected 4KB. Within a virtual memory address the lowest 12 bits are used as the offset into this 4KB page. If we instead use the lowest 21 bits of the address as the offset into the page we can enable Huge Pages, which are 2^21 bytes = 2MB (1GB Huge Pages are also supported by current CPUs by moving the offset further still). The main benefits of Huge Pages are firstly that the Translation Lookaside Buffer (TLB), which caches virtual to physical address mappings, can cover more of the Oracle SGA with larger pages, and secondly that the amount of memory used for page tables is reduced. With standard pages Linux maintains a separate page table for each process, and with a large Oracle SGA and a large number of processes the memory used for page tables can itself be large, so using Huge Pages significantly reduces this overhead. Huge Pages can be allocated with the parameter vm.nr_hugepages in /etc/sysctl.conf. For example, to allocate 55000 2MB pages the following would be used, taking effect at boot or by running sysctl -p to apply the changes.
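# /etc/sysctl.conf

vm.nr_hugepages = 55000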


The allocated 55000 2MB pages are pinned in memory awaiting use by Oracle and can be viewed in /proc/meminfo.

HugePages_Total: 55000

HugePages_Free: 55000

HugePages_Rsvd:        0

HugePages_Surp:        0

Hugepagesize: 2048 kB

DirectMap4k: 6144 kB

DirectMap2M: 2058240 kB

DirectMap1G: 132120576 kB

The importance of Huge Pages in the context of this discussion is that the Huge Page allocation is itself NUMA aware. Huge Pages are evenly distributed between the available memory nodes, and as a consequence, if NUMA is enabled at the operating system level and Huge Pages are used, then even without additional Oracle NUMA parameters the Oracle SGA will be evenly distributed between the memory nodes. This can be thought of as being similar to setting the boot parameter numa=off, but with the interleaving taking effect at the granularity of the Huge Pages.

When using Huge Pages you should set the Oracle parameter use_large_pages = ONLY. This ensures that the instance will start only if sufficient Huge Pages are available, and it reports information on their usage to the alert log.
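Assuming a server parameter file (spfile) is in use, a minimal sketch of setting the parameter, which takes effect at the next instance restart, is:

SQL> alter system set use_large_pages=ONLY scope=spfile;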

************************ Large Pages Information *******************

Parameter use_large_pages = ONLY

Per process system memlock (soft) limit = 141 GB

Total System Global Area in large pages = 88 GB (100%)

Large pages used by this instance: 44844 (88 GB)

Large pages unused system wide = 10156 (20 GB)

Large pages configured system wide = 55000 (107 GB)

Large page size = 2048 KB


It can be seen that the Oracle reported Huge Pages configuration corresponds to the information available from Linux.

[root@sandep1 ~]# cat /proc/meminfo | grep -i huge

AnonHugePages:    108544 kB

HugePages_Total:   55000

HugePages_Free:    10265

HugePages_Rsvd:      109

HugePages_Surp:        0

Hugepagesize:       2048 kB

The Huge Pages are evenly distributed, so by looking just at node 0 we can see that exactly half of the requested Huge Pages have been allocated from this node.

[root@sandep1 node0]# pwd

/sys/devices/system/node/node0

[root@sandep1 node0]# cat meminfo

Node 0 AnonHugePages:     16384 kB

Node 0 HugePages_Total: 27500

Node 0 HugePages_Free:   5243

Node 0 HugePages_Surp:      0
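On kernels recent enough to expose per-node Huge Page controls the same allocation can also be read (and adjusted) directly through sysfs, for example for node 0:

[root@sandep1 node0]# cat /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages

27500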

After this discussion of Huge Pages you may be aware of a Linux feature called Transparent Huge Pages and be thinking that all of the benefits described previously will happen automatically by leaving that feature enabled. However, at the time of writing Transparent Huge Pages do not support shared memory, in other words they do not support the memory required for the Oracle SGA. For this reason, on an Oracle on Linux system you should set the boot parameter transparent_hugepage=never to disable this feature and, if using Huge Pages, continue to configure them manually.
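You can check whether Transparent Huge Pages are currently active by reading the corresponding sysfs file, where the active setting is shown in square brackets (on some Red Hat and Oracle Linux 6 kernels the path is /sys/kernel/mm/redhat_transparent_hugepage/enabled instead):

[root@sandep1 ~]# cat /sys/kernel/mm/transparent_hugepage/enabled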

Whether using Huge or standard pages you can also enable NUMA awareness with Oracle by setting the parameter _enable_NUMA_support = TRUE.  When enabled this can be viewed in the Oracle alert log

NUMA node details:

  OS NUMA node 0 (16 cpus) mapped to Oracle NUMA node 0

  OS NUMA node 1 (16 cpus) mapped to Oracle NUMA node 1

and has the impact of binding core processes to NUMA nodes

DBW0 started with pid=12, OS id=3902 , bound to OS numa node 0

Starting background process DBW1

DBW1 started with pid=11, OS id=3904 , bound to OS numa node 1

Starting background process DBW2

DBW2 started with pid=14, OS id=3906 , bound to OS numa node 0

Starting background process DBW3

DBW3 started with pid=13, OS id=3908 , bound to OS numa node 1

Starting background process LGWR

LGWR started with pid=15, OS id=3910 , bound to OS numa node 1

allocating the SGA with locality considerations

Area #2 `NUMA pool 0' containing Subareas 3-3

  Total size 0000000a10000000 Minimum Subarea size 10000000

  Owned by: 0

   Area Subarea    Shmid      Stable Addr      Actual Addr

      2 3 25100291 0x00000080000000 0x00000080000000

               Subarea size     Segment size  

                          0000000a10000000 0000000a10000000

Area #3 `NUMA pool 1' containing Subareas 4-4

  Total size 0000000a10000000 Minimum Subarea size 10000000

  Owned by: 1

   Area Subarea    Shmid      Stable Addr      Actual Addr

      3 4 25133060 0x00000a90000000 0x00000a90000000

               Subarea size     Segment size  

as well as binding listeners to processors.

Setting _enable_NUMA_support on a NUMA-enabled system means Oracle will operate on a NUMA-aware basis, and as systems increase in size and socket count this delivers efficiencies that increase performance.
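As _enable_NUMA_support is an underscore (hidden) parameter it must be quoted when set; a minimal sketch, again assuming an spfile and an instance restart to follow, is:

SQL> alter system set "_enable_NUMA_support"=TRUE scope=spfile;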

Once the system is configured you may also want to monitor memory performance to ensure that everything has been configured correctly. To do so I would recommend the Intel Performance Counter Monitor; one of the tools included in this package enables you to observe your memory performance in a similar way to how iostat is used to monitor I/O.
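As a sketch of how this is run, in the versions of the package I have built from source the memory monitoring tool is named pcm-memory.x and takes a refresh interval in seconds as its first argument (the binary name and required privileges may differ in your version):

[root@sandep1 ~]# ./pcm-memory.x 1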


Taking the approach described in this post and evenly distributing the Oracle SGA across the memory nodes in the system, you can achieve an optimal and balanced Oracle configuration.