Intel Chip Chat – The Intel Xeon Scalable Platform and Intel Select Solutions

Lisa Spelman, Vice President and General Manager of Intel® Xeon® Products and Data Center Marketing at Intel, joins Intel® Chip Chat to discuss the launch of two pivotal advances for the data center: the delivery of Intel® Xeon® Scalable processors and the announcement of Intel® Select Solutions.

Spelman heralds the Intel Xeon Scalable platform, the biggest advancement of the Intel Xeon processor family in a decade. The Intel Xeon Scalable processors offer up to 28 cores, significantly increased per-core performance, more memory channels and PCI Express* lanes for greater flexibility and capacity, support for the Intel® Advanced Vector Extensions 512 instruction set to accelerate HPC, analytics, and AI workloads, and additional security, storage, and interconnect integrations for greater performance and agility. These numerous advances are the result of Intel's platform-centric view and will enable customers to combine Intel Xeon Scalable processors with other Intel technologies, such as Intel® Optane™ SSDs, to deliver higher performance for new and emerging workloads.

Spelman also introduces Intel Select Solutions, verified hardware and software configurations for the data center that reduce evaluation time and speed deployment. Intel Select Solutions will help customers get up and running more quickly and confidently, achieving faster time-to-money and delivering on their TCO.

For more information on Intel Select Solutions, please visit https://intel.com/selectsolutions. For more information on the Intel Xeon Scalable platform, please check out the Intel Xeon Scalable platform launch at https://LaunchEvent.intel.com, learn more at https://intel.com/xeonscalable, and look for #XeonScalable on Twitter.

Performance Substantiations

IPC gains of 10%: IPC gains are based on detailed models of the core created internally by Intel. The model simulations use over a thousand instruction sequences that are believed to approximate the expected behavior of a wide range of customer applications. They are estimates, provided for informational purposes only, for processor cores based on the Broadwell and Skylake micro-architectures.

Up to 1.65x Geomean based on Normalized Generational Performance going from Intel® Xeon® processor E5-26xx v4 to Intel® Xeon® Scalable processor (estimated based on Intel internal testing of OLTP Brokerage, SAP SD 2-Tier, HammerDB, Server-side Java, SPEC*int_rate_base2006, SPEC*fp_rate_base2006, Server Virtualization, STREAM* triad, LAMMPS, DPDK L3 Packet Forwarding, Black-Scholes, Intel Distribution for LINPACK Average).

  • Up to 1.3x claim based on Brokerage Firm OLTP: 1-Node, 2 x Intel® Xeon® Processor E5-2699 v4 on Grantley-EP (Wellsburg) with 512 GB Total Memory on Windows Server* 2012 R2 Standard using SQL Server 2014. Data Source: Request Number: 1640, Benchmark: Brokerage Firm OLTP, Score: 4373 transactions per second (tps) for OLTP vs. 1-Node, 2 x Intel® Xeon® Platinum 8180 Processor on Purley-EP (Lewisburg) with 764 GB Total Memory on Windows Server* 2016 RTM Standard using SQL Server 2016 Data, Score: 5979 tps for OLTP. Higher is better
  • Up to 1.4x claim based on 2-Tier SAP* SD: 1-Node, 2 x Intel® Xeon® Processor E5-2699 v4 on Grantley-EP (Wellsburg) with 512 GB Total Memory on SUSE Linux Enterprise Server* 10 SP4 using SAP EHP5.0 for ERP 6.0 and Sybase ASE 16.0. Data Source: Request Number: 2473, Benchmark: SAP* SD 2-Tier enhancement package 5 for SAP ERP 6.0, Score: 19721 vs. 1-Node, 2 x Intel® Xeon® Platinum 8180 Processor on Purley-EP (Lewisburg) with 768 GB Total Memory on SUSE Linux Enterprise Server* 12 using SAP ERP6.0/EHP5. Data Source: Request Number: 2558, Benchmark: SAP* SD 2-Tier enhancement package 5 for SAP ERP 6.0, Score: 27678 Higher is better
  • Up to 1.4x claim based on Server-side Java: 1-Node, 2 x Intel® Xeon® Processor E5-2699 v4 on Wildcat Pass with 128 GB Total Memory on Red Hat Enterprise Linux* 6.5 kernel 2.6.32-431 using Java 8 SE, JDK8U60, Java Hotspot V1.8.0_60 (if appropriate). Data Source: Request Number: 1633, Benchmark: Server-side Java workload - MultiJVM, Score: 112054 Higher is better, vs. 1-Node, 2 x Intel® Xeon® Platinum 8180 Processor on Purley-EP (Lewisburg) with 384 GB Total Memory on Red Hat Enterprise Linux* 7.3 using jdk1.8u121. Data Source: Request Number: 2513, Benchmark: Server-side Java workload - MultiJVM, Score: 167696 Higher is better
  • Up to 1.5x claim based on SPECint*_rate_base2006: 1-Node, 2 x Intel® Xeon® Processor E5-2699 v4 on Grantley-EP (Wellsburg) with 256 GB Total Memory on Red Hat Enterprise Linux* 7.2-kernel 3.10.0-327 using Compiler: C/C++: Version 16.0.0.101 of Intel C++ Studio XE for Linux; - Fortran: Version 16.0.0.101 of Intel Fortran Studio XE for Linux. Data Source: Request Number: 2342, Benchmark: SPECint*_rate_base2006, Score: 1670 vs. 1-Node, 2 x Intel® Xeon® Platinum 8180 Processor on Neon City with 384 GB Total Memory on Red Hat Enterprise Linux* 7.2-kernel 3.10.0-327 using CPU2006_FOR-OEMs-cpu2006-1.2-ic17.0-lin-binaries-20160922. Data Source: Request Number: 2498, Benchmark: SPECint*_rate_base2006, Score: 2550 Higher is better
  • Up to 1.5x claim based on server virtualization workload: 1-Node, 2 x Intel® Xeon® Processor E5-2699 v4 on Grantley-EP (Wellsburg) with 512 GB Total Memory on VMware ESXi* 6.0 Update 1 using guest VMs running RHEL 6 64-bit OS. Data Source: Request Number: 1637, Benchmark: server virtualization workload, Score: 1034 @ 58 VMs vs. 1-Node, 2 x Intel® Xeon® Platinum 8180 Processor on Wolf Pass SKX with 768 GB Total Memory on VMware ESXi6.0 U3 GA using guest VMs running RHEL 6 64-bit OS. Data Source: Request Number: 2563, Benchmark: server virtualization workload, Score: 1580 @ 90 VMs Higher is better
  • Up to 1.6x claim based on SPECfp*_rate_base2006: 1-Node, 2 x Intel® Xeon® Processor E5-2699 v4 on Grantley-EP (Wellsburg) with 256 GB Total Memory on Red Hat Enterprise Linux* 7.2-kernel 3.10.0-327 using Compiler: C/C++: Version 16.0.0.101 of Intel C++ Studio XE for Linux; - Fortran: Version 16.0.0.101 of Intel Fortran Studio XE for Linux. Data Source: Request Number: 2340, Benchmark: SPECfp*_rate_base2006, Score: 1050 Higher is better vs. 1-Node, 2 x Intel® Xeon® Platinum 8180 Processor on Neon City with 384 GB Total Memory on Red Hat Enterprise Linux* 7.2-kernel 3.10.0-327 using CPU2006_FOR-OEMs-cpu2006-1.2-ic17.0-lin-binaries-20160922. Data Source: Request Number: 2503, Benchmark: SPECfp*_rate_base2006, Score: 1720 Higher is better
  • Up to 1.65x claim based on STREAM - triad: 1-Node, 2 x Intel® Xeon® Processor E5-2699 v4 on Grantley-EP (Wellsburg) with 256 GB Total Memory on Red Hat Enterprise Linux* 6.5 kernel 2.6.32-431 using Stream NTW avx2 measurements. Data Source: Request Number: 1709, Benchmark: STREAM - Triad, Score: 127.7 Higher is better vs. 1-Node, 2 x Intel® Xeon® Platinum 8180 Processor on Neon City with 384 GB Total Memory on Red Hat Enterprise Linux* 7.2-kernel 3.10.0-327 using STREAM AVX 512 Binaries. Data Source: Request Number: 2500, Benchmark: STREAM - Triad, Score: 199 Higher is better
  • Up to 1.73x claim based on HammerDB: 1-Node, 2 x Intel® Xeon® Processor E5-2699 v4 on Grantley-EP (Wellsburg) with 384 GB Total Memory on Red Hat Enterprise Linux* 7.1 kernel 3.10.0-229 using Oracle 12.1.0.2.0 (including database and grid) with 800 warehouses, HammerDB 2.18. Data Source: Request Number: 1645, Benchmark: HammerDB, Score: 4.13568e+006 Higher is better vs. 1-Node, 2 x Intel® Xeon® Platinum 8180 Processor on Purley-EP (Lewisburg) with 768 GB Total Memory on Oracle Linux* 7.2 using Oracle 12.1.0.2.0, HammerDB 2.18. Data Source: Request Number: 2510, Benchmark: HammerDB, Score: 7.18049e+006 Higher is better
  • Up to 1.73x claim based on LAMMPS: LAMMPS is a classical molecular dynamics code, and an acronym for Large-scale Atomic/Molecular Massively Parallel Simulator. It is used to simulate the movement of atoms to develop better therapeutics, improve alternative energy devices, develop new materials, and more. E5-2697 v4: 2S Intel® Xeon® processor E5-2697 v4, 2.3GHz, 36 cores, Intel® Turbo Boost Technology and Intel® Hyperthreading Technology on, BIOS 86B0271.R00, 8x16GB 2400MHz DDR4, Red Hat Enterprise Linux* 7.2 kernel 3.10.0-327. Gold 6148: 2S Intel® Xeon® Gold 6148 processor, 2.4GHz, 40 cores, Intel® Turbo Boost Technology and Intel® Hyperthreading Technology on, BIOS 86B.01.00.0412.R00, 12x16GB 2666MHz DDR4, Red Hat Enterprise Linux* 7.2 kernel 3.10.0-327.
  • Up to 1.77x claim based on DPDK L3 Packet Forwarding: E5-2658 v4: 5 x Intel® XL710-QDA2, DPDK 16.04. Benchmark: DPDK l3fwd sample application Score: 158 Gbits/s packet forwarding at 256B packet using cores. Gold 6152: Estimates based on Intel internal testing on Intel Xeon 6152 2.1 GHz, 2x Intel® FM10420 (RRC) Gen Dual Port 100GbE Ethernet controller (100Gbit/card), 2x Intel® XXV710 PCI Express Gen Dual Port 25GbE Ethernet controller (2x25G/card), DPDK 17.02. Score: 281 Gbits/s packet forwarding at 256B packet using cores, IO and memory on a single socket
  • Up to 1.87x claim based on Black-Scholes, a popular mathematical model used in finance for European option valuation. This is a double precision version. E5-2697 v4: 2S Intel® Xeon® processor CPU E5-2697 v4, 2.3GHz, 36 cores, turbo and HT on, BIOS 86B0271.R00, 128GB total memory, 8 x 16GB 2400 MHz DDR4 RDIMM, 1 x 1TB SATA, Red Hat Enterprise Linux* 7.2 kernel 3.10.0-327. Gold 6148: Intel® Xeon® Gold processor 6148 @ 2.4GHz, H0QS, 40 cores 150W. QMS1, turbo and HT on, BIOS SE5C620.86B.01.00.0412.020920172159, 192GB total memory, 12 x 16 GB 2666 MHz DDR4 RDIMM, 1 x 800GB INTEL SSD SC2BA80, Red Hat Enterprise Linux* 7.2 kernel 3.10.0-327
  • Up to 2.27x claim based on LINPACK*: 1-Node, 2 x Intel® Xeon® Processor E5-2699 v4 on Grantley-EP (Wellsburg) with 64 GB Total Memory on Red Hat Enterprise Linux* 7.0 kernel 3.10.0-123 using MP_LINPACK 11.3.1 (Composer XE 2016 U1). Data Source: Request Number: 1636, Benchmark: Intel® Distribution of LINPACK, Score: 1446.4 Higher is better vs. 1-Node, 2 x Intel® Xeon® Platinum 8180 Processor on Wolf Pass SKX with 384 GB Total Memory on Red Hat Enterprise Linux* 7.3 using mp_linpack_2017.1.013. Data Source: Request Number: 3753, Benchmark: Intel® Distribution of LINPACK, Score: 3295.57 Higher is better
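The multipliers in the list above are ratios of the new score to the baseline score, and the 1.65x headline figure is described as a geometric mean across the workloads. As a rough illustration only, the Python sketch below recomputes the ratios for the bullets that quote both scores and takes a geometric mean of the twelve published multipliers. The function and variable names are hypothetical, and the exact rounding or normalization Intel applied is not stated in the text, so a recomputed value may not match a headline multiplier exactly.

```python
import math

# (baseline score, new score) pairs copied from the bullets that quote both.
# LAMMPS and Black-Scholes are omitted because no scores are given for them.
SCORES = {
    "Brokerage Firm OLTP (tps)": (4373, 5979),
    "SAP SD 2-Tier": (19721, 27678),
    "Server-side Java": (112054, 167696),
    "SPECint_rate_base2006": (1670, 2550),
    "Server virtualization": (1034, 1580),
    "SPECfp_rate_base2006": (1050, 1720),
    "STREAM triad": (127.7, 199),
    "HammerDB": (4.13568e6, 7.18049e6),
    "DPDK l3fwd (Gbit/s)": (158, 281),
    "LINPACK": (1446.4, 3295.57),
}

# The twelve "up to" multipliers exactly as published in the list.
CLAIMS = [1.3, 1.4, 1.4, 1.5, 1.5, 1.6, 1.65, 1.73, 1.73, 1.77, 1.87, 2.27]


def speedup(baseline, new):
    """Generational speedup for a higher-is-better score."""
    return new / baseline


def geomean(values):
    """Geometric mean, computed in log space for numerical stability."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


if __name__ == "__main__":
    for name, (old, new) in SCORES.items():
        print(f"{name}: {speedup(old, new):.2f}x")
    print(f"Geomean of published multipliers: {geomean(CLAIMS):.2f}x")
```

A geometric mean is the usual choice for averaging ratios because it treats a 2x gain and a 0.5x loss symmetrically, which an arithmetic mean does not.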

2.4x deep learning inference, 2.2x deep learning training, and 100x deep learning training/inference based on:

Platform: 2S Intel® Xeon® Platinum 8180 CPU @ 2.50GHz (28 cores), HT disabled, turbo disabled, scaling governor set to “performance” via intel_pstate driver, 384GB DDR4-2666 ECC RAM. CentOS Linux release 7.3.1611 (Core), Linux kernel 3.10.0-514.10.2.el7.x86_64. SSD: Intel® SSD DC S3700 Series (800GB, 2.5in SATA 6Gb/s, 25nm, MLC).

Performance measured with: Environment variables: KMP_AFFINITY='granularity=fine, compact', OMP_NUM_THREADS=56, CPU Freq set with cpupower frequency-set -d 2.5G -u 3.8G -g performance

Deep Learning Frameworks: Caffe: (http://github.com/intel/caffe/), revision f96b759f71b2281835f690af267158b82b150b5c. Inference measured with “caffe time --forward_only” command, training measured with “caffe time” command. For “ConvNet” topologies, dummy dataset was used. For other topologies, data was stored on local storage and cached in memory before training. Topology specs from https://github.com/intel/caffe/tree/master/models/intel_optimized_models (GoogLeNet, AlexNet, and ResNet-50), https://github.com/intel/caffe/tree/master/models/default_vgg_19 (VGG-19), and https://github.com/soumith/convnet-benchmarks/tree/master/caffe/imagenet_winners (ConvNet benchmarks; files were updated to use newer Caffe prototxt format but are functionally equivalent). Intel C++ compiler ver. 17.0.2 20170213, Intel MKL small libraries version 2018.0.20170425. Caffe run with “numactl -l”.
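For readers who want to see how the settings quoted for the Platinum 8180 platform fit together, the following Python sketch assembles the environment variables and command lines into one place. It only builds the values and does not run anything. The function name and the model path are hypothetical, the environment values are copied verbatim from the text, and passing the topology file through Caffe's --model flag is an assumption about how the measurement was invoked, since the source does not give the full command line.

```python
import os


def build_inference_run(model_path, threads=56):
    """Return (env, cpupower_cmd, caffe_cmd) mirroring the quoted settings.

    model_path is a hypothetical placeholder; the source does not say which
    prototxt file was passed for each topology.
    """
    env = dict(os.environ)
    # Environment variables as quoted for the Platinum 8180 platform.
    env["KMP_AFFINITY"] = "granularity=fine, compact"
    env["OMP_NUM_THREADS"] = str(threads)

    # CPU frequency bounds and governor, as quoted (typically needs root).
    cpupower_cmd = [
        "cpupower", "frequency-set",
        "-d", "2.5G", "-u", "3.8G", "-g", "performance",
    ]

    # Inference timing, wrapped in "numactl -l" for local memory allocation.
    caffe_cmd = [
        "numactl", "-l",
        "caffe", "time", "--forward_only",
        f"--model={model_path}",  # assumed invocation detail
    ]
    return env, cpupower_cmd, caffe_cmd


if __name__ == "__main__":
    env, cpupower_cmd, caffe_cmd = build_inference_run("deploy.prototxt")
    print(" ".join(cpupower_cmd))
    print(" ".join(caffe_cmd))
```

Dropping "--forward_only" from the Caffe command would correspond to the training measurement described in the text.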

Platform: 2S Intel® Xeon® CPU E5-2697 v2 @ 2.70GHz (12 cores), HT enabled, turbo enabled, scaling governor set to “performance” via intel_pstate driver, 256GB DDR3-1600 ECC RAM. CentOS Linux release 7.3.1611 (Core), Linux kernel 3.10.0-514.21.1.el7.x86_64. SSD: Intel® SSD 520 Series 240GB, 2.5in SATA 6Gb/s, 25nm, MLC.

Performance measured with: Environment variables: KMP_AFFINITY='granularity=fine, compact,1,0', OMP_NUM_THREADS=24, CPU Freq set with cpupower frequency-set -d 2.7G -u 3.5G -g performance

Deep Learning Frameworks: Caffe: (http://github.com/intel/caffe/), revision b0ef3236528a2c7d2988f249d347d5fdae831236. Inference measured with “caffe time --forward_only” command, training measured with “caffe time” command. For “ConvNet” topologies, dummy dataset was used. For other topologies, data was stored on local storage and cached in memory before training. Topology specs from https://github.com/intel/caffe/tree/master/models/intel_optimized_models (GoogLeNet, AlexNet, and ResNet-50), https://github.com/intel/caffe/tree/master/models/default_vgg_19 (VGG-19), and https://github.com/soumith/convnet-benchmarks/tree/master/caffe/imagenet_winners (ConvNet benchmarks; files were updated to use newer Caffe prototxt format but are functionally equivalent). GCC 4.8.5, Intel MKL small libraries version 2017.0.2.20170110.

Up to 4.2x more VMs based on virtualization consolidation workload: Based on Intel® internal estimates 1-Node, 2 x Intel® Xeon® Processor E5-2690 on Romley-EP with 256 GB Total Memory on VMware ESXi* 6.0 GA using Guest OS RHEL6.4, glassfish3.1.2.2, postgresql9.2. Data Source: Request Number: 1718, Benchmark: server virtualization workload, Score: 377.6 @ 21 VMs Higher is better vs. 1-Node, 2 x Intel® Xeon® Platinum 8180 Processor on Wolf Pass SKX with 768 GB Total Memory on VMware ESXi6.0 U3 GA using guest VMs running RHEL 6 64-bit OS. Data Source: Request Number: 2563, Benchmark: server virtualization workload, Score: 1580 @ 90 VMs Higher is better

Up to 65% lower 4-year TCO estimate example based on equivalent rack performance using VMware ESXi* virtualized consolidation workload comparing 20 installed 2-socket servers with Intel Xeon processor E5-2690 (formerly “Sandy Bridge-EP”) running VMware ESXi* 6.0 GA using Guest OS RHEL6.4, at a total cost of $919,362, to 5 new Intel® Xeon® Platinum 8180 (Skylake) servers running VMware ESXi6.0 U3 GA using Guest OS RHEL 6 64bit, at a total cost of $320,879 including basic acquisition. Server pricing assumptions based on current OEM retail published pricing for 2-socket server with Intel Xeon processor E5-2690 v4 and 2 CPUs in 4-socket server using E7-8890 v4, subject to change based on actual pricing of systems offered.
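The consolidation and TCO figures can be sanity-checked with simple arithmetic. The Python sketch below is one plausible reading, not Intel's actual model: it assumes "equivalent rack performance" means hosting at least as many VMs as the 20 older servers did at the quoted 21 VMs each, and it takes the two total-cost figures as given. All names are hypothetical.

```python
import math

# Figures quoted in the substantiation text.
OLD_SERVERS = 20
OLD_VMS_PER_SERVER = 21      # E5-2690 result: 377.6 @ 21 VMs
NEW_VMS_PER_SERVER = 90      # Platinum 8180 result: 1580 @ 90 VMs
OLD_TCO_USD = 919_362        # 4-year total for the 20 installed servers
NEW_TCO_USD = 320_879        # 4-year total for the 5 new servers


def servers_needed(total_vms, vms_per_server):
    """Smallest whole number of servers that can host total_vms."""
    return math.ceil(total_vms / vms_per_server)


def tco_reduction(old_cost, new_cost):
    """Fractional cost reduction when moving from old_cost to new_cost."""
    return 1 - new_cost / old_cost


if __name__ == "__main__":
    total_vms = OLD_SERVERS * OLD_VMS_PER_SERVER
    print("VMs to host:", total_vms)
    print("New servers needed:", servers_needed(total_vms, NEW_VMS_PER_SERVER))
    print(f"VM density ratio: {NEW_VMS_PER_SERVER / OLD_VMS_PER_SERVER:.2f}x")
    print(f"TCO reduction: {tco_reduction(OLD_TCO_USD, NEW_TCO_USD):.1%}")
```

Under these assumptions the server count and the percentage are consistent with the 5-server and 65% figures in the text.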