IDF2010: Memory error handling in SAP* in-memory database on Intel® Xeon® 7500 processor series

Modern servers can support terabytes of main memory, and the failure of even a single memory cell can lead to a crash. Besides soft errors that can be corrected in hardware, hard uncorrectable errors can occur; in such a case, the only option for a server used to be to stop operation. According to this recent report, memory errors can cause downtime and force recovery, which is unacceptable for mission-critical enterprise systems. To address this issue, Intel has introduced a wide range of reliability and high-availability features in the Intel® Xeon® 7500 processor series (code-named Nehalem-EX).

Linux supports these features, as Andi Kleen explains in his presentation: hard memory failures can now be caught by the operating system and exposed to applications. A server application running on the Intel® Xeon® 7500 processor series can therefore handle memory errors itself and continue to operate.
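
To make "exposed to applications" a bit more concrete: on Linux, a hard memory failure reaches a process as a SIGBUS signal whose si_code is BUS_MCEERR_AR (the poisoned memory was just accessed) or BUS_MCEERR_AO (the error was detected but the data has not been consumed yet). The sketch below shows one way an application could receive these notifications. It is an illustration only, not SAP's implementation, which this post does not describe. The enum and all function names are hypothetical, and the policy of recording only the BUS_MCEERR_AO case is an assumption made to keep the example short.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/prctl.h>

/* Hypothetical policy names for this sketch; not part of any Linux API. */
enum mce_action {
    MCE_NOT_MEMORY_ERROR, /* ordinary SIGBUS, unrelated to hardware poison */
    MCE_ACTION_REQUIRED,  /* BUS_MCEERR_AR: the thread touched poisoned memory */
    MCE_ACTION_OPTIONAL   /* BUS_MCEERR_AO: poison detected, not yet consumed */
};

/* Map the si_code of a SIGBUS to what the application has to do. */
enum mce_action classify_sigbus(int si_code)
{
    switch (si_code) {
    case BUS_MCEERR_AR: return MCE_ACTION_REQUIRED;
    case BUS_MCEERR_AO: return MCE_ACTION_OPTIONAL;
    default:            return MCE_NOT_MEMORY_ERROR;
    }
}

/* si_addr_lsb is log2 of the damaged extent, e.g. 12 for a 4 KiB page. */
size_t poisoned_span(unsigned lsb)
{
    assert(lsb < sizeof(size_t) * 8);
    return (size_t)1 << lsb;
}

/* Round the reported address down to the start of the damaged extent. */
uintptr_t poisoned_base(uintptr_t addr, unsigned lsb)
{
    return addr & ~((uintptr_t)poisoned_span(lsb) - 1);
}

/* Written by the handler, polled by the main loop; g_pending is set last. */
static volatile uintptr_t g_addr;
static volatile unsigned g_lsb;
static volatile sig_atomic_t g_pending;

static void on_sigbus(int sig, siginfo_t *info, void *ucontext)
{
    (void)sig;
    (void)ucontext;
    if (classify_sigbus(info->si_code) != MCE_ACTION_OPTIONAL) {
        /* Returning after BUS_MCEERR_AR would re-run the faulting access, so
           a real handler must first replace the page or jump out. This sketch
           restores the default action instead, which ends the process. */
        signal(SIGBUS, SIG_DFL);
        raise(SIGBUS);
        return;
    }
    g_addr = (uintptr_t)info->si_addr;
    g_lsb = (unsigned)info->si_addr_lsb;
    g_pending = 1;
}

/* Install the handler with SA_SIGINFO so si_code and si_addr are available. */
int install_sigbus_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_sigbus;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGBUS, &sa, NULL);
}

/* Opt in to SIGBUS as soon as corruption is detected in this address space.
   Returns -1 if the kernel does not support the request. */
int request_early_notification(void)
{
    return prctl(PR_MCE_KILL, PR_MCE_KILL_SET, PR_MCE_KILL_EARLY, 0, 0);
}

/* Main-loop side: fetch the damaged range, if any, so it can be rebuilt. */
int take_pending_error(uintptr_t *base, size_t *len)
{
    if (!g_pending)
        return 0;
    *base = poisoned_base(g_addr, g_lsb);
    *len = poisoned_span(g_lsb);
    g_pending = 0;
    return 1;
}
```

An application would call install_sigbus_handler() and request_early_notification() once at startup, then poll take_pending_error() from its main loop and discard or reload whatever data lived in the reported range. Keeping the handler down to a few stores avoids calling non-async-signal-safe code from signal context; the actual repair work happens later, outside the handler.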

SAP announced at Sapphire that it is working towards revolutionizing its enterprise software by taking advantage of its in-memory technology, which will allow fast queries and real-time processing. Instead of waiting hours to compile reports or days to replicate data in business warehouses, business users will get immediate responses on real-time data. Naturally, in-memory processing has to be resilient to memory errors. Please visit our SSG booth at Intel Developer Forum 2010, where my colleague Otto Bruggeman will be happy to show you how SAP’s in-memory database handles memory errors on the Intel® Xeon® 7500 processor series.

Best regards,