Building an Analysis System

I'm working on building an affordable desktop or desktop / small-server analysis system: a purpose-driven hardware/OS design for a high-speed, high-volume analytic system. I've been using Intel processors, motherboards (a vanishing species), SSDs, etc., with variations on a hierarchical design for processing storage:

Fast HDDs in RAID 10 / 1E -> fast SSDs in RAID 0 -> DDR3 SDRAM -> maximum processor cache within budget.
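
To sanity-check the tiers, I can time 4 KiB reads from a file placed on whichever level is under test. Here is a minimal C++ sketch (the file path and block size are arbitrary placeholders I chose); note that unless the OS page cache is dropped between runs, the result measures DRAM rather than the device:

    #include <chrono>
    #include <cstdio>
    #include <vector>

    int main(int argc, char** argv) {
        const char* path = (argc > 1) ? argv[1] : "testfile.bin";  // file on the tier under test
        const std::size_t kBlock = 4096;                           // one 4 KiB block per read
        std::vector<char> buf(kBlock);

        std::FILE* f = std::fopen(path, "rb");
        if (!f) { std::perror("fopen"); return 1; }

        auto t0 = std::chrono::steady_clock::now();
        std::size_t reads = 0;
        while (std::fread(buf.data(), 1, kBlock, f) == kBlock) ++reads;
        auto t1 = std::chrono::steady_clock::now();
        std::fclose(f);

        if (reads == 0) { std::fprintf(stderr, "file too small\n"); return 1; }
        double us = std::chrono::duration<double, std::micro>(t1 - t0).count();
        std::printf("%zu reads, %.2f us/read average\n", reads, us / reads);
        return 0;
    }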

This hierarchy is fed by a standard database optimized for the "need for speed": Oracle, MS SQL Server, MySQL, or an in-memory database. I follow a "never back to disk" rule: no intermediate storage step may write to an HDD, only to DRAM or SSD. I have found some cases where sticking to this rule is not optimal.
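
In practice the rule means staging intermediates on a RAM-backed filesystem or SSD path rather than an HDD path. A minimal sketch of the DRAM case (assuming Linux, where /dev/shm is normally a tmpfs mount; the path and buffer size are illustrative):

    #include <cstdio>
    #include <vector>

    int main() {
        std::vector<double> intermediate(1 << 20, 0.0);  // some stage's output (8 MiB here)

        // /dev/shm is tmpfs on most Linux installs, so this write lands in
        // DRAM, never on an HDD; swap in an SSD path to use the next tier down.
        std::FILE* f = std::fopen("/dev/shm/stage1.bin", "wb");
        if (!f) { std::perror("fopen"); return 1; }
        std::fwrite(intermediate.data(), sizeof(double), intermediate.size(), f);
        std::fclose(f);
        return 0;
    }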

I program in C/C++, Python, Mathematica, R, and CUDA. I strive for optimization at the lowest level, including identifying the processor's design and coding for that specific processor. I am trying to determine the cost/benefit of developing that way, i.e., building optimal code blocks for some sections and branching to them at those points. There is overhead in switching back and forth, and that must be considered in the cost/benefit calculation. I've been using CUDA with Kepler GPUs, but that coding is more costly at this time, so one must use it wisely. Mathematica makes use of CUDA at a high level for certain operations and functions.
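
The branching I have in mind is one-time dispatch through a function pointer, so the detection cost is paid once rather than per call. A minimal C++ sketch (GCC/Clang on x86 only; the kernel names are hypothetical and the "AVX path" is a placeholder rather than real intrinsics code):

    #include <cstdio>

    // Two hypothetical variants of the same hot kernel.
    static double sum_generic(const double* x, int n) {
        double s = 0.0;
        for (int i = 0; i < n; ++i) s += x[i];
        return s;
    }

    static double sum_avx_path(const double* x, int n) {
        // In a real build this would be compiled with -mavx and use
        // intrinsics; shown as a placeholder so the sketch is self-contained.
        double s = 0.0;
        for (int i = 0; i < n; ++i) s += x[i];
        return s;
    }

    // Resolved once at startup; later calls go straight to the chosen block.
    static double (*sum_impl)(const double*, int) = nullptr;

    static void init_dispatch() {
        __builtin_cpu_init();  // required before __builtin_cpu_supports on GCC
        sum_impl = __builtin_cpu_supports("avx") ? sum_avx_path : sum_generic;
    }

    int main() {
        init_dispatch();
        double data[4] = {1.0, 2.0, 3.0, 4.0};
        std::printf("sum = %f\n", sum_impl(data, 4));
        return 0;
    }

Resolving the pointer once keeps the switch overhead out of the inner loop, which is exactly the overhead the cost/benefit calculation has to weigh.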

I cannot afford a Xeon Phi at this time, so I am looking at it only in theory.

Any ideas or comments?