
Kernel recompilation & HPC application performance

Friday, October 30th, 2009

Some questions never die, and kernel recompilation for improving the performance of an application is one of them. I have heard this question from users in various domains (CFD, seismic, financial, oil & gas, academic, bio-molecular modeling and so on). It always starts the same way.

“I think I should recompile the kernel of my cluster so I can have better performance. What do you think?”

And my answer is always “No”. It does sound logical … you compile your code with the best possible optimizations and you get better performance (in most cases, I should add). Why should it not apply to the kernel? After all, the kernel is what manages my processes and runs my system. It’s easy to start the debate this way and miss a key aspect.

Here are a few key questions to ask before you start on this (almost always) fruitless exercise:

  • How much time do you actually spend in the kernel when you are running your (scientific) code?
  • How much of that time is actually spent doing something useful rather than waiting on something else (good old friends such as disk I/O and interrupt handling)?
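The first two questions can be answered with measurements before anyone touches the kernel. Below is a minimal Python sketch, assuming a Unix-like node: `os.times()` reports the user-mode and system-mode (kernel) CPU time of the current process, so wrapping a workload with it shows what share of its CPU time is spent in the kernel at all. `cpu_time_split` and `compute_bound` are hypothetical names; `compute_bound` stands in for your solver. For a real job, the shell's `time` command reports the same real/user/sys split.

```python
import os
import time

def cpu_time_split(workload):
    """Run workload() once and report how its time divides between
    user mode, kernel (system) mode, and neither (blocked/waiting)."""
    t_start, wall_start = os.times(), time.perf_counter()
    workload()
    t_end, wall_end = os.times(), time.perf_counter()

    user = t_end.user - t_start.user
    system = t_end.system - t_start.system
    wall = wall_end - wall_start
    cpu = user + system
    return {
        "wall_s": wall,
        "user_s": user,
        "system_s": system,
        # Share of CPU time spent inside the kernel.
        "system_fraction": system / cpu if cpu > 0 else 0.0,
        # For a single-threaded workload, wall time not covered by CPU time
        # is roughly time spent waiting (disk I/O, sleeps, scheduling).
        "waiting_s": max(wall - cpu, 0.0),
    }

def compute_bound():
    # Hypothetical stand-in for a numerical kernel: pure arithmetic,
    # so nearly all of its CPU time should land in user mode.
    total = 0.0
    for i in range(2_000_000):
        total += i * 0.5
    return total

if __name__ == "__main__":
    for key, value in cpu_time_split(compute_bound).items():
        print(f"{key:>16}: {value:.3f}")
```

`os.times()` has coarse resolution (clock ticks), so run workloads long enough for the numbers to be meaningful. If `system_fraction` is a few percent, even a dramatically faster kernel can only shave off those few percent.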

With newer interconnects like InfiniBand, which use user-level drivers and employ kernel bypass to drastically improve latencies (barring the initial setup time), how much performance improvement can you really expect from recompiling your kernel?

Kernel recompilation can also bring cluster management headaches:

  • Deploy the new kernel to every node in the cluster
  • Recompile your kernel every time a new security or performance related patch is released
  • Recompile your hardware drivers to match your new kernel
  • Stability and performance issues in drivers built with your choice of compiler optimizations
  • Not knowing which areas of the kernel code are adversely affected by your choice of optimizations
  • And not to forget, some ISVs support their code only on certain kernels. Once you start running your ISV code on a different kernel, goodbye vendor support!

A more practical approach is to look into the application code and optimize it, either through good old hand tuning, through performance libraries or through straightforward compiler optimizations. If you are dealing with floating-point and double-precision arithmetic, tread carefully when using more aggressive compiler optimizations: several compilers do not guarantee strict floating-point precision at higher optimization levels.
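The precision caveat comes from the fact that floating-point addition is not associative. Options that allow the compiler to reorder arithmetic (GCC's `-ffast-math` is a common example) can turn a strict left-to-right sum into something like a multi-lane vector reduction. The minimal Python sketch below imitates both orders by hand; `two_lane` is a hypothetical illustration of vectorized accumulation, not the code any particular compiler emits.

```python
import math

def left_to_right(values):
    # Strict source order: ((v0 + v1) + v2) + ...
    total = 0.0
    for v in values:
        total += v
    return total

def two_lane(values):
    # Mimics a 2-wide vectorized reduction: even- and odd-indexed elements
    # are accumulated separately and combined at the end. Mathematically
    # identical, but the rounding happens in a different order.
    even = odd = 0.0
    for i, v in enumerate(values):
        if i % 2 == 0:
            even += v
        else:
            odd += v
    return even + odd

if __name__ == "__main__":
    print((0.1 + 0.2) + 0.3, 0.1 + (0.2 + 0.3))  # 0.6000000000000001 0.6

    data = [1e16, 1.0, -1e16]
    # 1e16 + 1.0 rounds back to 1e16, so the strict order loses the 1.0.
    print(left_to_right(data))  # 0.0
    print(two_lane(data))       # 1.0
    print(math.fsum(data))      # 1.0 (correctly rounded reference)
```

Neither order is "wrong", but they differ, which is why results can drift when an aggressive optimization level silently changes the evaluation order.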

Use simple techniques such as data decomposition, functional decomposition, overlapping computation and communication, and pipelining to improve the efficiency of your available compute resources. This will yield a better return on investment, especially as we move into an increasingly many-core environment.
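Data decomposition and overlapping computation with communication can be illustrated in a few lines. This is a minimal Python sketch, not an MPI program: `decompose` splits a 1-D domain into near-equal contiguous blocks, and a single background thread plays the role of the network, so while block i is being computed, the result of block i-1 is being "sent". All names are hypothetical, and `send` simply sleeps to model transfer latency.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def decompose(n, parts):
    """Data decomposition: split range(n) into `parts` contiguous
    blocks whose sizes differ by at most one."""
    base, extra = divmod(n, parts)
    blocks, start = [], 0
    for p in range(parts):
        size = base + (1 if p < extra else 0)
        blocks.append((start, start + size))
        start += size
    return blocks

def compute(block):
    # Work on one block of the domain (here: a sum of squares).
    lo, hi = block
    return sum(i * i for i in range(lo, hi))

def send(result, outbox):
    # Hypothetical stand-in for a network transfer. time.sleep releases
    # the GIL, so this genuinely overlaps with compute() in the main thread.
    time.sleep(0.01)
    outbox.append(result)

def pipelined(n, parts):
    outbox = []
    with ThreadPoolExecutor(max_workers=1) as comm:
        pending = None
        for block in decompose(n, parts):
            result = compute(block)      # compute block i while send i-1 runs
            if pending is not None:
                pending.result()         # at most one message in flight
            pending = comm.submit(send, result, outbox)
        if pending is not None:
            pending.result()             # drain the last send
    return outbox

if __name__ == "__main__":
    print(decompose(10, 3))  # [(0, 4), (4, 7), (7, 10)]
    print(sum(pipelined(1000, 4)))  # 332833500
```

In an MPI code, the same double-buffering pattern is usually written with non-blocking calls such as `MPI_Isend`, followed by `MPI_Wait` before the send buffer is reused.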

There is a paper on how profile-based optimization of the kernel yielded a significant performance improvement. More on that here.

And the results from a recent article on Gentoo essentially show that, for most applications and use cases, it does not make much sense to compile and build your own kernel.