Compiler Optimizations for AMD Opteron Quadcore Barcelona with PGI compilers

As part of our benchmarking exercise on the A5808-32, we started using the PGI compilers. After a number of experiments, the following compiler switches seem to give the best performance. Unless otherwise specified, each flag must be given during both the compilation and the linking phase.

-fast: The usual macro to start with. On 64-bit platforms, -fast implies -fastsse.

-fastsse: Enable SIMD operations.

-Mipa: Enable interprocedural optimizations. Use as -Mipa=fast,inline to get IPA with automatic procedure inlining. This enables a two-pass compilation and linking.

-Mpfi & -Mpfo: Enable profile-guided optimization. -Mpfi builds an instrumented executable. -Mpfo uses the data collected from running that executable to guide the optimization.

-Mvect=sse: Enable vectorization of code using SSE instructions.
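As an illustration of the kind of loop a vectorizer can usually handle, here is a minimal sketch in C. The function name saxpy and the use of C99 restrict are assumptions for the example, not something taken from our benchmark code. The restrict qualifiers tell the compiler that x and y do not overlap, which removes a possible dependence that could otherwise prevent SSE code generation.

```c
#include <stddef.h>

/* Hypothetical example kernel: y[i] = a * x[i] + y[i].
 * Iterations are independent and memory access is unit-stride,
 * which is the pattern SSE vectorization is designed for.
 * 'restrict' (C99) promises that x and y do not alias. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

With the PGI compilers, adding -Minfo=vect to the command line should report whether a loop like this was vectorized. Depending on the compiler version, C99 mode may need to be enabled for restrict to be accepted.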

-O<level>: Set the optimization level. 4 is the highest level and uses aggressive techniques.

-tp=<target type>: Optimize code for the target processor. Top choices: barcelona, barcelona-64, amd64, amd64e, core2, core2-64

-Munroll: Enable loop unrolling.
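To make the effect of unrolling concrete, the sketch below shows a simple reduction next to a version unrolled by hand four times. This is only an illustration of the transformation; the function names are made up for the example, and the code the compiler actually generates with -Munroll may use a different unroll factor and look quite different.

```c
#include <stddef.h>

/* Plain loop: one addition and one loop test per element. */
double sum_simple(const double *v, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; ++i) {
        s += v[i];
    }
    return s;
}

/* Same reduction, unrolled by 4 by hand: one loop test per four
 * elements, followed by a remainder loop for the last n % 4 items.
 * The additions happen in the same order as in sum_simple. */
double sum_unrolled4(const double *v, size_t n)
{
    double s = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s += v[i];
        s += v[i + 1];
        s += v[i + 2];
        s += v[i + 3];
    }
    for (; i < n; ++i) {
        s += v[i];
    }
    return s;
}
```

Letting the compiler do this keeps the source readable, which is the main reason to prefer the flag over unrolling by hand.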

-Mconcur: Auto-parallelize loops.

-Minline: Inline functions automatically. One can also provide the name of the function to inline.

-mp: Enable recognition of OpenMP directives.
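For reference, here is a minimal sketch of a loop carrying an OpenMP directive that -mp would act on. The function name dot is an assumption for the example. Without -mp the pragma is ignored and the loop runs serially, so the same source builds either way.

```c
#include <stddef.h>

/* Hypothetical example: dot product with an OpenMP reduction.
 * When OpenMP is enabled, each thread accumulates a private
 * partial sum and the partial sums are combined at the end. */
double dot(const double *a, const double *b, size_t n)
{
    double sum = 0.0;
    long i; /* signed index, as required by older OpenMP versions */

    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < (long)n; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

The number of threads is normally controlled at run time through the standard OMP_NUM_THREADS environment variable.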

-Mloop32: Align innermost loops on 32-byte boundaries on Barcelona processors. Small loops run faster with this flag on Barcelona.
