memory bandwidth on AMD Opteron 2354

We got our hands on a new mainboard supporting the split-plane (Dual Dynamic Power Management) feature of AMD Opteron quad-core (Barcelona) processors. Earlier mainboards fully support Barcelona but not the split-plane feature. Because of this, the memory controller and the L3 cache on Barcelona run at a slower clock than on a split-plane board. A slower clock rate implies lower memory bandwidth and increased latency compared to the same processor on a split-plane board.

This is a great opportunity to test what improvement split plane offers in terms of memory performance.

The test system is set up as follows:

HPC Systems, Inc. A1204

Dual AMD Opteron 2354

8 X 1 GB DDR2 667 MHz

SLES 10 SP1

Western Digital 250 GB SATA hard drive

SUN Studio 12

STREAM benchmark

Problem size: N = 20000000
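
For a sense of scale, here is a small sketch of the memory footprint this problem size implies. It assumes STREAM's usual setup of three arrays of 8-byte double-precision elements; the post does not state the element type, so treat that as an assumption.

```python
# Memory footprint of the STREAM arrays for the problem size used in this post.
N = 20_000_000          # problem size from the post
BYTES_PER_ELEMENT = 8   # assumption: 8-byte double-precision elements
NUM_ARRAYS = 3          # assumption: STREAM's three work arrays (a, b, c)

per_array_bytes = N * BYTES_PER_ELEMENT
total_bytes = NUM_ARRAYS * per_array_bytes

print(f"per array: {per_array_bytes / 1e6:.0f} MB")
print(f"total:     {total_bytes / 1e6:.0f} MB")
```

At 160 MB per array, the working set is far larger than the few megabytes of cache on each processor, so the benchmark measures main memory rather than cache.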

Compiler command used

suncc -fast -xO4 -xprefetch -xprefetch_level=3 -xvector=simd -xarch=sse3 -xdepend  -m64 -xopenmp -o stream.big ../stream.c  -xlic_lib=sunperf -I../

Performance for 1 thread (compiled without -xopenmp flag):

-------------------------------------------------------------
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:        5724.3136       0.0559       0.0559       0.0559
Scale:       6077.3024       0.0527       0.0527       0.0527
Add:         5692.4606       0.0843       0.0843       0.0844
Triad:       5696.1831       0.0843       0.0843       0.0843
-------------------------------------------------------------
Solution Validates
-------------------------------------------------------------
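
The Rate column can be reproduced from the Min time column. The sketch below assumes 8-byte elements and the usual STREAM byte counting (Copy and Scale touch two arrays per iteration, Add and Triad touch three), with 1 MB taken as 10^6 bytes. The dictionary and function names are made up for this illustration.

```python
N = 20_000_000         # problem size from the post
BYTES_PER_ELEMENT = 8  # assumption: 8-byte double-precision elements

# Arrays read plus arrays written per loop iteration (assumed STREAM counting).
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}

# Min time column from the 1-thread table above, in seconds.
min_time_s = {"Copy": 0.0559, "Scale": 0.0527, "Add": 0.0843, "Triad": 0.0843}

def stream_rate_mb_s(kernel, seconds):
    """Bytes moved by one pass of the kernel divided by its run time, in MB/s."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * BYTES_PER_ELEMENT * N
    return bytes_moved / seconds / 1e6

rates = {k: stream_rate_mb_s(k, t) for k, t in min_time_s.items()}
for kernel, rate in rates.items():
    print(f"{kernel + ':':<8}{rate:10.1f} MB/s")
```

The results land within a fraction of a percent of the reported rates; the small gap comes from the times being printed with only four decimal places.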

We did see a higher bandwidth number with the PGI compilers, close to 6.5 GB/s, but we are unable to post the result because the license for the binaries compiled with the PGI compilers has expired.

Performance for 4 threads:

-------------------------------------------------------------
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:       12230.5392       0.0262       0.0262       0.0262
Scale:      12099.2614       0.0265       0.0264       0.0265
Add:        11536.8169       0.0417       0.0416       0.0417
Triad:      11543.9895       0.0417       0.0416       0.0418
-------------------------------------------------------------
Solution Validates
-------------------------------------------------------------

Performance for 8 threads:

-------------------------------------------------------------
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:       17516.0718       0.0183       0.0183       0.0183
Scale:      17382.8602       0.0184       0.0184       0.0185
Add:        16455.8826       0.0292       0.0292       0.0293
Triad:      16519.7865       0.0291       0.0291       0.0291
-------------------------------------------------------------
Solution Validates
-------------------------------------------------------------

From the numbers, we seem to have hit the same performance as advertised on the AMD web site.

The peak bandwidth of a 2P AMD Opteron system is 21.2 GB/s. We achieved a sustained 17.5 GB/s, i.e. about 82% of peak.
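
As a rough cross-check of the peak figure, the sketch below derives a nominal peak from the DDR2-667 memory in the test system. It assumes two 64-bit memory channels per socket, which the post does not spell out, and it computes the efficiency against the 21.2 GB/s quoted above rather than against the derived value.

```python
TRANSFERS_PER_S = 667e6    # DDR2-667: nominal 667 million transfers per second
BYTES_PER_TRANSFER = 8     # assumption: 64-bit wide memory channel
CHANNELS_PER_SOCKET = 2    # assumption: dual-channel controller on each socket
SOCKETS = 2                # 2P system from the post

nominal_peak_gb_s = (
    TRANSFERS_PER_S * BYTES_PER_TRANSFER * CHANNELS_PER_SOCKET * SOCKETS / 1e9
)

quoted_peak_gb_s = 21.2            # peak figure quoted in the post
measured_gb_s = 17516.0718 / 1000  # 8-thread Copy rate from the table above

efficiency = measured_gb_s / quoted_peak_gb_s

print(f"nominal peak: {nominal_peak_gb_s:.1f} GB/s")
print(f"efficiency:   {efficiency:.1%} of the quoted {quoted_peak_gb_s} GB/s")
```

Under these assumptions the nominal value comes out slightly above the quoted 21.2 GB/s, so the efficiency is not very sensitive to which of the two is used.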

Here are the results with only one socket populated. This exercise is important because it takes two questions out of the picture: how the memory is allocated across sockets, and whether threads get scheduled on different sockets.

Performance for 1 thread (compiled without -xopenmp flag):

-------------------------------------------------------------
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:        6256.7322       0.0528       0.0527       0.0528
Scale:       6417.2126       0.0499       0.0499       0.0499
Add:         6306.9054       0.0761       0.0761       0.0762
Triad:       6333.5465       0.0758       0.0758       0.0758
-------------------------------------------------------------
Solution Validates
-------------------------------------------------------------

Performance for 4 threads:

-------------------------------------------------------------
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:        9148.0695       0.0350       0.0350       0.0351
Scale:       9080.6064       0.0353       0.0352       0.0353
Add:         8510.1783       0.0565       0.0564       0.0565
Triad:       8511.8559       0.0564       0.0564       0.0565
-------------------------------------------------------------
Solution Validates
-------------------------------------------------------------

That is about 9.1 GB/s sustained against a single-socket peak of 10.6 GB/s (half of the 21.2 GB/s quoted above for two sockets), i.e. about 86% efficiency.
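
To see how bandwidth grows with thread count, the Copy rates from the tables above can be turned into speedups relative to one thread. This is only arithmetic on the posted figures; the dictionary layout and the helper name are invented for the illustration.

```python
# Copy rates in MB/s, taken from the tables in this post, keyed by thread count.
copy_mb_s = {
    "dual socket": {1: 5724.3136, 4: 12230.5392, 8: 17516.0718},
    "single socket": {1: 6256.7322, 4: 9148.0695},
}

def speedups(rates_by_threads):
    """Rate at each thread count divided by the 1-thread rate."""
    base = rates_by_threads[1]
    return {threads: rate / base for threads, rate in rates_by_threads.items()}

scaling = {config: speedups(rates) for config, rates in copy_mb_s.items()}

for config, by_threads in scaling.items():
    for threads, factor in by_threads.items():
        print(f"{config:<14}{threads} thread(s): {factor:.2f}x")
```

Going from 1 to 8 threads roughly triples Copy bandwidth on the dual-socket setup, while 4 threads on a single socket gain less than 1.5x over one thread.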
