Python 3.0 Released

December 4th, 2008

Read more here.

Python is one of those things that I am always excited about. It may have been the underdog to Perl at some point in the past, but Python is now a ubiquitous language working behind the scenes in almost any application.

Python can be found everywhere, from web servers to simple desktop applications and everything else you can think of.

MPI+Python holds a bit of an interest for me.

Some active projects:

  • MPI for Python
  • pyMPI
  • pyPar

HPC Systems at SuperComputing 08 (SC08)

December 4th, 2008

We were at booth number: 1726

On display was:

HiPerStation 8000 with 2X NVIDIA Tesla C1060

Here is a brief video of our exhibit at SC08. The demo shows a couple of codes from the NVIDIA CUDA SDK and an instance of NAMD ported to CUDA.

Unable to update SUSE Linux 10

December 3rd, 2008

If you get this error message when trying to update a SUSE 10-based system using the Novell Customer Center Configuration menu:

Execute curl command failed with '60':
curl: (60) SSL certificate problem, verify that the CA cert is OK. Details:
error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify
failed

the easiest & fastest way to fix it is to

Check the system time!

Yes, simple as that. Make sure your system time is correct and your update should proceed smoothly.
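
For example, a quick way to check and correct the clock from the command line (assuming the ntp package is installed; the NTP server below is just an example):

date                        # check the current system date and time
ntpdate pool.ntp.org        # one-shot sync from a public NTP server (example server)
hwclock --systohc           # write the corrected time back to the hardware clock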

There goes almost a day wasted trying to figure this out.

As you can tell, I am not that good with SUSE, or am I? :)

RedHat/AMD trumps VMWare/Intel on live VM migration technology

December 3rd, 2008

In this YouTube video, RedHat & AMD demonstrate live migration of a virtual machine from an Intel-based server to an AMD-based server.

Ever since Intel bought a stake in VMWare, VMWare products have noticeably and obviously featured enhancements for Intel processors. One of the coolest things to come out of the VMWare/Intel alliance was the capability to move virtual machines from one generation of Intel processors to another. This capability is marketed as “VMWare Enhanced VMotion” and “Intel VT FlexMigration”. FlexMigration was a much needed feature given how incompatible one generation of Intel processors is with the next. With Intel being one of its biggest investors, VMWare may be reluctant to provide enhancements that work better with AMD’s products. A related post is here.

With the new demonstration, AMD might see a way to co-exist in data centers that are currently exclusive to Intel. With power, rack space, and cooling costs going up, virtualization is efficiently consolidating hardware for a good number of today’s applications. FlexMigration allowed customers to invest in (almost) the entire Intel product line without worrying about incompatibilities when using VMWare VMotion. However, the same technology will prevent customers from investing in AMD technology because they would be unable to migrate workloads without downtime.

Regardless of business decisions, this capability, when it is commercialized, will put the choice back in customers’ hands.

Good stuff.

Installing Fedora Core 9 and Cell SDK 3.1 on Cell Blade

December 3rd, 2008

We recently had a customer requesting a Cell Blade system to be integrated into their InfiniBand cluster. Since they were looking at having only one node, we suggested using the 1U dual-Cell based system. Here I will explain the process of installing Fedora Core 9 on this system. This should also apply to other RedHat-based distributions.

If you are considering purchasing the 1U Dual Cell Based System from Mercury Systems, please note that they have humongous lead times. For the system we purchased, the lead time was about 16 weeks. Another important aspect is that this system comes with just the two Cell processors and memory on board. Nothing else: no hard disk, no PCI slots. On-board video is present but is not supported by the system. If you are going to use any add-on cards, you will have to order the PCI expansion slots along with the system. To use disk storage, you will have to order a SAS disk with the system, as well as the PCI riser card. This is something we overlooked; hopefully this will help someone else when purchasing from Mercury Systems.

Turning on your system: The system cannot be accessed via the regular KVM connections. The provided serial cable has to be connected to a standalone PC, and a utility like HyperTerminal or minicom has to be used to access the system console.

  • Start HyperTerminal or minicom and open the serial port connection.
  • Switch on the system.
  • You will see a lot of text go by. Press “s” a number of times to enter the firmware prompt; the system boots from the network by default.
  • Once the firmware prompt appears, you can choose which device to boot from.
  • For example, boot net boots from the network.
  • Two hotkeys, F1 and F2, are available for entering the management system (BIOS).
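
For reference, a minimal minicom invocation on the attached PC might look like the following (the serial device and settings are assumptions; check the Mercury documentation for the actual port parameters):

minicom -D /dev/ttyS0 -b 115200      # open the first serial port at 115200 baud (8N1 by default)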

System Installation: The Cell system (Mercury Systems 1U Dual Cell Blade Based System or IBM QS22) cannot boot from a disk; it can boot only from the network. This is a big inconvenience because neither FC9 nor RHEL 5.2 supports NFS-based (nfsroot) installs. This becomes sort of a chicken-and-egg problem: the Cell system can boot only from the network, but the OS does not support an NFS root install. YellowDog Linux 6.1 from Terra Soft (now Fixstars) advertises fast NFS root install support. There is a nice installation tutorial available for YDL here. The guide does not mention that the NFS root install is available only in the commercial version. After a good number of wasted hours trying to do an NFS root install with YDL, I gave up on it.

IBM Support has a nice page on how to install Fedora / RedHat based distributions on QS21 / QS22 using a USB disk.
Using the IBM Support page and a USB disk, I was able to finally get the system running. Here is the procedure for Fedora Core 9 PPC:

  • You will need a TFTP/DHCP server or a USB DVD-ROM drive to install. Instructions on setting up a TFTP/DHCP server can be found here.
  • Copy /images/netboot/ppc64.img to the TFTP root directory. This is the kernel the system will boot when using the TFTP/DHCP setup. If you are using a DVD drive, just boot from the DVD. Make sure to check the boot order; by default, the network is the first boot device. You can force booting from the firmware prompt (pressing “s” while the system is booting) using the command “boot
  • Get a nice USB hard disk. According to the IBM Support page, only the IBM 80 GB USB and Lenovo 120 GB USB disks are supported. I am using a Western Digital 320 GB USB disk (My Book). I did face some issues with it, though nothing serious; more information on the workaround is below.
  • At the firmware prompt, use “boot net vnc” to boot the system over the network.
  • Answer the installer prompts until the GUI starts.
  • Now use a VNC client to connect to the installer using the IP address provided by the installer.
  • When using a large USB disk (80 GB+), the installer will exit abnormally immediately after you click “next” on the GUI welcome screen. If you do want to use a large disk, the workaround is to disconnect the USB disk before clicking “next” on the GUI installer welcome screen. As soon as the next screen shows up, reconnect the USB drive.
  • Do the install as you would any other RedHat/CentOS/Fedora Core install. A nice guide is available here.
  • When the installer finishes, do not click “Reboot”.
  • Now go back to the serial console and use the following commands:
    • umount /mnt/sysimage/sys
    • umount /mnt/sysimage/proc
    • chroot /mnt/sysimage
    • source /etc/profile
    • mount /sys
    • mount /proc
    • Disable SELinux: open /etc/selinux/config and change “SELINUX=enforcing” to “SELINUX=disabled”
    • Make sure your network card is set to use DHCP before going forward. If you have set up a static IP, temporarily change the configuration to use DHCP. This can be done by moving the configuration file: mv /etc/sysconfig/network-scripts/ifcfg-eth0 /etc/sysconfig/network-scripts/ifcfg-eth0.bak
    • Generate a new zImage to boot the kernel and ramdisk from the network.
      • /sbin/mkinitrd --with=tg3 --with=nfs --net-dev=eth0 /boot/initrd-2.6.25-14.fc9-net.ppc64.img 2.6.25-14.fc9.ppc64
      • At this time, if you had static IP and moved the configuration file, move the file back: mv /etc/sysconfig/network-scripts/ifcfg-eth0.bak /etc/sysconfig/network-scripts/ifcfg-eth0
      • wrapper -i /boot/initrd-2.6.25-14.fc9-net.ppc64.img -o zImage.initrd-2.6.25-14.fc9-net.ppc64.img /boot/vmlinuz-2.6.25-14.fc9.ppc64.img
    • Now copy the generated zImage to the TFTP root directory using scp or by copying it to a USB disk.
    • Exit the chroot environment
      • umount /sys
      • umount /proc
      • exit
  • Now go back to the installer GUI and click on “Reboot”

This concludes the installation. Make sure you copy the generated zImage to the TFTP root directory so this image is provided to the system when it boots after the installation.
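
For reference, a hypothetical DHCP entry on the TFTP/DHCP server for serving that zImage (the host name, MAC address, and IP addresses below are placeholders; adjust for your network):

# /etc/dhcpd.conf excerpt (placeholder values)
host cellblade {
    hardware ethernet 00:11:22:33:44:55;                            # the blade’s MAC address
    fixed-address     192.168.1.50;
    next-server       192.168.1.1;                                  # TFTP server address
    filename          "zImage.initrd-2.6.25-14.fc9-net.ppc64.img";  # path relative to the TFTP root
}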

Post Install Configuration:
Boot the system with the new zImage. The system will boot using the attached USB disk. You will be able to look at the boot process from the serial console. Now login as root.

  • The first step is to install a Cell BE-optimized kernel.
  • Download the kernel from BSC site: wget http://www.bsc.es/projects/deepcomputing/linuxoncell/cellsimulator/sdk3.1/kernel-2.6.25.14-108.20080910bsc.ppc64.rpm
  • Install the kernel: rpm -ivh --force kernel-2.6.25.14-108.20080910bsc.ppc64.rpm
  • Add “--nodeps” to the command above if it does not successfully install the kernel.
  • Now generate a new zImage as per the above instructions, using the newly installed initrd and vmlinuz (2.6.25.14-108.20080910bsc); a concrete example follows this list.
  • Copy this zImage over to the TFTP root directory, overwriting the old zImage generated with the FC9 kernel (2.6.25-14.fc9).
  • Reboot into the new kernel.
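
A sketch of that zImage regeneration for the BSC kernel, mirroring the earlier steps. The exact initrd and vmlinuz file names are assumptions; verify them against the contents of /boot after the kernel RPM is installed.

# check /boot (or `uname -r` after a reboot) for the exact version strings first
/sbin/mkinitrd --with=tg3 --with=nfs --net-dev=eth0 \
    /boot/initrd-2.6.25.14-108.20080910bsc-net.ppc64.img 2.6.25.14-108.20080910bsc
wrapper -i /boot/initrd-2.6.25.14-108.20080910bsc-net.ppc64.img \
    -o zImage.initrd-2.6.25.14-108.20080910bsc-net.ppc64.img \
    /boot/vmlinuz-2.6.25.14-108.20080910bsc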

SDK Installation & Executing Demo code:
SDK installation is pretty straightforward.

  • Download the SDK v3.1 from IBM.
  • Instructions on SDK installation are available here from IBM. The only thing to look out for is that tcl must be installed before the SDK installer can be installed: yum install tcl, and then install the SDK installer: rpm -ivh cell-install-3.1.0-0.0.noarch.rpm
  • Important Note: Follow the instructions on the IBM site to add exclude directives to YUM to prevent YUM from overwriting packages optimized for Cell BE.
  • Compiling the demo code is also simple; use the provided makefiles.
  • Before executing any demo codes, it is advisable to configure and mount a hugeTLBFS file system.
  • To maximize the performance, large data sets should be allocated from the Huge-TLBfs. This filesystem provides a mechanism for allocating 16MB memory pages. To check the size and number of available pages, examine /proc/meminfo. If Huge-TLBFS is configured and available, /proc/meminfo will have entries as follows:
  • HugePages_Total:    24
    HugePages_Free:     24
    HugePages_Rsvd:      0
    HugePages_Surp:      0
    Hugepagesize:    16384 kB

  • If your system has not been configured with a hugetlbfs, perform the following:
    mkdir -p /huge
    mount -t hugetlbfs nodev /huge
    echo # > /proc/sys/vm/nr_hugepages
    where # is the number of huge pages you want allocated to the hugetlbfs.
  •  If you experience difficulty configuring adequate huge pages, memory may be fragmented and a reboot may be required.
  • This sequence can also be added to a startup initialization script, like /etc/rc.d/rc.sysinit, so the hugeTLB filesystem is configured during system boot (see the sketch after this list).
  • A test run of Matrix Multiplication code at /opt/cell/sdk/src/demos/matrix_mul is as follows:
  • [root@cellbe matrix_mul]# ./matrix_mul -i 3 -m 192 -s 8 -v 64 -n -o 4 -p
    Initializing Arrays … done
    Running test … done
    Verifying 64 entries … PASSED
    Performance Statistics:
    number of SPEs     = 8
    execution time     = 0.00 seconds
    computation rate   = 91.66 GFlops/sec
    data transfer rate = 6.70 GBytes/sec
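
As referenced in the hugeTLBFS notes above, here is a minimal sketch of a boot-time setup that mirrors the manual steps; the mount point and page count are examples (24 pages matches the /proc/meminfo output shown earlier):

# Excerpt to append to a startup script such as /etc/rc.d/rc.local (example values)
mkdir -p /huge                          # mount point for the hugetlbfs
mount -t hugetlbfs nodev /huge          # mount the 16 MB huge page filesystem
echo 24 > /proc/sys/vm/nr_hugepages     # reserve 24 huge pages (24 x 16 MB = 384 MB)
grep Huge /proc/meminfo                 # verify that the pages were allocated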

AMD announces 45nm Opteron (Shanghai) availability

November 13th, 2008

AMD today announced general availability of the next generation 45nm AMD Opteron Quad core processors. The official press release is available here.

Major improvements in the new Opteron architecture (code named Shanghai) in this release are as follows:

  • 45nm Manufacturing Process
  • Larger Cache
  • Support for DDR2 800 MHz
  • Upcoming enhancements for HyperTransport 3.0 (HT3)
  • Other micro-architecture enhancements like
    • AMD SmartFetch
    • AMD CoolCore
  • Maintains platform compatibility leading to better return on investment.

What benefits can you expect from the new processor?

Energy Efficiency without sacrificing Performance: The new generation of AMD Opteron processors utilizes the latest AMD 45nm manufacturing process. This process allows greater clock speeds, leading to higher core frequencies without dissipating too much heat. As per AMD’s announcement, the new generation of processors delivers 35% more performance while drawing up to 35% less power. The new manufacturing process also allows much higher clock speeds than the previous generation of quad-core processors. Overall, AMD Opteron processors combined with support for DDR2 memory offer platform-level energy efficiency and 100% x86 compatibility.

Improved Application Performance: The latest generation of processors features two major enhancements affecting application performance: DDR2 800 MHz support and a larger cache. The latest AMD Opteron (Shanghai) improves on the previous generation of AMD Opteron processors (Barcelona) with support for 800 MHz DDR2 memory. This memory technology offers improved memory bandwidth over the previous generation of processors and much better energy efficiency than Fully Buffered DIMM (FB-DIMM) technology. A 200% increase in L3 cache, to 6 MB, benefits a number of applications across verticals, like databases, virtualization, Java applications, scientific applications, media applications and more. A faster memory bus combined with a larger cache, without complicated prefetching and snooping algorithms, offers overall application efficiency.

HyperTransport 3.0 (HT3) Support: AMD Opteron processors provide unparalleled scalability and aggregate memory bandwidth by employing the AMD DirectConnect architecture with HyperTransport. Previous generations offered a 1 GHz HyperTransport link among the processors. Next-generation enhancements planned for Q2 2009 include support for coherent HyperTransport 3.0, offering up to 17.6 GB/s of bandwidth for inter-processor communication. cHT3 will enhance platform scalability for systems featuring 4 or more AMD Opteron processors and will enhance application performance for DP platforms.

Micro-architecture Enhancements: The next-generation 45nm AMD Opteron processor (Shanghai) also features numerous other micro-architectural enhancements. Some are listed below:

AMD SmartFetch: This technology allows cores to enter a “halt” state when a processor core becomes idle. In the “halt” state the core draws very little power, enhancing power efficiency. This technology does not affect application performance, thus offering better power efficiency with no penalty on performance.

AMD CoolCore: This technology allows powering down selected sections of the processor. When a particular section is not being used, it is powered down to enhance power efficiency. This technology does not affect application performance, thus offering better power efficiency with no penalty on performance.

Enhanced Virtualization Performance: The next-generation 45nm AMD Opteron processor (code named Shanghai) offers substantial enhancements in virtualization performance. Combined with architectural enhancements like the 45nm manufacturing process, larger cache, higher frequencies, higher memory bandwidth, and cHT3 support, the new processor delivers faster “world switch” times, enhancing virtual machine efficiency. AMD’s innovative AMD-V featuring Rapid Virtualization Indexing reduces the overhead associated with software virtualization. L3 cache index disable provides improved data integrity as well.

With the 45nm AMD Opteron quad-core processor (Shanghai), AMD continues to build on its platform strengths while addressing certain drawbacks of the Barcelona processors. AMD Opteron (Shanghai) processors can be used, with a BIOS upgrade, in all systems that support the Barcelona processors. Customers can avail themselves of the new processors with a simple in-socket upgrade, without the associated costs of a total hardware replacement. Application software will immediately experience the performance enhancements that come with the 45nm AMD Opteron Shanghai processors.

HPC Systems, Inc., a Platinum partner of AMD, now supports 45nm AMD Opteron Shanghai processors across its product line. Systems featuring the latest generation of AMD Opteron processors are immediately available.

Read the press release here.

Formatting large volumes with ext3

November 7th, 2008

In RedHat 5.1, the maximum ext3 file system size was increased from 8 TB to 16 TB. However, creating and formatting a volume larger than 2 TB is not straightforward.

We do ship large volumes to customers regularly. We recommend that customers use XFS for large volumes for performance and size considerations. However, sometimes customers want only ext3 because of their familiarity with the file system.

Before being able to format a volume, you must be able to create a partition larger than 2 TB. fdisk cannot do this.

You will need to use GNU Parted (parted) to create partitions larger than 2 TB. Details on how to use parted can be found here and here.

A simple example of using parted follows; we assume we are working on /dev/sdb, a 10 TB volume presented by a RAID controller. Note that a GPT disk label is required for partitions larger than 2 TB.

$> parted /dev/sdb

GNU Parted 1.8.9
Using /dev/sdb
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted)

(parted) mklabel gpt
(parted) mkpart primary ext3 0 10737418240
(parted) print
(parted) quit

A straightforward mkfs command on any volume larger than 8 TB will yield the following error:

mkfs.ext3: Filesystem too large.  No more than 2**31-1 blocks
(8TB using a blocksize of 4k) are currently supported.

A simple workaround is to force mkfs to format the device in spite of the size:

mkfs.ext3 -F -b 4096 /dev/<block device>

mkfs.ext3 -F -b 4096 /dev/<path to logical volume> if you are using LVM

In order to use the above command, you need e2fsprogs 1.39 or above. The command also sets the block size to 4 KB.

You could also add -m 0 to set the reserved block percentage to zero.
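
Putting it all together, a minimal sketch for a hypothetical 10 TB /dev/sdb (device name, partition number, and sizes are examples):

parted -s /dev/sdb mklabel gpt                   # GPT label; an msdos/MBR label cannot address > 2 TB
parted -s /dev/sdb mkpart primary ext3 0 100%    # one partition spanning the whole volume
mkfs.ext3 -F -b 4096 -m 0 /dev/sdb1              # force mkfs, 4 KB blocks, zero reserved blocks
tune2fs -l /dev/sdb1 | grep -i 'block count'     # sanity-check the resulting filesystem size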

Note that ext3 is not recommended for large volumes. XFS is better suited for that purpose.

Further reading:

  • RedHat Knowledgebase Article
  • Knowplace
  • Unixgods

AMD Opteron claims the top 3 spots in 16 core virtualization performance benchmark VMmark

August 27th, 2008

AMD (NYSE: AMD) today announced it has achieved the top spot on the VMware® VMmark virtualization benchmark for x86 servers with the Quad-Core AMD Opteron processor-based HP ProLiant DL585 G5. AMD now holds the top three spots on the 16-core VMmark benchmark. This latest result is further proof that Quad-Core AMD Opteron processors provide a high-performance virtualization solution that allows data center managers to make large-scale virtualization deployments and do so at an attractive price point.

Read more here

Compiling BLACS with OpenMPI and GCC on RHEL 5 / CentOS 5

March 12th, 2008

I had some problems compiling BLACS with OpenMPI and GCC on RHEL 5 / CentOS 5. So, here is how I got it to compile and pass the tests successfully:

OpenMPI: 1.2.5

BLACS: 1.1 with MPIBLACS Patch 03 (Feb 24, 2000)

GCC: 4.1.2

F77 = gfortran

F90 = gfortran

CC = gcc

CXX = g++

Bmake file used: BMAKES/Bmake.MPI-LINUX

Changes made to Bmake:

COMMLIB = MPI

#  -------------------------------------
#  Name and location of the MPI library.
#  -------------------------------------
MPIdir = /home/test/openmpi-install/
MPILIBdir =
MPIINCdir = $(MPIdir)/include
MPILIB =

SYSINC =

INTFACE = -DAdd_

TRANSCOMM = -DUseMpi2

WHATMPI =

SYSERRORS =

#=============================================================================
#=========================== SECTION 3: COMPILERS ============================
#=============================================================================
#  The following macros specify compilers, linker/loaders, the archiver,
#  and their options.  Some of the fortran files need to be compiled with no
#  optimization.  This is the F77NO_OPTFLAG.  The usage of the remaining
#  macros should be obvious from the names.
#=============================================================================
F77            = $(MPIdir)/bin/mpif77
F77NO_OPTFLAGS =
F77FLAGS       = $(F77NO_OPTFLAGS) -O3 -mtune=amdfam10 -march=amdfam10
F77LOADER      = $(F77)
F77LOADFLAGS   =
CC             = $(MPIdir)/bin/mpicc
CCFLAGS        = -O3 -mtune=amdfam10 -march=amdfam10
CCLOADER       = $(CC)
CCLOADFLAGS    =

Of special importance are the flags:

INTFACE = -DAdd_

TRANSCOMM = -DUseMpi2
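
With the Bmake settings above, a typical build-and-test sequence looks like the following. This is only a sketch: the tester executable names are derived from the COMM/PLAT/debug-level settings in the Bmake file and may differ on your setup, so treat them as assumptions.

$> cd BLACS
$> make mpi                               # build the BLACS libraries
$> make tester                            # build the C and Fortran test drivers
$> cd TESTING/EXE                         # run from the directory holding the tester binaries and input files
$> mpirun -np 4 ./xCbtest_MPI-LINUX-0     # C interface tests (name assumed)
$> mpirun -np 4 ./xFbtest_MPI-LINUX-0     # Fortran interface tests (name assumed)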

If INTFACE is not set correctly, make tester will fail with the following messages:

blacstest.o(.text+0x4c): In function `MAIN__':
: undefined reference to `blacs_pinfo_'
blacstest.o(.text+0x6e): In function `MAIN__':
: undefined reference to `blacs_get_'
blacstest.o(.text+0x8b): In function `MAIN__':
: undefined reference to `blacs_gridinit_'
blacstest.o(.text+0x94): In function `MAIN__':

More such errors follow.

If TRANSCOMM is not set correctly, make tester will complete successfully, and the C interface tests will also run fine. When executing the Fortran interface tests, however, the following messages are seen:

BLACS WARNING 'No need to set message ID range due to MPI communicator.'
from {-1,-1}, pnum=1, Contxt=-1, on line 18 of file 'blacs_set_.c'.
BLACS WARNING 'No need to set message ID range due to MPI communicator.'
from {-1,-1}, pnum=3, Contxt=-1, on line 18 of file 'blacs_set_.c'.
BLACS WARNING 'No need to set message ID range due to MPI communicator.'
from {-1,-1}, pnum=0, Contxt=-1, on line 18 of file 'blacs_set_.c'.
BLACS WARNING 'No need to set message ID range due to MPI communicator.'
from {-1,-1}, pnum=2, Contxt=-1, on line 18 of file 'blacs_set_.c'.
[comp-pvfs-0-7.local:30119] *** An error occurred in MPI_Comm_group
[comp-pvfs-0-7.local:30118] *** An error occurred in MPI_Comm_group
[comp-pvfs-0-7.local:30118] *** on communicator MPI_COMM_WORLD
[comp-pvfs-0-7.local:30118] *** MPI_ERR_COMM: invalid communicator
[comp-pvfs-0-7.local:30119] *** on communicator MPI_COMM_WORLD
[comp-pvfs-0-7.local:30119] *** MPI_ERR_COMM: invalid communicator
[comp-pvfs-0-7.local:30119] *** MPI_ERRORS_ARE_FATAL (goodbye) 

Hyper-V (Windows Server 2008 x64) on 32 cores

January 25th, 2008

In the previous post, we tried Hyper-V with only 16 cores, as per the release notes. Now we added another 8 CPUs (16 AMD Opteron cores) to the same system. This was more to test x64 Windows Server 2008 on 32 cores than Hyper-V itself. We already did this for the x86 version here.

The system did boot up just fine. Here is a screen shot.

Windows Server 2008 x64 on 32 AMD Opteron cores

With that taken care of, we quickly browsed through the event logs to see if the Hyper-V service / hypervisor had failed to start, as per the release notes. There was no such message. The only way to test whether the hypervisor has started is to fire up the Server Manager and try to boot up the virtual machines.

We were pleased to see that the hypervisor indeed started and there was no problem booting up the virtual machines. And here is a screenshot.

Hyper-V with 32 AMD Opteron cores

This opens up a wide range of use cases. With appropriate capacity planning, the entire data center for a small company can be replaced with one 5U server, or two for a highly available setup.

Cheers,

Kalyan