Archive for December, 2008

Integrating a Cell-based system into ROCKS

Thursday, December 11th, 2008

After successfully installing Fedora 9 on the Cell-based system (a Mercury 1U dual Cell blade system), we had to integrate it into a ROCKS cluster.

ROCKS sends the appropriate kernel image by looking at the vendor-class-identifier information. The current DHCP configuration file supports only IA64 (EFI), x86_64, x86 and, of course, network switches. Although ROCKS no longer supports IA64 (Itanium), the code is still there.

The first task is to add the Cell system to the ROCKS database. We decided to add the node as a “Remote Management” appliance rather than as a compute node. Adding it as a compute node would modify the configuration files for SGE or PBS, and the node would always show up with “down” status. To do this, execute the following command:

insert-ethers --mac <give your mac id here>

When the insert-ethers UI shows up, select “Remote Management” and hit OK. You may also choose to provide your own hostname using the “--hostname” option.
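For example, a hypothetical invocation using the MAC address matched in the dhcpd block below and a made-up hostname:

    insert-ethers --mac 00:1a:64:0e:2a:94 --hostname cellbe-0-0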

The next task is to identify the vendor class identifier for the Cell system. After a quick test, it was determined that the system had no vendor class identifier. Since we were dealing with only one system, the best option was to match the MAC ID of the system with the following elsif block:

        } elsif ((binary-to-ascii(16,8,":",substring(hardware,1,6))="0:1a:64:e:2a:94")) {
                # Cell blade system; note that binary-to-ascii drops leading
                # zeros, so MAC 00:1a:64:0e:2a:94 is matched as "0:1a:64:e:2a:94"
                filename "cellbe.img";
                next-server 10.1.1.1;
        }

“cellbe.img” is the kernel image for the Cell system. This has to be copied to “/tftpboot/pxelinux/”.
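Assuming the image file is in the current directory, the copy is simply:

    cp cellbe.img /tftpboot/pxelinux/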

These changes will be lost whenever dhcpd.conf is regenerated, which happens every time you execute insert-ethers or run

dbreport dhcpd

to rewrite the file.

You could generate a patch file and patch dhcpd.conf as needed, or you could edit

/opt/rocks/lib/python2.4/site-packages/rocks/reports/dhcpd.py

so the new elsif block is included every time the file is generated.
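A minimal sketch of the patch approach, assuming dbreport dhcpd prints the configuration to stdout and that cell-dhcpd.patch is a hypothetical unified diff containing the elsif block above:

    # Regenerate dhcpd.conf from the ROCKS database, re-apply the Cell entry,
    # and restart the DHCP server.
    dbreport dhcpd > /etc/dhcpd.conf
    patch /etc/dhcpd.conf < /root/cell-dhcpd.patch
    service dhcpd restart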

If you see that your Cell system is trying to load

/install/sbin/kickstart.cgi

it means your dhcpd.conf file has been overwritten.
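A quick sketch for checking whether the custom entry survived:

    # If the Cell entry is gone, dhcpd.conf has been regenerated.
    grep -q cellbe.img /etc/dhcpd.conf || echo "Cell elsif block missing; re-apply it"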

References:

http://archives.devshed.com/forums/networking-100/cannot-see-the-offer-and-ack-packet-with-ethereal-2063723.html

http://forums.opensuse.org/network-internet/399162-dhcp-client-identifier-matching.html

http://osdir.com/ml/network.dhcp.isc.dhcp-server/2004-05/msg00037.html

The following code block, added to dhcpd.conf, logs the vendor class identifier and other useful information; the output typically lands in /var/log/messages via syslog:

               # Log the client MAC address, the DHCP parameter request list,
               # and the vendor class identifier (or "no-identifier" if none).
               log(info, concat("Debug Information:\t",
                   binary-to-ascii(16,8, ":", substring(hardware,1,6)),
                   "\t",
                   binary-to-ascii(10,8, "-", option dhcp-parameter-request-list),
                   "\t",
                   pick-first-value(option vendor-class-identifier, "no-identifier")
               ));

Python 3.0 Released

Thursday, December 4th, 2008

Read more here.

Python is one of those things that I am always excited about. It may have been the underdog compared to Perl at some point in the past, but Python is now a ubiquitous language working behind the scenes in almost any application.

Python can be seen at work everywhere, from web servers to simple desktop applications and everything else you can think of.

MPI+Python holds particular interest for me.

Some active projects:

  • MPI for Python
  • pyMPI
  • pyPar

HPC Systems at SuperComputing 08 (SC08)

Thursday, December 4th, 2008

We were at booth number 1726.

On display was:

HiPerStation 8000 with 2X NVIDIA Tesla C1060

Here is a brief video of our exhibit at SC08. The demo shows a couple of codes from the NVIDIA CUDA SDK and an instance of NAMD ported to CUDA.

Unable to update SUSE Linux 10

Wednesday, December 3rd, 2008

If you get this error message when trying to update a SUSE 10-based system using the Novell Customer Center Configuration menu

Execute curl command failed with '60':
curl: (60) SSL certificate problem, verify that the CA cert is OK. Details:
error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify
failed

the easiest & fastest way to fix it is to

Check the system time!

Yes, simple as that. Make sure your system time is correct and your update should proceed smoothly.
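A minimal sketch of the fix from a shell, assuming ntpdate is installed (the NTP server here is an arbitrary public pool):

    date                  # check the current system time
    ntpdate pool.ntp.org  # one-shot sync against a public NTP server
    hwclock --systohc     # write the corrected time to the hardware clock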

There goes almost a day wasted trying to figure this out.

As you can tell, I am not that good with SUSE, or am I? :)

RedHat/AMD trumps VMWare/Intel on live VM migration technology

Wednesday, December 3rd, 2008

In this YouTube video, RedHat & AMD demonstrate live migration of a virtual machine from an Intel-based server to an AMD-based server.

Ever since Intel bought a stake in VMWare, VMWare products have noticeably and obviously featured enhancements for Intel processors. One of the coolest things to come out of the VMWare/Intel alliance was the capability to move virtual machines from one generation of Intel processors to another. This capability is marketed as “VMWare Enhanced VMotion” and “Intel VT FlexMigration”. FlexMigration was a much-needed feature given how incompatible one generation of Intel processors is with the next. With Intel being one of its biggest investors, VMWare may be reluctant to provide enhancements that work better with AMD’s products. A related post is here.

With the new demonstration, AMD might see this as a way to co-exist in data centers that are exclusively Intel. With power, rack space and cooling costs going up, virtualization is efficiently consolidating hardware for a good number of today’s applications. FlexMigration allowed customers to invest in (almost) the entire Intel product line without worrying about incompatibilities when using VMWare VMotion. However, the same technology will prevent customers from investing in AMD technology, because they will be unable to migrate workloads without downtime.

Regardless of business decisions, this capability, once commercialized, will put the choice back in customers’ hands.

Good stuff.

Installing Fedora Core 9 and Cell SDK 3.1 on Cell Blade

Wednesday, December 3rd, 2008

We recently had a customer requesting a Cell blade system to be integrated into their Infiniband cluster. Since they were looking at having only one node, we suggested using the 1U dual Cell-based system. I am going to explain here the process of installing Fedora Core 9 on this system. This should also apply to other RedHat-based distributions.

If you are considering purchasing the 1U Dual Cell Based System from Mercury Systems, please note that they have humongous lead times. For the system we purchased, the lead time was about 16 weeks. Another important aspect is that this system comes with just two Cell processors and memory on board. Nothing else. No hard disk, no PCI slots. On-board video is present but not supported. If you are going to use any add-on cards, you will have to order the PCI expansion slots along with the system. To use disk storage, you will have to order a SAS disk with the system, and the PCI riser card as well. This is something we overlooked; hopefully this will help someone else when purchasing from Mercury Systems.

Turning on your system: The system cannot be accessed via the regular KVM connections. The provided serial cable has to be connected to a standalone PC, and a utility like HyperTerminal or minicom has to be used to access the system console (a minimal minicom invocation is sketched after the list below).

  • Start HyperTerminal or minicom and open the serial port connection.
  • Switch on the system.
  • You will see a lot of text go by. Press “s” a number of times to enter the firmware prompt. The system boots from the network by default.
  • Once the firmware prompt appears, you can choose which device to boot from.
  • Example: boot net to boot from the network.
  • Two hotkeys, F1 and F2, are available for entering the management system (BIOS).
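As mentioned above, a minimal minicom invocation might look like the following; the device name and baud rate are assumptions, so check your cable and the vendor documentation:

    # Open the serial console on the first serial port.
    # /dev/ttyS0 and 115200 baud are assumptions; 9600 is another common default.
    minicom -D /dev/ttyS0 -b 115200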

System Installation: The Cell system (Mercury Systems 1U Dual Cell Blade Based System or IBM QS22) cannot boot from a disk. The system can boot only from the network. This is actually a big inconvenience, because neither FC9 nor RHEL 5.2 supports NFS-based (nfsroot) installs. This becomes sort of a chicken & egg problem: the Cell system can boot only from the network, but the OS does not support an NFS root install. YellowDog Linux 6.1 from Terrasoft (now Fixstars) advertises fast NFS root install support. There is a nice installation tutorial available for YDL here. The guide does not mention that the NFS root install is available only in the commercial version. After a good number of wasted hours trying to do an NFS root install with YDL, I gave up on it.

IBM Support has a nice page on how to install Fedora / RedHat based distributions on QS21 / QS22 using a USB disk.
Using the IBM Support page and a USB disk, I was able to finally get the system running. Here is the procedure for Fedora Core 9 PPC:

  • You will need a TFTP/DHCP server or a USB DVD-ROM drive to install. Instructions on setting up a TFTP/DHCP server can be found here.
  • Copy /images/netboot/ppc64.img to the TFTP root directory. This is the kernel the system will boot when using the TFTP/DHCP setup. If you are using a DVD drive, just boot from the DVD. Make sure to check the boot order; by default, network is the first boot device. You can force booting from the firmware prompt (press “s” while the system is booting) using the “boot” command (e.g., “boot net”).
  • Get a nice USB hard disk. According to the IBM Support page, only the IBM 80 GB USB and Lenovo 120 GB USB disks are supported. I am using a Western Digital 320 GB USB disk (My Book). I did face some issues with this, though not serious ones. More information on the workaround is below.
  • At the firmware prompt, use “boot net vnc” to boot the system over the network.
  • Answer the installer prompts till the GUI starts
  • Now use a VNC client to connect to the installer using the IP provided by the installer
  • When using a large USB disk (80 GB+), the installer will exit abnormally immediately after you click “next” on the GUI welcome screen. If you do want to use a large disk, the workaround is to disconnect the USB disk before clicking “next” on the installer welcome screen. As soon as the next screen shows up, reconnect the USB drive.
  • Do the install as any other RedHat/CentOS/Fedora Core install. A nice guide is available here.
  • When the installer finishes, do not click “Reboot”.
  • Now go back to the serial console and use the following commands (collected into a single script after this list):
    • umount /mnt/sysimage/sys
    • umount /mnt/sysimage/proc
    • chroot /mnt/sysimage
    • source /etc/profile
    • mount /sys
    • mount /proc
    • Disable SELinux: open /etc/selinux/config and change “SELINUX=enforcing” to “SELINUX=disabled”.
    • Make sure your network card is set to use DHCP before going forward. If you have set up a static IP, temporarily change the configuration to use DHCP. This can be done by moving the configuration file: mv /etc/sysconfig/network-scripts/ifcfg-eth0 /etc/sysconfig/network-scripts/ifcfg-eth0.bak
    • Generate a new zImage to boot the kernel ramdisk from the network.
      • /sbin/mkinitrd --with=tg3 --with=nfs --net-dev=eth0 /boot/initrd-2.6.25-14.fc9-net.ppc64.img 2.6.25-14.fc9.ppc64
      • At this time, if you had a static IP and moved the configuration file, move it back: mv /etc/sysconfig/network-scripts/ifcfg-eth0.bak /etc/sysconfig/network-scripts/ifcfg-eth0
      • wrapper -i /boot/initrd-2.6.25-14.fc9-net.ppc64.img -o zImage.initrd-2.6.25-14.fc9-net.ppc64.img /boot/vmlinuz-2.6.25-14.fc9.ppc64.img
    • Now copy the generated zImage to the TFTP root directory using scp or by copying it to a USB disk.
    • Exit the chroot environment:
      • umount /sys
      • umount /proc
      • exit
  • Now go back to the installer GUI and click on “Reboot”
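As noted above, the serial-console sequence can be collected into a single script. This is only a sketch under the assumptions in this post (FC9 kernel 2.6.25-14.fc9, tg3 network driver); the SELinux and DHCP configuration edits from the list are left as manual steps:

    #!/bin/bash
    # Consolidated sketch of the post-install zImage generation steps above.
    # Run from the installer's shell before clicking "Reboot".
    umount /mnt/sysimage/sys
    umount /mnt/sysimage/proc
    chroot /mnt/sysimage /bin/bash -c '
        source /etc/profile
        mount /sys
        mount /proc
        # Build a network-boot initrd with the tg3 and nfs modules included.
        /sbin/mkinitrd --with=tg3 --with=nfs --net-dev=eth0 \
            /boot/initrd-2.6.25-14.fc9-net.ppc64.img 2.6.25-14.fc9.ppc64
        # Wrap the kernel and the new initrd into a network-bootable zImage.
        wrapper -i /boot/initrd-2.6.25-14.fc9-net.ppc64.img \
            -o zImage.initrd-2.6.25-14.fc9-net.ppc64.img \
            /boot/vmlinuz-2.6.25-14.fc9.ppc64.img
        umount /sys
        umount /proc
    '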

This concludes the installation. Make sure you copy the generated zImage to the TFTP root directory so this image is provided to the system when it boots after the installation.
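For example (hypothetical host and path; use your TFTP server’s actual details):

    scp zImage.initrd-2.6.25-14.fc9-net.ppc64.img root@tftpserver:/tftpboot/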

Post Install Configuration:
Boot the system with the new zImage. The system will boot using the attached USB disk, and you will be able to watch the boot process from the serial console. Now log in as root.

  • The first step is to install a Cell BE optimized kernel.
  • Download the kernel from the BSC site: wget http://www.bsc.es/projects/deepcomputing/linuxoncell/cellsimulator/sdk3.1/kernel-2.6.25.14-108.20080910bsc.ppc64.rpm
  • Install the kernel: rpm -ivh --force kernel-2.6.25.14-108.20080910bsc.ppc64.rpm
  • Add “--nodeps” to the command above if it does not successfully install the kernel.
  • Now generate a new zImage as per the above instructions, using the newly installed initrd and vmlinuz (2.6.25.14-108.20080910bsc); a sketch follows this list.
  • Copy this zImage over to the TFTP root directory, overwriting the old zImage generated with the FC9 kernel (2.6.25-14.fc9).
  • Reboot to boot into the new kernel.
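A sketch of that regeneration with the BSC kernel version substituted in; the exact file names under /boot are assumptions, so verify them after installing the RPM:

    # Rebuild the network-boot initrd and zImage for the BSC kernel.
    /sbin/mkinitrd --with=tg3 --with=nfs --net-dev=eth0 \
        /boot/initrd-2.6.25.14-108.20080910bsc-net.ppc64.img 2.6.25.14-108.20080910bsc
    wrapper -i /boot/initrd-2.6.25.14-108.20080910bsc-net.ppc64.img \
        -o zImage.initrd-2.6.25.14-108.20080910bsc.ppc64.img \
        /boot/vmlinuz-2.6.25.14-108.20080910bsc.ppc64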

SDK Installation & Executing Demo code:
SDK installation is pretty straightforward.

  • Download the SDK v3.1 from IBM.
  • Instructions on SDK installation are available here from IBM. The only thing to look out for is that tcl must be installed before the SDK installer: yum install tcl, then install the SDK installer: rpm -ivh cell-install-3.1.0-0.0.noarch.rpm
  • Important Note: Follow the instructions on the IBM site to add exclude directives to YUM, to prevent YUM from overwriting packages optimized for the Cell BE.
  • Compiling the demo code is also simple: use the provided makefiles.
  • Before executing any demo code, it is advisable to configure and mount a hugeTLBFS file system.
  • To maximize performance, large data sets should be allocated from the hugeTLBFS. This filesystem provides a mechanism for allocating 16 MB memory pages. To check the size and number of available pages, examine /proc/meminfo. If hugeTLBFS is configured and available, /proc/meminfo will have entries as follows:
  • HugePages_Total:    24
    HugePages_Free:     24
    HugePages_Rsvd:      0
    HugePages_Surp:      0
    Hugepagesize:    16384 kB

  • If your system has not been configured with a hugetlbfs, perform the following:
    mkdir -p /huge
    mount -t hugetlbfs nodev /huge
    echo N > /proc/sys/vm/nr_hugepages
    where N is the number of huge pages you want allocated to the hugetlbfs.
  • If you experience difficulty configuring adequate huge pages, memory may be fragmented and a reboot may be required.
  • This sequence can also be added to a startup initialization script, like /etc/rc.d/rc.sysinit, so the hugeTLB filesystem is configured during system boot (a sketch appears at the end of this post).
  • A test run of Matrix Multiplication code at /opt/cell/sdk/src/demos/matrix_mul is as follows:
  • [root@cellbe matrix_mul]# ./matrix_mul -i 3 -m 192 -s 8 -v 64 -n -o 4 -p
    Initializing Arrays … done
    Running test … done
    Verifying 64 entries … PASSED
    Performance Statistics:
    number of SPEs     = 8
    execution time     = 0.00 seconds
    computation rate   = 91.66 GFlops/sec
    data transfer rate = 6.70 GBytes/sec
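As mentioned earlier, the hugeTLBFS setup can be added to a startup script such as /etc/rc.d/rc.sysinit. A minimal sketch, using the 24 pages from the /proc/meminfo example above (adjust the count for your workload):

    # Mount hugeTLBFS and reserve 24 x 16 MB huge pages at boot.
    mkdir -p /huge
    mount -t hugetlbfs nodev /huge
    echo 24 > /proc/sys/vm/nr_hugepages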