xCAT Windows NT/2000 HOWTO

It is rarely necessary to build HPC clusters or farms out of Windows-based machines.  The only reason I can think of for using Windows is that the application requires it.

There are four core requirements to build a cluster:

  1. Remote hardware control.  xCAT provides this independent of the node OS.
  2. Remote BIOS/OS/Installation console.  xCAT provides the BIOS and installation console independent of the OS.  The OS console depends on the node OS supporting either serial login or VNC for GUI logins.  Windows supports the latter.
  3. Remote boot control.  xCAT provides this independent of the node OS.
  4. Remote automated unattended installation.  With imaging, xCAT can provide installation independent of the node OS.

Nice to have:

  1. Parallel shell/command prompt.
  2. Parallel file copy.
  3. Parallel file sync.
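In xCAT the parallel shell is psh.  The idea behind it can be sketched in a few lines of Python; this is a minimal illustration, not psh itself.  It assumes every node runs sshd (set up with Cygwin later in this document) and accepts key-based login as root.  The names ssh_argv, run_on_nodes, and print_prefixed are hypothetical.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def ssh_argv(node, command):
    # Hypothetical convention: log in as "root" with key-based auth.
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    # The remote side runs `command` through the login shell.
    return ["ssh", "-o", "BatchMode=yes", f"root@{node}", command]


def run_on_nodes(nodes, command, argv_for=ssh_argv, max_workers=32, timeout=60):
    """Run `command` on every node concurrently; return {node: (rc, output)}."""

    def run_one(node):
        try:
            proc = subprocess.run(
                argv_for(node, command),
                capture_output=True,
                text=True,
                timeout=timeout,
            )
            return node, (proc.returncode, proc.stdout + proc.stderr)
        except (OSError, subprocess.TimeoutExpired) as exc:
            # Executable not found (OSError) or command exceeded the timeout.
            return node, (-1, str(exc))

    # Threads are enough here: each worker just waits on a child process.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(run_one, nodes))


def print_prefixed(results, out=None):
    """Prefix every output line with its node name, sorted by node."""
    if out is None:
        out = sys.stdout
    for node in sorted(results):
        rc, output = results[node]
        for line in output.splitlines():
            print(f"{node}: {line}", file=out)
        if rc != 0:
            print(f"{node}: exit status {rc}", file=out)
```

For example, print_prefixed(run_on_nodes(["node01", "node02"], "hostname")) would print each node's answer prefixed with its node name.  The sketch needs Python 3.7 or later for capture_output and text.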

This document explains how to set up Windows NT/2000 as a compute node OS so that xCAT can be used as the cluster management tool.  xCAT itself must run on Linux.

If you do not mind building and managing your Windows-based cluster with a Linux-based management node, then read on.

This document assumes familiarity with xCAT.  If you are not familiar with it, read the Redbook and man pages.  The term management node refers to the Linux server running xCAT.  NOTE: If you must use Windows to manage your cluster, then run Linux and xCAT in a VMware session.

Building and testing your image

Use Windows 2000 Professional or NT 4.0 Workstation.  Windows XP is untested but should work.  I do not recommend Windows 2000/NT Server because the licenses are more expensive and the added functionality is usually not required.  But it's your choice.  You must have legally obtained licenses for all commercial software on every node.

  1. Install Windows 2000 or NT 4.0 any way you like, but the node must use DHCP.

  2. After installation, set up your environment and install your applications.

  3. (Optional) If you require NFS/NIS integration, install Windows Services for UNIX (http://www.microsoft.com/windows/sfu).

  4. (Optional) Install the Windows Resource Kit for its shutdown.exe utility.  It makes it possible to reboot the entire cluster with psh; however, the Resource Kit is a licensed product.  If you know of an alternative command-line shutdown or reboot command, let me know; I could use it too.

  5. Install the latest Service Packs and security updates.

  6. Set up a "root" user and add it to the Administrators group.  In root's properties, check "User cannot change password" and "Password never expires".

  7. Install the VNC service for Windows.

  8. Install the VNC client for Linux on your management node.

  9. Install Cygwin on your Windows node.

  10. Set up the Cygwin sshd service.

  11. (Optional, but recommended) Set up D4 for time synchronization.

  12. Reboot, then test ssh, gcons, and your applications.

  13. Each Windows node must have a unique SID and a unique computer name; unique hostnames are good too.  Log in to Windows with gcons.

  14. Defragment the Windows node's disk.  This is important.

  15. Now you are ready to roll out.  Test on one node first.  Read image-HOWTO.html for notes on how to roll out and back up your images.
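Before rolling out, it may help to check your node list against the unique computer name requirement in step 13.  The Python sketch below is one way to do that, under stated assumptions: it derives each Windows computer name from the node's short hostname (a hypothetical naming convention), flags names longer than the 15-character NetBIOS limit, and flags duplicates case-insensitively, since Windows computer names are not case-sensitive.  The names computer_name_for and check_computer_names are hypothetical.  The sketch does not check SIDs.

```python
# NetBIOS computer names are limited to 15 characters.
NETBIOS_MAX = 15


def computer_name_for(hostname):
    # Hypothetical naming policy: the short hostname, upper-cased,
    # becomes the Windows computer name. Adjust to your own convention.
    return hostname.split(".")[0].upper()


def check_computer_names(hostnames):
    """Return a list of problems; an empty list means the names look OK."""
    problems = []
    seen = {}  # computer name -> first hostname that produced it
    for host in hostnames:
        name = computer_name_for(host)
        if len(name) > NETBIOS_MAX:
            problems.append(
                f"{host}: computer name {name!r} exceeds {NETBIOS_MAX} characters"
            )
        if name in seen:
            problems.append(
                f"{host}: computer name {name!r} duplicates {seen[name]}"
            )
        else:
            seen[name] = host
    return problems
```

Upper-casing before comparing means node01 and Node01 are reported as a collision, and so are two fully qualified hostnames that share the same short name.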


Egan Ford
egan@us.ibm.com
August 2004