Running Batch Jobs
The queueing system used on the E10ks is the Load Sharing Facility (LSF) version 3.2.3, written by Platform Computing Corporation.
The queue structure currently in place on the E10k cluster is:
| Queue | Purpose |
| hpcm | Parallel batch jobs on magenta (64 processors) |
| hpct | Parallel batch jobs on teal (32 processors) |
| short | Quick, serial batch jobs that need few resources |
| normal | Moderately long serial batch jobs |
Batch Job submission
To submit a batch job, use the "bsub" command. "bsub" has many options (use "man bsub" to find out more), but all you need to begin with is '-q' to specify the queue name, '-I' if you want an interactive parallel batch job, and '-n' to specify the minimum and maximum number of processors necessary to run the job.
Here is an example using the hpct queue to run an MPI program called "monte":
% bsub -o m3.o -e m3.e -n 1,32 -q hpct ./monte
Job <425> is submitted to queue <hpct>.
The files m3.o and m3.e will contain stdout and stderr respectively.
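The options described above can be combined in a small wrapper. The following is only a sketch: "build_bsub" is a hypothetical helper (it is not part of LSF), and the output file names job.o and job.e are placeholders chosen for illustration. It prints the "bsub" command line rather than running it, so the command can be checked before it is submitted.

```shell
#!/bin/sh
# Hypothetical helper (not part of LSF): print the bsub command line
# for a job instead of running it, so it can be reviewed first.
# Usage: build_bsub QUEUE MIN_PROCS MAX_PROCS PROGRAM [interactive]
build_bsub() {
    queue=$1; min=$2; max=$3; prog=$4; mode=${5:-batch}
    if [ "$mode" = "interactive" ]; then
        # -I requests an interactive job; output goes to the terminal
        echo "bsub -I -n $min,$max -q $queue $prog"
    else
        # -o and -e name the files that receive stdout and stderr
        # (job.o and job.e are placeholder names)
        echo "bsub -o job.o -e job.e -n $min,$max -q $queue $prog"
    fi
}

# Batch job on teal, using between 1 and 32 processors
build_bsub hpct 1 32 ./monte
# Interactive job on magenta, using between 1 and 64 processors
build_bsub hpcm 1 64 ./monte interactive
```

Once the printed command looks right, it can be pasted at the prompt or run with "eval".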
Documentation and the LSF User's manual can be found at the following URL:
http://www.rahul.net/chord/pg30/users-title.html
Again, the queue design may change, and users are encouraged to check the E10k web page frequently for the latest topology and queue structures.