[torqueusers] Configuring torque/Maui on an SGI Altix

Michael Seymour seymour at atmosp.physics.utoronto.ca
Mon Aug 8 12:32:44 MDT 2005


Hello,

I am new to Torque and job schedulers in general. Does anyone have 
experience with configuring Torque and possibly Maui on an SGI Altix 
machine? The pbs_server is running on an external linux box, and is in a 
semi-functional state.

I am currently stuck trying to get a submitted job to run.

pbsnodes -a returns:

altix01.atmosp.physics.utoronto.ca:
      state = free
      np = 1
      ntype = time-shared
      status = arch=linux,uname=Linux altix01 2.4.21-sgi302r24 #1 SMP Fri Oct 22 22:43:12 PDT 2004 ia64,sessions=4263 4577 4703 4784 5583,nsessions=5,nusers=3,idletime=6324,totmem=44998320kb,availmem=40222096kb,physmem=35782352kb,ncpus=16,loadave=8.50,netload=18446744073701380569,state=free,rectime=1123520233
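
The status field is dense, so here is a minimal sketch for pulling it apart, assuming it is a comma-separated list of key=value pairs (which the output above suggests, but which is not confirmed against the Torque sources). The function name parse_status and the variable names are made up for illustration. It puts the configured np (1) next to the ncpus value reported by the node (16); whether that difference has anything to do with the failure is not established here.

```python
# Status string copied from the pbsnodes output above, joined into one line.
STATUS = (
    "arch=linux,uname=Linux altix01 2.4.21-sgi302r24 #1 SMP Fri Oct 22 "
    "22:43:12 PDT 2004 ia64,sessions=4263 4577 4703 4784 5583,nsessions=5,"
    "nusers=3,idletime=6324,totmem=44998320kb,availmem=40222096kb,"
    "physmem=35782352kb,ncpus=16,loadave=8.50,"
    "netload=18446744073701380569,state=free,rectime=1123520233"
)


def parse_status(status):
    """Split a comma-separated key=value string into a dict (hypothetical helper)."""
    fields = {}
    for item in status.split(","):
        # Split on the first "=" only, so values may themselves contain "=".
        key, _, value = item.partition("=")
        fields[key.strip()] = value.strip()
    return fields


fields = parse_status(STATUS)
np_configured = 1  # taken from the "np = 1" line of the pbsnodes output
ncpus_reported = int(fields["ncpus"])
print("np configured:", np_configured, "| ncpus reported:", ncpus_reported)
```

The same dict also gives quick access to the other values, for example fields["state"] or fields["availmem"].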

Submitting a job results in this email:

seymour at boreas$ echo 'sleep 10' | /usr/local/torque/bin/qsub

PBS Job Id: 21.boreas
Job Name:   STDIN
File stage in failed, see below.
Job will be retried later, please investigate and correct problem.
Job deleted at request of Scheduler at boreas
Job could never run

And tracejob returns this:

seymour at boreas$ tracejob -n 10 21

Job: 21.boreas

08/08/2005 12:57:13  L    Job Deleted because it would never run
08/08/2005 12:57:13  S    Job Queued at request of seymour at boreas, owner =
                           seymour at boreas, job name = STDIN, queue = batch
08/08/2005 12:57:13  S    Job deleted at request of Scheduler at boreas
08/08/2005 12:57:13  L    Not enough of the right type of nodes available
08/08/2005 12:57:13  S    enqueuing into batch, state 1 hop 1
08/08/2005 12:57:13  S    dequeuing from batch, state EXITING

There are no jobs currently running on the node, as it is listed as free. 
Any suggestions?

In general, what should I know about defining multiple queues for the Altix 
machine, with respect to server, scheduler and cpuset setup? We would like 
one queue for large jobs and possibly two queues for smaller jobs. Does 
anyone have a configuration that could serve as an example, or can anyone 
point me to a web site that contains relevant information?

Thanks,
Mike S.

--
Michael D. Seymour, Computer Support
Atmospheric Physics Group, Department of Physics, University of Toronto
60 St. George Street, Toronto, ON, Canada, M5S 1A7 
Tel: 416-946-3019   Fax: 416-978-8905
seymour at atmosp.physics.utoronto.ca

