[torqueusers] node bad state

Ghislain ESCORNE ghislain.escorne at obs.ujf-grenoble.fr
Wed Nov 30 03:18:46 MST 2005


Hello,
I have a problem when I submit many jobs that need to run on more than
one node. The job script, the pbsnodes output, the pbs_server log and
the checkjob output follow.

---------------------------script--------------------------------------
#PBS -l nodes=2:ppn=2,walltime=00:05:00
###PBS -m abe
# This one works
echo `cat $PBS_NODEFILE`
# This one does not work
echo test : $PBS_NODEFILE
echo $PBS_O_WORKDIR
#echo $PBS_WORKDIR
echo $PBS_JOBID

----------------------------------------------------------------
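As a cross-check on the script above, here is a minimal sketch (not part of the original script) that summarizes the node file per host. It assumes the usual TORQUE behavior in which $PBS_NODEFILE lists each allocated host once per requested processor, so a nodes=2:ppn=2 request should give four lines, two per host. The name summarize_nodefile is a hypothetical helper chosen for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not from the original script): summarize a PBS node file.
# Prints one "host count" pair per line, sorted by host name.
summarize_nodefile() {
    sort "$1" | uniq -c | awk '{ print $2, $1 }'
}

# Inside a job, TORQUE sets PBS_NODEFILE; outside a job this block is skipped.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    summarize_nodefile "$PBS_NODEFILE"
fi
```

If the counts per host do not match the ppn value requested, the job did not receive the allocation it asked for.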
[root at rock-lgit server_logs]# pbsnodes -a
compute-0-0.local
     state = free
     np = 2
     ntype = cluster
     status = opsys=linux,uname=Linux compute-0-0.local 2.6.9-5.0.5.ELsmp #1 SMP Wed Apr 20 00:16:40 BST 2005 i686,sessions=? 0,nsessions=? 0,nusers=0,idletime=498839,totmem=8250696kb,availmem=8090004kb,physmem=4154132kb,ncpus=4,loadave=0.00,netload=4095556878,state=free,jobs=? 0,rectime=1133345212

compute-0-1.local
     state = free
     np = 2
     ntype = cluster
     status = opsys=linux,uname=Linux compute-0-1.local 2.6.9-5.0.5.ELsmp #1 SMP Wed Apr 20 00:16:40 BST 2005 i686,sessions=? 0,nsessions=? 0,nusers=0,idletime=61132,totmem=8250696kb,availmem=8136580kb,physmem=4154132kb,ncpus=4,loadave=0.04,netload=4168023640,state=free,jobs=? 0,rectime=1133345189

---------------------------log pbs_server------------------------------
11/30/2005 11:06:46;0009;PBS_Server;Job;7.rock-lgit.obs.ujf-grenoble.fr;obit received for job 7.rock-lgit.obs.ujf-grenoble.fr from host compute-0-1.local with bad state (state: QUEUED)
11/30/2005 11:06:46;0080;PBS_Server;Req;req_reject;Reject reply code=15016(Request invalid for state of job), aux=0, type=JobObituary, from pbs_mom at compute-0-1.local
11/30/2005 11:06:46;0008;PBS_Server;Job;7.rock-lgit.obs.ujf-grenoble.fr;MOM rejected modify request, error: 15001
11/30/2005 11:06:46;0080;PBS_Server;Req;req_reject;Reject reply code=15001(Unknown Job Id), aux=0, type=ModifyJob, from root at rock-lgit.obs.ujf-grenoble.fr
------------------------------------------------------------

[root at rock-lgit server_logs]# checkjob 7


checking job 7

State: Running
Creds:  user:gescorne  group:1110  class:short_mpi  qos:DEFAULT
WallTime: 00:00:00 of 00:05:00
SubmitTime: Wed Nov 30 11:05:27
  (Time Queued  Total: 00:05:51  Eligible: 00:05:51)

StartTime: Wed Nov 30 11:11:18
Total Tasks: 4

Req[0]  TaskCount: 4  Partition: DEFAULT
Network: [NONE]  Memory >= 0  Disk >= 0  Swap >= 0
Opsys: [NONE]  Arch: [NONE]  Features: [NONE]
Allocated Nodes:
[compute-0-1.local:2][compute-0-0.local:2]


IWD: [NONE]  Executable:  [NONE]
Bypass: 0  StartCount: 351
PartitionMask: [ALL]
Flags:       RESTARTABLE

Reservation '7' (00:00:00 -> 00:05:00  Duration: 00:05:00)
PE:  4.00  StartPriority:  5






