[torqueusers] node bad state
Ghislain ESCORNE
ghislain.escorne at obs.ujf-grenoble.fr
Wed Nov 30 03:18:46 MST 2005
Hello,
I have a problem when I try to submit many jobs that need to run on
more than one node. The job script, the node status, the pbs_server
log, and the checkjob output follow.
---------------------------script--------------------------------------
#PBS -l nodes=2:ppn=2,walltime=00:05:00
###PBS -m abe
# This works
echo `cat $PBS_NODEFILE`
# This does not work
echo test : $PBS_NODEFILE
echo $PBS_O_WORKDIR
#echo $PBS_WORKDIR
echo $PBS_JOBID
----------------------------------------------------------------
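For reference, a minimal sketch of how the node file printed by the script above could be summarized per host. It relies on the node file containing one hostname per allocated processor slot; the function name `summarize_nodefile` is only an illustration and is not part of TORQUE.

```shell
#!/bin/sh
# Summarize a node file: one output line per host with its slot count.
# summarize_nodefile is a hypothetical helper name, not a TORQUE command.
summarize_nodefile() {
    # $1: path to the node file (normally "$PBS_NODEFILE")
    awk 'NF { slots[$1]++ } END { for (h in slots) print h, slots[h] }' "$1" | sort
}

# Inside a job, PBS_NODEFILE is set for the job; outside a job this is skipped.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    summarize_nodefile "$PBS_NODEFILE"
fi
```

For a request of nodes=2:ppn=2, this should list each of the two hosts with a count of 2 if the allocation is what was asked for.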
[root@rock-lgit server_logs]# pbsnodes -a
compute-0-0.local
state = free
np = 2
ntype = cluster
status = opsys=linux,uname=Linux compute-0-0.local 2.6.9-5.0.5.ELsmp #1 SMP Wed Apr 20 00:16:40 BST 2005 i686,sessions=? 0,nsessions=? 0,nusers=0,idletime=498839,totmem=8250696kb,availmem=8090004kb,physmem=4154132kb,ncpus=4,loadave=0.00,netload=4095556878,state=free,jobs=? 0,rectime=1133345212
compute-0-1.local
state = free
np = 2
ntype = cluster
status = opsys=linux,uname=Linux compute-0-1.local 2.6.9-5.0.5.ELsmp #1 SMP Wed Apr 20 00:16:40 BST 2005 i686,sessions=? 0,nsessions=? 0,nusers=0,idletime=61132,totmem=8250696kb,availmem=8136580kb,physmem=4154132kb,ncpus=4,loadave=0.04,netload=4168023640,state=free,jobs=? 0,rectime=1133345189
---------------------------log pbs_server------------------------------
11/30/2005 11:06:46;0009;PBS_Server;Job;7.rock-lgit.obs.ujf-grenoble.fr;obit received for job 7.rock-lgit.obs.ujf-grenoble.fr from host compute-0-1.local with bad state (state: QUEUED)
11/30/2005 11:06:46;0080;PBS_Server;Req;req_reject;Reject reply code=15016(Request invalid for state of job), aux=0, type=JobObituary, from pbs_mom@compute-0-1.local
11/30/2005 11:06:46;0008;PBS_Server;Job;7.rock-lgit.obs.ujf-grenoble.fr;MOM rejected modify request, error: 15001
11/30/2005 11:06:46;0080;PBS_Server;Req;req_reject;Reject reply code=15001(Unknown Job Id), aux=0, type=ModifyJob, from root@rock-lgit.obs.ujf-grenoble.fr
------------------------------------------------------------
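To pull out just the reject codes from entries like the ones above, something along these lines may help. It is only a sketch: it assumes each entry sits on a single line in the server log file, and both the function name `reject_codes` and the log file name in the usage comment are hypothetical.

```shell
#!/bin/sh
# Print "code reason" for every "Reject reply" entry read from stdin.
# reject_codes is a hypothetical helper name, not a TORQUE command.
reject_codes() {
    sed -n 's/.*Reject reply code=\([0-9][0-9]*\)(\([^)]*\)).*/\1 \2/p'
}

# Usage (the file name is an assumption; use the actual log file):
#   reject_codes < server_logs/20051130
```

Lines that are not reject replies are skipped, so the obit and MOM entries do not appear in the output.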
[root@rock-lgit server_logs]# checkjob 7
checking job 7
State: Running
Creds: user:gescorne group:1110 class:short_mpi qos:DEFAULT
WallTime: 00:00:00 of 00:05:00
SubmitTime: Wed Nov 30 11:05:27
(Time Queued Total: 00:05:51 Eligible: 00:05:51)
StartTime: Wed Nov 30 11:11:18
Total Tasks: 4
Req[0] TaskCount: 4 Partition: DEFAULT
Network: [NONE] Memory >= 0 Disk >= 0 Swap >= 0
Opsys: [NONE] Arch: [NONE] Features: [NONE]
Allocated Nodes:
[compute-0-1.local:2][compute-0-0.local:2]
IWD: [NONE] Executable: [NONE]
Bypass: 0 StartCount: 351
PartitionMask: [ALL]
Flags: RESTARTABLE
Reservation '7' (00:00:00 -> 00:05:00 Duration: 00:05:00)
PE: 4.00 StartPriority: 5