[torqueusers] Job execution problem

Vahe nr vner75 at gmail.com
Wed Feb 15 07:50:01 MST 2012


Dear all,
I have a strange problem when submitting jobs on a Linux cluster, so I will give some details to help identify it:
The Torque version on my controller node is torque-server-2.5.7-7.el5.x86_64.
The Torque version on my compute nodes is torque-client-2.5.7-7.el5.x86_64.
Passwordless SSH access is set up, and iptables is not running on either side.
pbsnodes -a shows all nodes in the free state.
Here is the relevant excerpt from server_log:
02/15/2012 11:34:19;0008;PBS_Server;Job;220.ce.seua-cluster.grid.am;send of job to wn1.seua-cluster.grid.am failed error = 15002
02/15/2012 11:34:19;0001;PBS_Server;Svr;PBS_Server;LOG_ERROR::Undefined attribute  (15002) in send_job, child failed in previous commit request for job 220.ce.seua-cluster.grid.am
02/15/2012 11:34:19;0008;PBS_Server;Job;220.ce.seua-cluster.grid.am;unable to run job, MOM rejected/rc=1
02/15/2012 11:34:19;0080;PBS_Server;Req;req_reject;Reject reply code=15043(Execution server rejected request MSG=cannot send job to mom, state=PRERUN), aux=0, type=RunJob, from root@ce.seua-cluster.grid.am
02/15/2012 11:34:19;0040;PBS_Server;Svr;ce.seua-cluster.grid.am;Scheduler was sent the command new
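When reading a server_log excerpt like the one above, it can help to split each record into its fields and pull out the numeric Torque error codes (here 15002 and 15043). Below is a minimal Python sketch. It assumes the semicolon-separated layout visible in the excerpt (timestamp; hex event type; daemon; object class; object name; message). The helper names parse_line and error_codes are hypothetical and are not part of Torque.

```python
from __future__ import annotations

import re
import sys
from typing import NamedTuple


class LogRecord(NamedTuple):
    timestamp: str      # e.g. "02/15/2012 11:34:19"
    event_type: int     # hex field from the log, e.g. "0008" -> 8
    daemon: str         # e.g. "PBS_Server"
    object_class: str   # e.g. "Job", "Svr", "Req"
    object_name: str    # e.g. a job id or host name
    message: str        # free-text remainder of the record


def parse_line(line: str) -> LogRecord:
    """Split one server_log record on its first five semicolons (assumed layout)."""
    parts = line.rstrip("\n").split(";", 5)
    if len(parts) != 6:
        raise ValueError(f"not a server_log record: {line!r}")
    ts, ev, daemon, cls, name, msg = parts
    return LogRecord(ts, int(ev, 16), daemon, cls, name, msg)


# Matches five-digit codes written as "error = 15002", "(15002)" or "code=15043".
CODE_RE = re.compile(r"(?:error\s*=\s*|code=|\()(\d{5})")


def error_codes(record: LogRecord) -> list[int]:
    """Return every five-digit error code mentioned in the record's message."""
    return [int(code) for code in CODE_RE.findall(record.message)]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical): python parse_log.py <path-to-server_log-file>
    with open(sys.argv[1]) as log:
        for raw in log:
            try:
                rec = parse_line(raw)
            except ValueError:
                continue  # skip lines that do not match the assumed layout
            codes = error_codes(rec)
            if codes:
                print(rec.timestamp, rec.object_name, codes)
```

Running this over the excerpt would flag the send_job failure (15002) and the rejected RunJob request (15043) for job 220.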
This is the output of the checkjob command:
checkjob 220


checking job 220

State: Idle
WallTime: 00:00:00 of 00:01:00
SubmitTime: Wed Feb 15 09:58:51
  (Time Queued  Total: 4:49:09  Eligible: 00:00:00)

Total Tasks: 1

Req[0]  TaskCount: 1  Partition: ALL
Network: [NONE]  Memory >= 0  Disk >= 0  Swap >= 0
Opsys: [NONE]  Arch: [NONE]  Features: [NONE]


IWD: [NONE]  Executable:  [NONE]
Bypass: 0  StartCount: 25
PartitionMask: [ALL]
Flags:       RESTARTABLE

Holds:    Batch
Messages:  cannot start job - RM failure, rc: 15043, msg: 'Execution server rejected request MSG=cannot send job to mom, state=PRERUN'
PE:  1.00  StartPriority:  194
cannot select job 220 for partition DEFAULT (job hold active)

I would appreciate any help. Thanks in advance.

Regards