[torqueusers] problem getting files copied

Andreas Davour davour at pdc.kth.se
Thu Jul 8 07:05:54 MDT 2010

I have been trying to get our kerberized Torque to accept jobs, so far with 
mixed success.

When submitting a job it ends up in a queue, but even though maui schedules 
it, it never starts.

In /var/spool/torque I find nothing that looks like ER or OU files, and no 
uncopied files in the undelivered directory.

On the only node online, the mom log says:
07/08/2010 14:52:09;0001;   pbs_mom;Job;TMomFinalizeJob3;start failed, improper sid
07/08/2010 14:52:09;0008;   pbs_mom;Req;send_sisters;sending ABORT to sisters for job 15.scheduler-torque-l.pdc.kth.se
07/08/2010 14:52:09;0080;   pbs_mom;Svr;preobit_reply;top of preobit_reply
07/08/2010 14:52:09;0080;   pbs_mom;Svr;preobit_reply;DIS_reply_read/decode_DIS_replySvr worked, top of while loop
07/08/2010 14:52:09;0080;   pbs_mom;Svr;preobit_reply;in while loop, no error from job stat
07/08/2010 14:52:09;0080;   pbs_mom;Job;15.scheduler-torque-l.pdc.kth.se;obit sent to server
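
To pull lines like the "start failed" one out of a longer mom log, a small 
Python sketch along these lines might help. It assumes only the layout seen 
in the excerpt above (six semicolon-separated fields per line). The field 
names (timestamp, event_code, daemon, obj_class, obj_name, message) and the 
helper names parse_mom_line and start_failures are my own labels for 
illustration, not anything defined by Torque.

```python
from typing import Iterable, Iterator, NamedTuple, Optional


class MomLogEntry(NamedTuple):
    # Field names are illustrative labels, not official Torque terminology.
    timestamp: str
    event_code: str
    daemon: str
    obj_class: str
    obj_name: str
    message: str


def parse_mom_line(line: str) -> Optional[MomLogEntry]:
    """Split one log line into six fields; return None if it does not fit."""
    # Split at most 5 times so semicolons inside the message are preserved.
    parts = line.rstrip("\n").split(";", 5)
    if len(parts) != 6:
        return None
    return MomLogEntry(*(p.strip() for p in parts))


def start_failures(lines: Iterable[str]) -> Iterator[MomLogEntry]:
    """Yield Job-class entries whose message mentions a failed start."""
    for line in lines:
        entry = parse_mom_line(line)
        if entry and entry.obj_class == "Job" and "start failed" in entry.message:
            yield entry


# Two lines copied from the log excerpt in this post.
SAMPLE = [
    "07/08/2010 14:52:09;0001;   pbs_mom;Job;TMomFinalizeJob3;start failed, improper sid",
    "07/08/2010 14:52:09;0080;   pbs_mom;Svr;preobit_reply;top of preobit_reply",
]

if __name__ == "__main__":
    for e in start_failures(SAMPLE):
        print(e.timestamp, e.obj_name, e.message)
```

To run it against a real log, the SAMPLE list could be replaced with an open 
file handle on the mom log. Lines that do not split into six fields are 
skipped.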

I have not set up any ssh keys, since I figured that with Kerberos used to 
log in and submit the job, login access from the scheduler node to the work 
node would already be taken care of. I tried to rcp a file and it worked OK.

Any hints on where to look?

Systems Engineer
PDC Center for High Performance Computing
CSC School of Computer Science and Communication
KTH Royal Institute of Technology
SE-100 44 Stockholm, Sweden
Phone: 087906658
"A satellite, an earring, and a dust bunny are what made America great!"
