[torqueusers] problem getting files copied

Andreas Davour davour at pdc.kth.se
Thu Jul 8 07:05:54 MDT 2010


I have been trying to get our kerberized Torque to accept jobs, so far with 
mixed success.

When submitting a job it ends up in a queue, but even though maui schedules 
it, it never starts.

Looking in /var/spool/torque I find nothing looking like ER or OU files or any 
uncopied files in the undelivered directory.

On the only node online, the mom log says:
07/08/2010 14:52:09;0001;   pbs_mom;Job;TMomFinalizeJob3;start failed, improper sid
07/08/2010 14:52:09;0008;   pbs_mom;Req;send_sisters;sending ABORT to sisters for job 15.scheduler-torque-l.pdc.kth.se
07/08/2010 14:52:09;0080;   pbs_mom;Svr;preobit_reply;top of preobit_reply
07/08/2010 14:52:09;0080;   pbs_mom;Svr;preobit_reply;DIS_reply_read/decode_DIS_replySvr worked, top of while loop
07/08/2010 14:52:09;0080;   pbs_mom;Svr;preobit_reply;in while loop, no error from job stat
07/08/2010 14:52:09;0080;   pbs_mom;Job;15.scheduler-torque-l.pdc.kth.se;obit sent to server
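
To pick the failure out of a longer mom log, a small filter along these lines 
might help. It is only a sketch: it assumes the semicolon-separated layout 
seen in the excerpt above (timestamp;event code;daemon;class;id;message), 
which is inferred from these few lines and not checked against the Torque 
documentation. The function names and the default search strings are my own.

```python
import re

# Assumed layout, inferred only from the log excerpt in this post:
#   timestamp;event code;daemon;class;id;message
ENTRY_START = re.compile(r"^\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2};")
FIELDS = ("timestamp", "code", "daemon", "class", "ident", "message")


def join_wrapped(lines):
    """Glue mail-wrapped continuation lines back onto their log entry."""
    entries = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if ENTRY_START.match(line) or not entries:
            entries.append(line)
        else:
            # A line without a leading timestamp continues the previous entry.
            entries[-1] += " " + line
    return entries


def parse_entry(entry):
    """Split one entry into six fields; return None if it does not fit."""
    parts = [p.strip() for p in entry.split(";", 5)]
    if len(parts) != len(FIELDS):
        return None
    return dict(zip(FIELDS, parts))


def find_failures(lines, needles=("failed", "ABORT")):
    """Return parsed entries whose message contains any of the needles."""
    hits = []
    for entry in join_wrapped(lines):
        parsed = parse_entry(entry)
        if parsed and any(n in parsed["message"] for n in needles):
            hits.append(parsed)
    return hits
```

Fed the excerpt above, find_failures should return only the TMomFinalizeJob3 
and send_sisters entries, which is where the "improper sid" message shows up.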

I have not set up any ssh keys, since I figured that with Kerberos used to 
log in and submit a job, login access from the scheduler node to the worker 
node would already be taken care of. I tried to rcp a file and it worked OK.

Any hints on where to look?

-- 
Systems Engineer
PDC Center for High Performance Computing
CSC School of Computer Science and Communication
KTH Royal Institute of Technology
SE-100 44 Stockholm, Sweden
Phone: 087906658
"A satellite, an earring, and a dust bunny are what made America great!"

