[torqueusers] problem getting files copied
Andreas Davour
davour at pdc.kth.se
Thu Jul 8 07:05:54 MDT 2010
I have been trying to get our kerberized torque to accept jobs, so far with
mixed success.
When submitting a job it ends up in a queue, but even though maui schedules
it, it never starts.
Looking in /var/spool/torque I find nothing looking like ER or OU files or any
uncopied files in the undelivered directory.
On the only node online, the mom log says:
07/08/2010 14:52:09;0001; pbs_mom;Job;TMomFinalizeJob3;start failed, improper sid
07/08/2010 14:52:09;0008; pbs_mom;Req;send_sisters;sending ABORT to sisters for job 15.scheduler-torque-l.pdc.kth.se
07/08/2010 14:52:09;0080; pbs_mom;Svr;preobit_reply;top of preobit_reply
07/08/2010 14:52:09;0080; pbs_mom;Svr;preobit_reply;DIS_reply_read/decode_DIS_replySvr worked, top of while loop
07/08/2010 14:52:09;0080; pbs_mom;Svr;preobit_reply;in while loop, no error from job stat
07/08/2010 14:52:09;0080; pbs_mom;Job;15.scheduler-torque-l.pdc.kth.se;obit sent to server
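
For sifting through a longer mom log, a minimal Python sketch along these
lines could pull out the records that report a failure. It assumes each record
sits on a single line (wrapped lines would need to be rejoined first) and has
six semicolon-separated fields in the order seen in the excerpt. The field
names (timestamp, event, daemon, kind, name, message) are guesses made for
illustration, not names taken from the Torque documentation.

```python
from typing import Iterable, List, NamedTuple, Optional


class MomRecord(NamedTuple):
    # Field names are hypothetical labels for the six columns in the excerpt.
    timestamp: str
    event: str
    daemon: str
    kind: str
    name: str
    message: str


def parse_record(line: str) -> Optional[MomRecord]:
    """Split one log line into six fields; return None if it does not fit."""
    # maxsplit=5 keeps any further semicolons inside the message field.
    parts = line.rstrip("\n").split(";", 5)
    if len(parts) != 6:
        return None
    return MomRecord(*(p.strip() for p in parts))


def failed_records(lines: Iterable[str]) -> List[MomRecord]:
    """Return the parsed records whose message contains the word 'failed'."""
    parsed = (parse_record(line) for line in lines)
    return [r for r in parsed if r is not None and "failed" in r.message]


if __name__ == "__main__":
    sample = [
        "07/08/2010 14:52:09;0001; pbs_mom;Job;TMomFinalizeJob3;start failed, improper sid",
        "07/08/2010 14:52:09;0080; pbs_mom;Svr;preobit_reply;top of preobit_reply",
    ]
    for rec in failed_records(sample):
        print(rec.timestamp, rec.name, rec.message)
```

Run against the two sample lines, only the TMomFinalizeJob3 record is
reported, since it is the only one whose message mentions a failure.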
I have not set up any ssh keys, since I figured that, with kerberos used to
log in and submit a job, login access from the scheduler node to the work node
would already be taken care of. I tried to rcp a file and it worked ok.
Any hints on where to look?
--
Systems Engineer
PDC Center for High Performance Computing
CSC School of Computer Science and Communication
KTH Royal Institute of Technology
SE-100 44 Stockholm, Sweden
Phone: 087906658
"A satellite, an earring, and a dust bunny are what made America great!"