[torqueusers] Parallel Job with more than one compute node doesn't start!
ramon.bastiaans at sara.nl
Fri Dec 11 08:59:10 MST 2009
On 12/11/2009 12:50 PM, Dr. Stephan Raub wrote:
> 12/11/2009 14:03:53;0001; pbs_mom;Svr;pbs_mom;LOG_ERROR::Bad UID for job
> execution (15023) in 78.xxx, job_start_error from node 192.168.1.63:15003 in
That's the problem right there. The execution node where the job is
started has a different /etc/passwd file than the machine the job was
submitted from. Either the execution user does not exist on the node,
or it has a different UID there, etc.
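To confirm this on the execution node itself, you can look up the submitting user locally and compare the UID with the one `id -u` reports on the submit host. Below is a minimal Python sketch under that assumption. It uses the standard `pwd` module, whose lookups go through the C library and therefore include NSS sources such as LDAP. It is not how pbs_mom performs its own check, and the name `check_uid` is hypothetical.

```python
import pwd
from typing import Callable, Optional


def _local_uid(user: str) -> int:
    """Return the user's UID as resolved on this node (NSS: files, LDAP, ...)."""
    return pwd.getpwnam(user).pw_uid  # raises KeyError if the user is unknown here


def check_uid(user: str, expected_uid: int,
              lookup_uid: Callable[[str], int] = _local_uid) -> Optional[str]:
    """Return None if `user` exists here with `expected_uid`, else describe the problem.

    `expected_uid` is assumed to be the UID seen on the submit host (e.g. via `id -u`).
    `lookup_uid` can be swapped out for testing; it must raise KeyError for unknown users.
    """
    try:
        uid = lookup_uid(user)
    except KeyError:
        return f"user {user!r} does not exist on this node"
    if uid != expected_uid:
        return f"user {user!r} has UID {uid} here, but {expected_uid} on the submit host"
    return None
```

For example, run `id -u alice` on the submit host, then call `check_uid("alice", <that UID>)` on the node named in the pbs_mom error. The user name is only an illustration.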
Maybe the node lost its LDAP connectivity, or its user base is out of
sync through some other means.
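To check whether the user bases are in sync, you can dump `getent passwd` (which queries all configured NSS sources, provided the directory allows enumeration) on the submit host and on the execution node, then compare the two listings. Below is a minimal sketch, assuming standard colon-separated passwd lines. `parse_passwd` and `diff_userbases` are hypothetical helper names.

```python
from typing import Dict, List


def parse_passwd(text: str) -> Dict[str, int]:
    """Map user name -> UID from passwd-format text (name:pw:uid:gid:gecos:home:shell)."""
    users: Dict[str, int] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split(":")
        if len(fields) < 3:
            continue  # not a valid passwd entry
        users[fields[0]] = int(fields[2])
    return users


def diff_userbases(submit_host: str, exec_node: str) -> List[str]:
    """List users from the submit host that are missing or have another UID on the node."""
    submit, node = parse_passwd(submit_host), parse_passwd(exec_node)
    problems: List[str] = []
    for name, uid in sorted(submit.items()):
        if name not in node:
            problems.append(f"{name}: missing on execution node")
        elif node[name] != uid:
            problems.append(f"{name}: UID {uid} on submit host, {node[name]} on execution node")
    return problems
```

An empty result means every user known on the submit host resolves to the same UID on the node.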
R. Bastiaans, B.ICT :: Systems Programmer, HPC&V
SARA - Computing & Networking Services
Science Park 121, 1098 XG Amsterdam NL
PO Box 94613, 1090 GP Amsterdam NL
P.+31 (0)20 592 3000 F.+31 (0)20 668 3167