[torqueusers] Parallel Job with more than one compute node doesn't start!

Ramon Bastiaans ramon.bastiaans at sara.nl
Fri Dec 11 08:59:10 MST 2009


On 12/11/2009 12:50 PM, Dr. Stephan Raub wrote:
> 12/11/2009 14:03:53;0001;   pbs_mom;Svr;pbs_mom;LOG_ERROR::Bad UID for job
> execution (15023) in 78.xxx, job_start_error from node 192.168.1.63:15003 in
> job_start_error
>    
That's the problem right there. The execution node where the job is 
started has a different /etc/passwd file than the machine the job was 
submitted from. Either the execution user does not exist on that node, 
or it exists there with a different UID.

Maybe the node lost its LDAP connectivity, or its userbase is not kept 
in sync through some other means.
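
One way to check this is to compare the passwd entries of the submit host
and the execution node. The following is only a rough sketch, not anything
that ships with TORQUE: the function names parse_passwd and uid_mismatches
are made up for illustration, and it assumes you have already collected
passwd-format text (name:passwd:uid:gid:...) from both machines, for
example the output of `getent passwd` run on each.

```python
def parse_passwd(text):
    """Map user name -> numeric UID from passwd-format lines.

    Hypothetical helper: expects lines like 'name:x:uid:gid:...'.
    Blank lines, comments and entries without a numeric UID
    (such as NIS '+' entries) are skipped.
    """
    users = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) < 3 or not fields[2].isdigit():
            continue
        users[fields[0]] = int(fields[2])
    return users


def uid_mismatches(submit_text, exec_text):
    """Return (user, uid_on_submit_host, uid_on_exec_node) tuples.

    uid_on_exec_node is None when the user is missing on the
    execution node. Only users known on the submit host are checked.
    """
    submit = parse_passwd(submit_text)
    node = parse_passwd(exec_text)
    problems = []
    for name, uid in sorted(submit.items()):
        if name not in node:
            problems.append((name, uid, None))
        elif node[name] != uid:
            problems.append((name, uid, node[name]))
    return problems
```

An empty result means every user on the submit host has the same UID on
the node. Any tuple that comes back names a user who is missing on the
node or has a different UID there, which are the two cases described
above.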


Kind regards,
- Ramon.

-- 
R. Bastiaans, B.ICT :: Systems Programmer, HPC&V

SARA - Computing & Networking Services
Science Park 121     PO Box 94613
1098 XG Amsterdam NL 1090 GP Amsterdam NL
P.+31 (0)20 592 3000 F.+31 (0)20 668 3167

