[torqueusers] Problem with Torque with AMD Opteron and RHEL 3

Garrick Staples garrick at usc.edu
Wed Dec 8 13:20:36 MST 2004


On Tue, Dec 07, 2004 at 09:18:30AM -0200, Leandro Tavares Carneiro alleged:
> Hi everyone,
> 
> 	I'm seeing very strange behavior from Torque on a small cluster of 
> 136 dual Opteron 244 nodes running RHEL WS 3 Update 3.
> 
> 	I started with version 1.1.0p3 and, when I try to create a job of 
> any kind, it doesn't run or start and I get error 15023 (Bad user 
> - no password entry).
> 
> 	I have checked everything on my nodes and server, and everything 
> is OK. All the nodes recognize the user ID I'm using and the home 
> directory is mounted, but I still get this error.
> 
> 	I upgraded to version 1.1.0p4 and got the same error. I hope 
> someone can help me...
> 
> Thanks in advance,
> 
> Regards,
> 
> Dec  7 09:04:32 node002 pbs_mom: scan_for_exiting, cannot chdir to user 
> home directory
> Dec  7 09:04:32 node002 pbs_mom: scan_for_exiting, cannot chdir to user 
> home directory
> Dec  7 09:04:32 node002 pbs_mom: Unknown error 15023 (15023) in 
> job_start_error from node xx.xx.xx.xx:15003, 23.server

Recent versions of torque added some extra checks because previous versions
were just silently failing to chdir.  This has caused several people to
suddenly notice this problem.
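
A quick way to check a node by hand is something like the Python sketch
below.  It is not part of torque; check_user_home is a hypothetical helper
name, and it only tests the two things the error messages mention: whether
the user has a password entry, and whether the calling process can chdir
into that user's home directory.  Run it as root on the compute node, since
pbs_mom runs as root.

```python
import os
import pwd
import sys


def check_user_home(username):
    """Return (ok, message) for the current process.

    ok is False if the user has no password entry or if this process
    cannot chdir into the user's home directory.
    """
    try:
        entry = pwd.getpwnam(username)
    except KeyError:
        return False, "no password entry for %r" % username

    saved_cwd = os.getcwd()
    try:
        os.chdir(entry.pw_dir)
    except OSError as err:
        return False, "cannot chdir to %s: %s" % (entry.pw_dir, err.strerror)
    finally:
        # Always go back to where we started.
        os.chdir(saved_cwd)
    return True, "euid %d can chdir to %s" % (os.geteuid(), entry.pw_dir)


if __name__ == "__main__" and len(sys.argv) > 1:
    ok, message = check_user_home(sys.argv[1])
    print(message)
    sys.exit(0 if ok else 1)
```

If this reports a failure for root but succeeds when run as the user
himself, the password entry is fine and the problem is with access to the
home directory.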

I just checked the latest torque snapshot and I see some new code.  The new
behavior is to log an error message if the chdir fails and then continue
in the current directory.

Torque peeps,
   Perhaps the correct behavior is to NOT chdir at that point.  System
prologue/epilogue can just be run from / or pbs's home.  And then run_pelog()
should chdir to the user's home dir after the setuid() call.
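
A rough sketch of that ordering, written in Python rather than torque's C
and not taken from the torque source.  run_in_home is a hypothetical helper:
it starts from /, drops privileges only if it is running as root, and only
then changes into the home directory.  It uses setgroups() to keep the
example short; real code would more likely call initgroups() with the user
name.

```python
import os


def run_in_home(uid, gid, home, argv):
    """Fork, drop to uid/gid, chdir to home, exec argv; return wait status."""
    pid = os.fork()
    if pid == 0:
        # Child: any failure below ends in exit code 127.
        try:
            os.chdir("/")              # neutral directory first
            if os.geteuid() == 0:      # only root may change identity
                os.setgroups([gid])    # drop supplementary groups
                os.setgid(gid)
                os.setuid(uid)         # setuid() comes last
            os.chdir(home)             # done with the user's own identity
            os.execvp(argv[0], argv)
        except OSError as err:
            os.write(2, ("run_in_home: %s\n" % err).encode())
        os._exit(127)
    _, status = os.waitpid(pid, 0)
    return status
```

The point of the ordering is that the chdir into the home directory happens
with the user's own credentials, so a failure there reflects what the user
can actually reach rather than what root can reach.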

-- 
Garrick Staples, Linux/HPCC Administrator
University of Southern California