[torqueusers] Post job file processing bug in 1.1.0 (patch 1-3)
Roy.Dragseth at cc.uit.no
Fri Nov 12 14:18:28 MST 2004
On Thursday 21 October 2004 19:04, Dave Jackson wrote:
> The reply code 15035 indicates that an invalid home directory was
> specified when the mom was attempting to fork to the user. Torque
> 1.1.0p3 and higher fixed a bug where the mom would attempt to determine
> the home directory based on a NULL job. This is the only change which
> would have affected it. It appears that this bug masked another.
> The latest torque-1.1.0p4 snapshot contains a re-organized mom-level
> fork routine which will log environment errors better and will also
> sanity check the home directory. If you can regularly reproduce this
> failure, please test with the 1.1.0p4 snapshot and send us the logs.
> You should only need to upgrade the moms on the nodes where the test job
> is being run. For maximum value, please export the env variable
> PBSLOGLEVEL=3 on the compute nodes before starting the mom.
> With this info, we should be able to rectify this problem quickly.
Recent postings to this list have indicated that the problem is related to the
permissions set on the home directory, and this does seem to be the key to
the problems I see with the Rocks setup. If I change the permissions on the
home directory from 700 to 755, the stdout and stderr files arrive correctly.
Setting no_root_squash on the exported home tree also does the trick. As noted
earlier, neither change is necessary when using torque 1.0.1. I would very
much like this to be fixed so that I can update the pbs/maui roll for Rocks
with the latest and greatest torque version.
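
For anyone who wants to check their nodes for the same condition, here is a
minimal Python sketch. It rests on an assumption that is not confirmed
anywhere in this thread: that with root squashing enabled, root on the compute
node is mapped to an unprivileged anonymous user on the NFS server, so only
the "other" permission bits of the home directory apply to it. The function
names are made up for illustration, and the check does not model what the mom
actually does when it delivers the output files.

```python
import os
import stat


def squashed_root_can_enter(mode: int) -> bool:
    """Return True if the "other" class has search (execute) permission.

    Assumption: a root-squashed root is neither the owner nor a member of
    the owning group, so only the "other" bits decide whether it may
    traverse the directory.
    """
    return bool(mode & stat.S_IXOTH)


def check_home(path: str) -> bool:
    """Apply the check above to the permission bits of a real directory."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return squashed_root_can_enter(mode)


if __name__ == "__main__":
    home = os.path.expanduser("~")
    if check_home(home):
        print(home, "is searchable by others")
    else:
        print(home, "is NOT searchable by others (e.g. mode 700)")
```

Under that assumption, a home directory with mode 700 fails the check and one
with mode 755 passes, which matches the two cases described above.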
The Computer Center, University of Tromsø, N-9037 TROMSØ, Norway.
phone:+47 77 64 41 07, fax:+47 77 64 41 00
Roy Dragseth, High Performance Computing System Administrator
Direct call: +47 77 64 62 56. email: royd at cc.uit.no