[torqueusers] Server crashes because of a wrong group

Garrick Staples garrick at usc.edu
Wed Nov 21 22:12:47 MST 2007


On Fri, Nov 09, 2007 at 07:27:14PM +0100, Danny Sternkopf alleged:
> Hi,
> 
> We experience a PBS server crash when a user specifies a wrong user
> group during job submission.
> 
> Example: qsub -lnodes=1,walltime=00:01:00 -W group_list=asl
> 
> The user is not in asl group and gets:
> 
> $ qsub -lnodes=1,walltime=00:01:00 -W group_list=asl
> echo blubb
> qsub: End of File
> $ echo $?
> 183
> 
> The pbs_server crashes right after it. Then I can't start the pbs_server
> anymore. I have to remove the created job files
> /var/spool/pbs/server_priv/jobs/*.SC and *.JB first.
> 
> That also happens if the group doesn't exist.
> 
> I've been aware of this for a couple of years, but I never really
> followed up on it because it happened very seldom.
> 
> The affected platform is EM64T (or x86_64) running RHEL4 (Scientific
> Linux 4.1).
> 
> I did some tests on other systems:
> 
> 1. On ia64 running RHEL3 (Whitebox Linux)
> - The job runs fine, and in the accounting you can see 'group=<null>'.
> 
> 2. On x86_64 running RHEL3 (Fedora 3)
> - The job is 'rejected by all possible destinations'.
> 
> On all three systems we have Torque version 2.1.6 running.
> 
> The different behaviors are very strange. The configuration might
> also play a role.
> 
> Any ideas what could go wrong? Is there a known issue with that?

I just tried this on two different x86_64 CentOS 4 boxes, submitting to
execution and routing queues, and couldn't reproduce the crash.  The boxes were
running 2.1.9 and current trunk.  Either the bug is sensitive to some other
condition, or it has already been fixed.
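
For sites stuck on an affected version, one possible stopgap is to check group
membership on the submit host before qsub ever sees the request.  The sketch
below is only an illustration under assumptions: user_in_group and safe_qsub
are hypothetical helper names (not part of Torque), the check relies on the
standard `id -Gn` output, and it assumes group names contain no spaces.  It
does not fix the server-side problem.

```shell
#!/bin/sh
# Hypothetical helper (not part of Torque): succeed only if user $1
# is a member of group $2, based on the output of `id -Gn`.
user_in_group() {
    # `id -Gn USER` prints USER's group names separated by spaces;
    # it fails for an unknown user, in which case we return 2.
    groups=$(id -Gn "$1" 2>/dev/null) || return 2
    for g in $groups; do
        [ "$g" = "$2" ] && return 0
    done
    return 1
}

# Hypothetical wrapper: the first argument is the requested group, the rest
# is passed to qsub unchanged.  qsub is only invoked when the caller
# actually belongs to that group.
safe_qsub() {
    group=$1
    shift
    if user_in_group "$(id -un)" "$group"; then
        qsub -W group_list="$group" "$@"
    else
        echo "safe_qsub: $(id -un) is not a member of group '$group'" >&2
        return 1
    fi
}
```

With these functions sourced, the failing example would be written as
`safe_qsub asl -lnodes=1,walltime=00:01:00` and would be refused locally
instead of reaching pbs_server.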
