[torqueusers] Server crashes because of a wrong group
Danny Sternkopf
dsternkopf at hpce.nec.com
Fri Nov 9 11:57:12 MST 2007
Hi,
let me add one more thing.
The crash seems to depend on routing queues.
If the job is submitted to a routing queue, the server crashes.
If the job is submitted directly to an execution queue, it is rejected cleanly:
$ qsub -lnodes=1,walltime=00:01:00 -W group_list=asl -q workq
echo blubb
qsub: Bad GID for job execution
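
Until the cause is found, a wrapper on the submit host could catch the wrong group before the request ever reaches the server. The following is only a sketch: user_in_group and safe_qsub are hypothetical helper names, not part of Torque, and it assumes the submit host resolves groups the same way the server does, which may not hold at every site.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Torque.

# user_in_group GROUP [USER]
# Succeeds if USER (default: the calling user) is a member of GROUP.
user_in_group() {
    group=$1
    # id -Gn prints the user's group names separated by spaces.
    # If the lookup fails, the loop body never runs and we return 1.
    for g in $(id -Gn ${2:+"$2"} 2>/dev/null); do
        [ "$g" = "$group" ] && return 0
    done
    return 1
}

# safe_qsub GROUP [qsub options...]
# Calls qsub with -W group_list=GROUP only if the membership check passes.
safe_qsub() {
    wanted=$1
    shift
    if user_in_group "$wanted"; then
        qsub -W group_list="$wanted" "$@"
    else
        echo "not submitting: you are not a member of group '$wanted'" >&2
        return 1
    fi
}
```

With the example above this would be called as: safe_qsub asl -lnodes=1,walltime=00:01:00 -q workq. Group names containing spaces are not handled, and the check does not replace a fix in pbs_server itself.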
Best regards,
Danny
Danny Sternkopf wrote:
> Hi,
>
> we experience a PBS server crash when a user specifies a wrong user
> group during job submission.
>
> Example: qsub -lnodes=1,walltime=00:01:00 -W group_list=asl
>
> The user is not in asl group and gets:
>
> $ qsub -lnodes=1,walltime=00:01:00 -W group_list=asl
> echo blubb
> qsub: End of File
> $ echo $?
> 183
>
> The pbs_server crashes right after that, and I can't start it again
> until I remove the job files that were created,
> /var/spool/pbs/server_priv/jobs/*.SC and *.JB.
>
> That also happens if the group doesn't exist.
>
> I've been aware of this for a couple of years, but I never really
> followed up on it because it happened very seldom.
>
> The affected platform is EM64T (or x86_64) running RHEL4 (Scientific
> Linux 4.1).
>
> I did some tests on other systems:
>
> 1. On ia64 running RHEL3 (Whitebox Linux)
> - The job runs fine, and the accounting records show 'group=<null>'.
>
> 2. On x86_64 running RHEL3 (Fedora 3)
> - The job is 'rejected by all possible destinations'.
>
> All three systems run Torque version 2.1.6.
>
> The different behaviors are very strange. It might be that the
> configuration also plays a role.
>
> Any ideas what could go wrong? Is there a known issue with that?
>
>
> Best regards,
>
> Danny
--
Danny Sternkopf dsternkopf at hpce.nec.com
NEC HPC Europe GmbH, http://www.teraflop-workbench.de
Stuttgart, Germany phone: +49-711-68770-35 fax: +49-711-6877145