[torqueusers] Re: Problem starting jobs with multiple nodes. SOLVED

John Hanks griznog at gmail.com
Wed Oct 8 11:03:23 MDT 2008


On Wed, Oct 8, 2008 at 10:29 AM, John Hanks <griznog at gmail.com> wrote:
> Hi,
>
> I'm setting up a small cluster and have hit a snag with torque. I can
> submit and run jobs that use a single node without any problems, but
> jobs that use more than one node bounce back and forth between Q and R
> states and never start. The errors I see are:
>
> Oct  8 10:22:29 node-0012 pbs_mom: Success (0) in init_groups, pre-sigprocmask
> Oct  8 10:22:29 node-0012 pbs_mom: Success (0) in init_groups, post-initgroups
> Oct  8 10:22:29 node-0012 pbs_mom: Bad UID for job execution (15023)

These messages were the (obscure to me) clue. The user submitting the
job had a primary group that the nodes couldn't resolve. No idea why it
didn't affect single-node jobs, but adding that group to /etc/group
on the nodes solved the problem.
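
For anyone hitting the same thing, here is a rough sketch of a check that
could be run on each compute node to see whether a user's primary group
resolves there. It uses only Python's standard pwd and grp modules. The
function name check_primary_group and the script layout are my own
illustration, not anything that ships with torque, and I'm assuming the
nodes look up users and groups through the normal system databases.

```python
import grp
import os
import pwd
import sys


def check_primary_group(username, getpwnam=pwd.getpwnam, getgrgid=grp.getgrgid):
    """Return (gid, group_name) for the user's primary group.

    group_name is None when the GID has no entry in the group database,
    which is the situation described above. The lookup functions are
    parameters only so the check can be exercised without real accounts.
    """
    gid = getpwnam(username).pw_gid  # raises KeyError if the user is unknown
    try:
        return gid, getgrgid(gid).gr_name
    except KeyError:
        return gid, None


if __name__ == "__main__":
    # Default to the invoking user when no username is given.
    user = sys.argv[1] if len(sys.argv) > 1 else pwd.getpwuid(os.getuid()).pw_name
    gid, name = check_primary_group(user)
    if name is None:
        print(f"{user}: primary GID {gid} does not resolve on this node")
        sys.exit(1)
    print(f"{user}: primary GID {gid} resolves to group '{name}'")
```

Running it with the submitting user's name on every node (for example
over ssh in a loop) should show which nodes are missing the group entry.
A non-zero exit status means the primary group did not resolve.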

jbh

