[torqueusers] Multiple mom processes, owned by regular user (Torque 4.2.5, Maui 3.3.1)

Dave Ulrick d-ulrick at comcast.net
Mon Nov 11 09:13:29 MST 2013


On Fri, 8 Nov 2013, Gus Correa wrote:

> Some nodes seem to "develop" or create *multiple pbs_mom processes*
> after they run for a while.
> Those are nodes that are used more heavily
> (higher node numbers).
> See the printout below, especially nodes 30-32.
>
> The duplicate pbs_mom processes are owned by a regular user, which is
> rather awkward.
>
> As a result, multi-node jobs that get the affected nodes fail to run.
> I enclose below a tracejob sample of one of those jobs.

I had a rash of nodes with multiple MOMs a while back. The problem began 
when I'd unwittingly deployed different TORQUE releases on the server node 
(4.2.4.1) and the compute nodes (4.2.3.1). The problem seemed to go away 
when I deployed the same TORQUE release on all nodes. Now that I'm on 
4.2.5 on all nodes I'm no longer seeing this issue.
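Since the fix here was simply making sure every node runs the same TORQUE 
release, a quick consistency check can catch a mismatch before it causes 
trouble. The sketch below is a hypothetical helper, not part of TORQUE: it 
assumes you have already collected a host -> version mapping by whatever 
means your site uses, and it only compares strings.

```python
from collections import defaultdict


def group_by_version(node_versions: dict[str, str]) -> dict[str, list[str]]:
    """Map each TORQUE version string to the sorted list of hosts running it."""
    groups: dict[str, list[str]] = defaultdict(list)
    for host, version in node_versions.items():
        groups[version].append(host)
    return {version: sorted(hosts) for version, hosts in groups.items()}


def mismatched_hosts(server_version: str, node_versions: dict[str, str]) -> list[str]:
    """Return the hosts whose TORQUE version differs from the server's."""
    return sorted(host for host, version in node_versions.items()
                  if version != server_version)


if __name__ == "__main__":
    # Hypothetical inventory mirroring the mismatch described above:
    # server on 4.2.4.1, compute nodes still on 4.2.3.1.
    versions = {"node01": "4.2.3.1", "node02": "4.2.3.1"}
    print(mismatched_hosts("4.2.4.1", versions))
    print(group_by_version(versions))
```

An empty list from mismatched_hosts() means every node reports the same 
release as the server.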

Dave
-- 
Dave Ulrick
d-ulrick at comcast.net

