[torqueusers] Job dies randomly, but only through torque

Jim Kusznir jkusznir at gmail.com
Tue May 27 17:02:17 MDT 2008


Yep.  Wall time is nowhere near violation (the job dies about 2 minutes
into a 30 minute allocation).  I ran ulimit -a both through qsub and
directly on the node (as the same user in both cases), and the results
were identical (most items were unlimited).
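
For anyone who wants to repeat the qsub-versus-direct comparison, here is
a minimal sketch of one way to do it. It is not what was run in this
thread: the list of limits and the helper names (current_limits,
diff_limits) are illustrative assumptions. Unlike a plain ulimit -a in
bash, which reports soft limits by default, it prints both the soft and
hard value of each limit.

```python
import resource

# Limits to report. RLIMIT_* constants vary by platform, so any name
# missing from this Python build is skipped. The selection is illustrative.
LIMIT_NAMES = [
    "RLIMIT_AS", "RLIMIT_CORE", "RLIMIT_CPU", "RLIMIT_DATA",
    "RLIMIT_FSIZE", "RLIMIT_MEMLOCK", "RLIMIT_NOFILE",
    "RLIMIT_NPROC", "RLIMIT_RSS", "RLIMIT_STACK",
]


def fmt(value):
    """Render a limit value, mapping RLIM_INFINITY to 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def current_limits():
    """Return {name: (soft, hard)} for the process running this script."""
    limits = {}
    for name in LIMIT_NAMES:
        const = getattr(resource, name, None)
        if const is None:
            continue  # not defined on this platform
        soft, hard = resource.getrlimit(const)
        limits[name] = (fmt(soft), fmt(hard))
    return limits


def diff_limits(a, b):
    """Return {name: (value_in_a, value_in_b)} for entries that differ."""
    names = sorted(set(a) | set(b))
    return {n: (a.get(n), b.get(n)) for n in names if a.get(n) != b.get(n)}


if __name__ == "__main__":
    # One line per limit, sorted, so two runs can be compared with diff.
    for name, (soft, hard) in sorted(current_limits().items()):
        print(f"{name} soft={soft} hard={hard}")
```

Run it once directly on the compute node and once from inside a job
script submitted with qsub, save each output to a file, and diff the two
files. Any line that differs is a limit that is set differently in the
two environments.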

Any other ideas?

--Jim

On Tue, May 27, 2008 at 9:25 AM, Jan Ploski <Jan.Ploski at offis.de> wrote:
>
> This suggestion is rather trivial, but since you have not mentioned
> anything in this area:
>
> Are you sure that the job is not exceeding resource limits?  These
> could be walltime (enforced by TORQUE) or rlimits such as memory
> (enforced by the kernel), and the rlimits could be set differently in
> TORQUE and in your manual invocations of mpirun.
>
> Regards,
> Jan Ploski
>

