[torqueusers] Job dies randomly, but only through torque

Jim Kusznir jkusznir at gmail.com
Tue May 27 17:02:17 MDT 2008

Yep.  Wall time is nowhere near violation (the job dies about 2 minutes
into a 30-minute allocation).  I ran ulimit -a both through qsub and
directly on the node (as the same user in both cases), and the results
were identical (most items were unlimited).
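
For anyone who wants to repeat the comparison without eyeballing two
listings, here is a minimal sketch. It assumes the output of ulimit -a
was saved to two files, one from an interactive shell on the node and
one from inside a job submitted with qsub, and that both are in bash's
"description (unit, -flag) value" layout. The file names in the usage
note and the helper names parse_ulimit and diff_limits are hypothetical,
not part of TORQUE.

```python
import re
import sys


def parse_ulimit(text):
    """Map each limit description to its value, given bash `ulimit -a` output."""
    limits = {}
    for line in text.splitlines():
        line = line.strip()
        # Assumed bash layout, e.g. "open files                      (-n) 1024"
        match = re.match(r"^(.*?)\s*\((.*?)\)\s+(\S+)$", line)
        if match:
            limits[match.group(1).strip()] = match.group(3)
    return limits


def diff_limits(direct, batch):
    """Return {name: (direct_value, batch_value)} for every limit that differs."""
    names = sorted(set(direct) | set(batch))
    return {
        name: (direct.get(name), batch.get(name))
        for name in names
        if direct.get(name) != batch.get(name)
    }


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python compare_limits.py limits_direct.txt limits_torque.txt
    with open(sys.argv[1]) as f_direct, open(sys.argv[2]) as f_batch:
        differences = diff_limits(parse_ulimit(f_direct.read()),
                                  parse_ulimit(f_batch.read()))
    for name, (direct_value, batch_value) in differences.items():
        print(f"{name}: direct={direct_value} torque={batch_value}")
```

An empty result means the two environments report the same limits, which
matches what I saw. Note that this only compares what the shell reports
at the moment ulimit is run, not anything that might change later in the
job.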

Any other ideas?


On Tue, May 27, 2008 at 9:25 AM, Jan Ploski <Jan.Ploski at offis.de> wrote:
> This suggestion is rather trivial, but since you have not mentioned
> anything in this area:
> Are you sure that the job is not exceeding resource limits (walltime,
> enforced by TORQUE, or rlimits such as memory, enforced by the kernel;
> these could be set differently under TORQUE than in your manual
> invocations of mpirun)?
> Regards,
> Jan Ploski
