[torqueusers] Job dies randomly, but only through torque
Jim Kusznir
jkusznir at gmail.com
Tue May 27 17:02:17 MDT 2008
Yep. Wall time is nowhere near violation (the job dies about 2 minutes
into a 30-minute allocation). I ran ulimit -a both through qsub and
directly on the node (as the same user in both cases), and the results
were identical (most items were unlimited).
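For anyone who wants to repeat the comparison, here is a minimal sketch, not the exact commands used above. It prints both soft and hard limits (a plain ulimit -a in bash shows only the soft ones). The helper names fmt, snapshot and diff and the list of limits are illustrative assumptions.

```python
import resource

# Limits to report. Not every platform defines all of them, so missing
# names are skipped below. This list is an assumption, not exhaustive.
LIMIT_NAMES = [
    "RLIMIT_AS", "RLIMIT_CORE", "RLIMIT_CPU", "RLIMIT_DATA",
    "RLIMIT_FSIZE", "RLIMIT_MEMLOCK", "RLIMIT_NOFILE",
    "RLIMIT_NPROC", "RLIMIT_RSS", "RLIMIT_STACK",
]


def fmt(value):
    """Render a limit value the way ulimit does for 'no limit'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def snapshot():
    """Return {limit name: (soft, hard)} for the current process."""
    limits = {}
    for name in LIMIT_NAMES:
        const = getattr(resource, name, None)
        if const is None:
            continue  # limit not defined on this platform
        soft, hard = resource.getrlimit(const)
        limits[name] = (fmt(soft), fmt(hard))
    return limits


def diff(a, b):
    """Return only the entries whose values differ between two snapshots."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


if __name__ == "__main__":
    # One line per limit, sorted, so two runs can be compared with diff(1).
    for name, (soft, hard) in sorted(snapshot().items()):
        print(f"{name} soft={soft} hard={hard}")
```

Run it once from a job script submitted with qsub and once from an interactive shell on the same node, save each output to a file, and compare the two files.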
Any other ideas?
--Jim
On Tue, May 27, 2008 at 9:25 AM, Jan Ploski <Jan.Ploski at offis.de> wrote:
>
> This suggestion is rather trivial, but since you have not mentioned
> anything in this area:
>
> Are you sure that the job is not exceeding resource limits (walltime,
> enforced by TORQUE, or rlimits such as memory, enforced by the kernel;
> the rlimits could be set differently in TORQUE and in your manual
> invocations of mpirun)?
>
> Regards,
> Jan Ploski
>