[torqueusers] Re: Processes die without any CPU time

Alberto Simões hashashin at gmail.com
Fri Jun 1 11:31:46 MDT 2007


Well, after some debugging by the admins, we found that one node was
not mounting the home directories, so every process that ran on that
node failed silently.
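A quick way to narrow this symptom down to a single node is to group the failed, zero-CPU jobs by execution host. Below is a minimal Python sketch of that idea. It assumes the usual PBS/TORQUE accounting log layout: one `date;E;jobid;key=value ...` record per finished job, with `exec_host` written as `node/cpu+node/cpu`. Reading server_priv/accounting normally requires admin rights, as the `Permission denied` in the tracejob output quoted below shows. The function name and the node names in the sample are hypothetical.

```python
from collections import Counter


def failed_jobs_by_node(lines):
    """Count failed jobs that used no CPU time, per execution node.

    Assumes PBS/TORQUE-style accounting records of the form
    'date;E;jobid;key=value key=value ...' (hypothetical helper).
    """
    counts = Counter()
    for line in lines:
        parts = line.rstrip("\n").split(";", 3)
        # Only job-end ('E') records carry Exit_status and resources_used.*
        if len(parts) != 4 or parts[1] != "E":
            continue
        attrs = dict(
            field.split("=", 1) for field in parts[3].split() if "=" in field
        )
        if attrs.get("Exit_status") == "0":
            continue  # job succeeded
        if attrs.get("resources_used.cput", "00:00:00") != "00:00:00":
            continue  # job actually ran for a while
        # exec_host looks like 'node01/0+node01/1'; count each node once per job
        hosts = attrs.get("exec_host", "").split("+")
        nodes = {h.split("/")[0] for h in hosts if h}
        for node in nodes:
            counts[node] += 1
    return counts


# Hypothetical sample records; node07 stands in for the bad node.
SAMPLE = [
    "06/01/2007 14:32:47;E;120737.search.di.uminho.pt;user=ambs queue=tcurtos "
    "exec_host=node07/0 Exit_status=-2 resources_used.cput=00:00:00",
    "06/01/2007 14:40:02;E;120738.search.di.uminho.pt;user=ambs queue=tcurtos "
    "exec_host=node03/0 Exit_status=0 resources_used.cput=00:05:12",
]

print(failed_jobs_by_node(SAMPLE))  # Counter({'node07': 1})
```

If one node dominates the counts, it is the first place to check mounts such as the home directories.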

Cheers
Alberto

On 6/1/07, Alberto Simões <hashashin at gmail.com> wrote:
> Hi
>
> I have a process that submits (qsub) 150 smaller processes. Of these,
> 98% finished without using any CPU time. For instance:
>
> [ambs at search ~]$ tracejob 120737
> /opt/torque/server_priv/accounting/20070601: Permission denied
> /opt/torque/mom_logs/20070601: No such file or directory
> /opt/torque/sched_logs/20070601: No such file or directory
>
> Job: 120737.search.di.uminho.pt
>
> 06/01/2007 14:32:47  S    enqueuing into default, state 1 hop 1
> 06/01/2007 14:32:47  S    dequeuing from default, state QUEUED
> 06/01/2007 14:32:47  S    enqueuing into tcurtos, state 1 hop 1
> 06/01/2007 14:32:47  S    Job Queued at request of ambs at search.di.uminho.pt,
>                           owner = ambs at search.di.uminho.pt, job name =
>                           ambs#initmat108.sh, queue = tcurtos
> 06/01/2007 14:32:47  S    Job Modified at request of maui at search.di.uminho.pt
> 06/01/2007 14:32:47  S    Job Run at request of maui at search.di.uminho.pt
> 06/01/2007 14:32:47  S    Job Modified at request of maui at search.di.uminho.pt
> 06/01/2007 14:32:47  S    Exit_status=-2 resources_used.cput=00:00:00
>                           resources_used.mem=0kb resources_used.vmem=0kb
>                           resources_used.walltime=00:00:00
> 06/01/2007 14:32:47  S    Post job file processing error
> 06/01/2007 14:32:47  S    dequeuing from tcurtos, state COMPLETE
>
>
>
> Any hints on what might be the problem?
> Thank you in advance,
> Alberto
>
> --
> Alberto Simões
>


-- 
Alberto Simões

