[torqueusers] Re: Processes die without any CPU time
Alberto Simões
hashashin at gmail.com
Fri Jun 1 11:31:46 MDT 2007
Well, after some debugging by the admins, we found that one node was
not mounting the home directories, so every job that landed on that
node failed silently.
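A node with a missing home mount can be caught before jobs reach it. Torque's pbs_mom can run a per-node health check script (the `$node_check_script` MOM configuration parameter). Below is a minimal sketch of such a check. It assumes that home directories live on a separate `/home` mount and that the script reports a bad node by printing a line that begins with `ERROR`. Confirm that convention in the Torque documentation for your version. The `REQUIRED_MOUNTS` list is hypothetical.

```python
import os

# Hypothetical list of shared filesystems every compute node must have
# mounted; adjust to the cluster's actual layout.
REQUIRED_MOUNTS = ["/home"]


def missing_mounts(paths):
    """Return the paths that are not mount points on this node.

    os.path.ismount is used rather than os.path.isdir because an
    unmounted /home usually still exists as an empty directory.
    """
    return [p for p in paths if not os.path.ismount(p)]


def main():
    missing = missing_mounts(REQUIRED_MOUNTS)
    if missing:
        # Assumed convention: output starting with "ERROR" asks pbs_mom
        # to mark this node as down so the scheduler stops using it.
        print("ERROR shared filesystem(s) not mounted: " + ", ".join(missing))
    # A healthy node prints nothing.


if __name__ == "__main__":
    main()
```

Checking for a mount point rather than for a directory is what makes this catch the failure described above. An empty `/home` on the node would pass a plain existence test.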
Cheers
Alberto
On 6/1/07, Alberto Simões <hashashin at gmail.com> wrote:
> Hi
>
> I have a process that submits (qsub) 150 smaller jobs. Of these,
> 98% finished without using any CPU time. For instance:
>
> [ambs at search ~]$ tracejob 120737
> /opt/torque/server_priv/accounting/20070601: Permission denied
> /opt/torque/mom_logs/20070601: No such file or directory
> /opt/torque/sched_logs/20070601: No such file or directory
>
> Job: 120737.search.di.uminho.pt
>
> 06/01/2007 14:32:47 S enqueuing into default, state 1 hop 1
> 06/01/2007 14:32:47 S dequeuing from default, state QUEUED
> 06/01/2007 14:32:47 S enqueuing into tcurtos, state 1 hop 1
> 06/01/2007 14:32:47 S Job Queued at request of ambs at search.di.uminho.pt,
> owner = ambs at search.di.uminho.pt, job name =
> ambs#initmat108.sh, queue = tcurtos
> 06/01/2007 14:32:47 S Job Modified at request of maui at search.di.uminho.pt
> 06/01/2007 14:32:47 S Job Run at request of maui at search.di.uminho.pt
> 06/01/2007 14:32:47 S Job Modified at request of maui at search.di.uminho.pt
> 06/01/2007 14:32:47 S Exit_status=-2 resources_used.cput=00:00:00
> resources_used.mem=0kb resources_used.vmem=0kb
> resources_used.walltime=00:00:00
> 06/01/2007 14:32:47 S Post job file processing error
> 06/01/2007 14:32:47 S dequeuing from tcurtos, state COMPLETE
>
>
>
> Any hints on what might be the problem?
> Thank you in advance,
> Alberto
>
> --
> Alberto Simões
>
--
Alberto Simões