[torqueusers] Re: Job remains in state R
Adil Mughal
adil.m.mughal at gmail.com
Mon Feb 25 06:35:06 MST 2008
I had a closer look at my mom_log file on one of the slaves and there
is the following repeated error message:
pbs_mom;Req;jobobit;No contact with server at hostaddr 907c3092, port
15001, jobid 165.dphpc1011.dph.$
$1.dph.aber.ac.uk errno 113
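
For reference, the hostaddr and errno fields in that message can be decoded. This is only a minimal sketch: it assumes hostaddr is the server's IPv4 address printed as hex in network byte order, and that the errno value follows the usual Linux numbering. The helper names are made up for illustration and are not part of TORQUE.

```python
import errno
import ipaddress
import os


def decode_hostaddr(hex_addr: str) -> str:
    """Read an 8-digit hex string as a big-endian IPv4 address (assumed format)."""
    return str(ipaddress.IPv4Address(int(hex_addr, 16)))


def describe_errno(code: int) -> str:
    """Return the symbolic name and message for an errno value on this platform."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


# hostaddr taken from the mom_log line above
print(decode_hostaddr("907c3092"))  # 144.124.48.146

# On most Linux systems errno 113 is EHOSTUNREACH ("No route to host");
# the numbering is platform-dependent, so other systems may differ.
print(describe_errno(113))
```

If the decoded address is indeed the master, "No route to host" would suggest that the slave cannot reach port 15001 on it, which more often points to a network or firewall problem than to the job itself.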
Does that help?
Adil
On Mon, Feb 25, 2008 at 1:17 PM, Adil Mughal <adil.m.mughal at gmail.com> wrote:
> Dear Experts
>
> I recently had to reboot my master computer.
>
> After rebooting I went through the usual steps to set up - i.e.
>
> > qterm
> > pbs_server
> > pbs_sched
>
> The problem is that now when I submit a basic job like:
>
> echo "sleep 5" | qsub
>
> or
>
> echo "touch testfile" | qsub
>
> the job remains in the running state; that is, typing qstat gives
> something like this:
>
> Job id              Name             User            Time Use S Queue
> ------------------- ---------------- --------------- -------- - -----
> 165.dphpc1011       STDIN            guest1                 0 R batch
> 166.dphpc1011       STDIN            guest1          00:00:00 R batch
> 167.dphpc1011       STDIN            guest1                 0 R batch
> 168.dphpc1011       STDIN            guest1          00:00:00 R batch
>
> Whereas previously the jobs ran and were then dequeued.
>
> Any ideas what I might have missed?
>
> adil
>