[torqueusers] Re: Current Status showing INFINITY and job not deleting

Vadivelan Ranjith velan.aero at gmail.com
Sat Apr 21 08:55:37 MDT 2007


Hi,
I cleared the job using momctl -c 12361, and it reported:
>>job clear request successful on localhost
But the job was not deleted.
I then got the following output when I ran momctl -h:

root@galaxy:~# momctl -h node07 -d 1

Host: node07.cluster2.iitb.ac.in/node07.cluster2.iitb.ac.in   Version: 2.1.0p0
Server[0]: 192.168.1.1 (connection is active)
  Init Msgs Received:     0 hellos/1 cluster-addrs
  Init Msgs Sent:         1 hellos
  Last Msg From Server:   1 seconds (StatusJob)
  Last Msg To Server:     30 seconds
HomeDirectory:          /usr/spool/PBS/mom_priv
MOM active:             360269 seconds
Server Update Interval: 45 seconds
LOGLEVEL:               0 (use SIGUSR1/SIGUSR2 to adjust)
Communication Model:    RPP
TCP Timeout:            20 seconds
NOTE:  no prolog configured
Trusted Client List:    192.168.1.106,192.168.1.105,192.168.1.104,
192.168.1.103,192.168.1.102,192.168.1.101,192.168.1.1,192.168.1.116,
192.168.1.115,192.168.1.114,192.168.1.113,192.168.1.112,192.168.1.111,
192.168.1.110,192.168.1.109,192.168.1.108,192.168.1.107,127.0.0.1
Configured to use /usr/bin/scp
job[12361.galaxy.aero.iitb.ac.in]  state=EXITING  sidlist=2820
job[12851.galaxy.aero.iitb.ac.in]  state=RUNNING  sidlist=5667
Assigned CPU Count:     2

diagnostics complete
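
For anyone who wants to pick the stuck job out of this kind of output automatically, here is a minimal sketch in Python. It assumes the job lines always look like the two shown above (job[ID], then state=, then sidlist=); the helper name jobs_in_state and the SAMPLE text are made up for illustration and are not part of Torque.

```python
import re

# Assumed format, taken from the two job lines in the output above, e.g.
# job[12361.galaxy.aero.iitb.ac.in]  state=EXITING  sidlist=2820
JOB_LINE = re.compile(
    r"^job\[(?P<id>[^\]]+)\]\s+state=(?P<state>\S+)\s+sidlist=(?P<sids>\S*)"
)


def jobs_in_state(diag_text, state):
    """Return (job_id, sidlist) pairs for jobs reported in the given state.

    Hypothetical helper: it only parses text, it does not call momctl.
    """
    found = []
    for line in diag_text.splitlines():
        match = JOB_LINE.match(line.strip())
        if match and match.group("state") == state:
            found.append((match.group("id"), match.group("sids")))
    return found


# Sample input copied from the diagnostics pasted in this message.
SAMPLE = """\
job[12361.galaxy.aero.iitb.ac.in]  state=EXITING  sidlist=2820
job[12851.galaxy.aero.iitb.ac.in]  state=RUNNING  sidlist=5667
Assigned CPU Count:     2
"""

if __name__ == "__main__":
    for job_id, sids in jobs_in_state(SAMPLE, "EXITING"):
        print(job_id, sids)
```

Saving the momctl output to a file and passing its contents to jobs_in_state(text, "EXITING") would list only the jobs the MOM still reports as EXITING, along with their session IDs.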

Velan


On 4/21/07, Chris Samuel <csamuel at vpac.org> wrote:
>
> On Sat, 21 Apr 2007, Vadivelan Ranjith wrote:
>
> > Hi
> > The problem is still not solved. Jobs are not being deleted.
>
> Did the momctl command to clear that job say anything when you ran it?
>
> With both yourself and Adam having the same problem with jobs not getting
> deleted after a node reboot it's looking like it could possibly be a Torque
> bug.  :-(
>
> Out of interest, on your compute nodes is SELinux turned on?
>
> > But job 12361 is not actually running. Our compute nodes are dual
> > processor, but currently only one processor is running because of this
> > problem.
>
> What does "momctl -h node07 -d 1" say?
>
> cheers,
> Chris
> --
> Christopher Samuel - (03)9925 4751 - VPAC Deputy Systems Manager
> Victorian Partnership for Advanced Computing http://www.vpac.org/
> Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia