[torqueusers] qdel will not delete
Garrick Staples
garrick at usc.edu
Thu Dec 11 12:07:59 MST 2008
Sounds like an old bug. Are you running the latest version in your branch?
On Thu, Dec 11, 2008 at 12:47:21PM -0600, Rahul Nabar alleged:
> Every so often I have jobs that won't respond to qdel. Their
> "REMAINING-time" in MAUI then goes negative, which was initially
> confusing since I thought it was a MAUI bug.
>
> But the root cause seems to be that PBS will not obey the qdel on this
> job, irrespective of whether I issue it as root or MAUI issues it.
>
> I had one such job today and debugged it further. All the sub-nodes
> seemed to be up, and the mom daemon on each of these nodes seemed to
> be up and running.
>
> The mom_log on the master node, though, was interesting. It had this snippet:
>
> 12/11/2008 11:47:38;0002; pbs_mom;Svr;im_request;connect from 11.0.1.79:1023
> 12/11/2008 11:47:38;0008; pbs_mom;Job;233139.supernova.che.wisc.edu;received request 'KILL_JOB' from 11.0.1.79:1023
> 12/11/2008 11:47:38;0008; pbs_mom;Job;233139.supernova.che.wisc.edu;ERROR: received request 'KILL_JOB' from 11.0.1.79:1023 for job '233139.supernova.che.wisc.edu' (job does not exist locally)
>
> The only way I could get this job to delete was to restart the pbs_mom
> on that node.
>
> Has anyone else encountered these symptoms? For me the first clues
> were a negative "REMAINING-time" on MAUI and users who complained that
> they could not qdel a job. In the past I've achieved the same effect
> by removing the relevant foo.supe.JB and foo.supe.SC files from
> /var/spool/torque/server_priv/jobs on the master node.
> But I don't think that is the best way out. I'd appreciate any other
> debug suggestions as well.
>
> --
> Rahul
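For anyone chasing the same symptom, here is a minimal sketch of scanning a mom_log for the rejected KILL_JOB request quoted above. It assumes the semicolon-separated layout seen in that snippet (timestamp;event code;daemon;class;object;message) and that each record sits on a single line in the real log file. The function name find_orphan_kills and the regular expression are illustrative only and are not part of TORQUE.

```python
import re

# Assumed record layout, inferred from the log snippet quoted in this thread:
#   timestamp;event code;daemon;class;object;message
# This pattern is an illustration, not an official TORQUE log format spec.
ORPHAN_KILL = re.compile(
    r"^(?P<when>[^;]+);[^;]*;\s*pbs_mom;Job;(?P<job>[^;]+);"
    r"ERROR: received request 'KILL_JOB' from (?P<peer>\S+) .*"
    r"\(job does not exist locally\)"
)


def find_orphan_kills(lines):
    """Return (timestamp, job id, requesting host:port) for each
    KILL_JOB request that pbs_mom rejected because it does not
    know the job locally."""
    hits = []
    for line in lines:
        match = ORPHAN_KILL.match(line.strip())
        if match:
            hits.append(
                (match.group("when"), match.group("job"), match.group("peer"))
            )
    return hits
```

The function takes any iterable of lines, so an open log file can be passed directly. Each tuple it returns names a job the server still tracks but the mom on that node has lost, which is the mismatch described in the quoted message.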
--
Garrick Staples, GNU/Linux HPCC SysAdmin
University of Southern California
See the Dishonor Roll at http://www.californiansagainsthate.com/