[torqueusers] qdel will not delete

Garrick Staples garrick at usc.edu
Thu Dec 11 12:07:59 MST 2008


Sounds like an old bug.  Are you running the latest version in your branch?

On Thu, Dec 11, 2008 at 12:47:21PM -0600, Rahul Nabar alleged:
> Every so often I have jobs that won't respond to qdel. Their
> "REMAINING-time" in MAUI then becomes negative, which was initially
> confusing since I thought it was a MAUI bug.
> 
> But the root cause seems to be that PBS will not obey the qdel on this
> job, regardless of whether I issue it as root or MAUI issues it.
> 
> I had one such job today and debugged it further: all the sub-nodes
> seemed to be up, and the mom daemon on each of these nodes appeared
> to be running.
> 
> The mom_log on the master node, though, was interesting. It had this snippet:
> 
> 12/11/2008 11:47:38;0002;   pbs_mom;Svr;im_request;connect from 11.0.1.79:1023
> 12/11/2008 11:47:38;0008;   pbs_mom;Job;233139.supernova.che.wisc.edu;received request 'KILL_JOB' from 11.0.1.79:1023
> 12/11/2008 11:47:38;0008;   pbs_mom;Job;233139.supernova.che.wisc.edu;ERROR:    received request 'KILL_JOB' from 11.0.1.79:1023 for job '233139.supernova.che.wisc.edu' (job does not exist locally)
> 
> The only way I could get this job to delete was to restart the pbs_mom
> on that node.
> 
> Has anyone else encountered these symptoms? For me the first clue
> was a negative "REMAINING-time" in MAUI and users who complained that
> they could not qdel a job. In the past I've achieved the same effect
> by removing the relevant foo.supe.JB and foo.supe.SC files from
> /var/spool/torque/server_priv/jobs on the master node,
> but I don't think that is the best way out. I'd appreciate any other
> debugging suggestions as well.
> 
> -- 
> Rahul
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers
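To check whether other nodes show the same symptom, one option is to scan each node's mom_log for KILL_JOB requests that were rejected with "job does not exist locally". Below is a minimal Python sketch, assuming the semicolon-delimited line layout in the quoted excerpt (timestamp;event type;daemon;object type;object name;message) and the exact error wording shown there. Other TORQUE versions may word this differently, and the script name and log path in the usage comment are hypothetical.

```python
import re
import sys
from collections import defaultdict

# Matches the error text seen in the quoted mom_log excerpt (assumed wording;
# other TORQUE versions may log this differently).
ORPHAN_KILL = re.compile(
    r"ERROR:\s+received request 'KILL_JOB' from (\S+) for job '([^']+)'"
    r" \(job does not exist locally\)"
)


def find_orphan_kills(lines):
    """Return {job_id: [(timestamp, requester), ...]} for rejected KILL_JOBs."""
    hits = defaultdict(list)
    for line in lines:
        # Fields: timestamp;event type;daemon;object type;object name;message
        fields = line.rstrip("\n").split(";", 5)
        if len(fields) < 6 or fields[3] != "Job":
            continue
        match = ORPHAN_KILL.search(fields[5])
        if match:
            requester, job_id = match.groups()
            hits[job_id].append((fields[0], requester))
    return dict(hits)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical path): python find_orphans.py /var/spool/torque/mom_logs/20081211
    with open(sys.argv[1]) as log:
        for job_id, events in find_orphan_kills(log).items():
            for timestamp, requester in events:
                print(f"{timestamp}  {job_id}  KILL_JOB from {requester} rejected")
```

A job ID reported here that the server still lists as running would match the stuck-job pattern described in the thread.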

-- 
Garrick Staples, GNU/Linux HPCC SysAdmin
University of Southern California

See the Dishonor Roll at http://www.californiansagainsthate.com/
