[Mauiusers] jobs won't qdel

Robert Konecny rok@ucsd.edu
Mon, 15 Apr 2002 08:19:04 -0700


yeah, PBS sucks. The only solution for this (AFAIK anyway) is to manually
delete the nonresponding jobs' files from /usr/spool/pbs/server_priv/jobs
and restart pbs_server.
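
roughly what i mean, as a sketch only. purge_job is a made-up helper name,
and i'm assuming the job files sit in the server's jobs directory with names
that start with the job id followed by a dot (3500.manager.JB and so on), so
look at what's actually in there before trusting the glob. by default it only
prints what it would delete:

```shell
#!/bin/sh
# Sketch only: list (and optionally remove) the server-side files of stuck jobs.
# ASSUMPTIONS: JOBS_DIR is where your pbs_server keeps job files, and those
# files are named "<jobid>.<something>". Check both on your own install.
JOBS_DIR="${JOBS_DIR:-/usr/spool/pbs/server_priv/jobs}"

# purge_job JOBID [--force]
# Without --force this is a dry run that only prints the matching files.
purge_job() {
    jobid="$1"
    mode="${2:-}"
    for f in "$JOBS_DIR/$jobid".*; do
        [ -e "$f" ] || continue          # glob matched nothing
        if [ "$mode" = "--force" ]; then
            rm -f -- "$f" && echo "removed $f"
        else
            echo "would remove $f"
        fi
    done
}
```

stop pbs_server first, run e.g. `purge_job 3500` to see what matches, then
`purge_job 3500 --force`, and start pbs_server again afterwards.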

robert

On Mon, Apr 15, 2002 at 10:11:17AM -0400, Michael R. Hanulec wrote:
> 
> i found some more evidence of these jobs:
> 
> [root@manager /root]# qstat
> Job id           Name             User             Time Use S Queue
> ---------------- ---------------- ---------------- -------- - -----
> 3487.manager     T0122-3          day              59:46:15 R dque            
> 3500.manager     STDIN            volker           00:00:03 E dque            
> 3503.manager     STDIN            volker                  0 E dque            
> 3504.manager     STDIN            volker                  0 E dque            
> [root@manager /root]# psh compute ps -auwwwx |grep volker
> node30: volker   31770  0.0  0.1  1924 1168 ?        D    Apr13   0:00
> /usr/local/pbs/sbin/pbs_mom -r
> node29: volker    7537  0.0  0.1  1928 1168 ?        D    Apr13   0:00
> /usr/local/pbs/sbin/pbs_mom -r
> node29: volker    7572  0.0  0.1  1928 1168 ?        D    Apr13   0:00
> /usr/local/pbs/sbin/pbs_mom -r
> [root@manager /root]# psh node29 kill -9 7537
> [root@manager /root]# psh compute ps -auwwwx |grep volker
> node30: volker   31770  0.0  0.1  1924 1168 ?        D    Apr13   0:00
> /usr/local/pbs/sbin/pbs_mom -r
> node29: volker    7537  0.0  0.1  1928 1168 ?        D    Apr13   0:00
> /usr/local/pbs/sbin/pbs_mom -r
> node29: volker    7572  0.0  0.1  1928 1168 ?        D    Apr13   0:00
> /usr/local/pbs/sbin/pbs_mom -r
> [root@manager /root]# psh node30 kill -9 31770
> [root@manager /root]# psh compute ps -auwwwx |grep volker
> node29: volker    7537  0.0  0.1  1928 1168 ?        D    Apr13   0:00
> /usr/local/pbs/sbin/pbs_mom -r
> node29: volker    7572  0.0  0.1  1928 1168 ?        D    Apr13   0:00
> /usr/local/pbs/sbin/pbs_mom -r
> node30: volker   31770  0.0  0.1  1924 1168 ?        D    Apr13   0:00
> /usr/local/pbs/sbin/pbs_mom -r
> [root@manager /root]# qstat
> Job id           Name             User             Time Use S Queue
> ---------------- ---------------- ---------------- -------- - -----
> 3487.manager     T0122-3          day              59:46:15 R dque            
> 3500.manager     STDIN            volker           00:00:03 E dque            
> 3503.manager     STDIN            volker                  0 E dque            
> 3504.manager     STDIN            volker                  0 E dque            
> [root@manager /root]# 
> 
> this actually looks more like a pbs thing, so i'll ask my questions over
> there.. but if anyone has an answer please reply.
> 
> thanks.
> 
> -mike
> 
> --
> mike hanulec			       email: hanulec@schrodinger.com
> system manager, nyc			    office: 646.366.9555 x125
> schrodinger, inc.				   cell: 516.410.4478
> 
> On Sun, 14 Apr 2002, Michael R. Hanulec wrote:
> 
> > 
> > that didn't work either:
> > 
> > [root@manager /root]# qsig -s 9 3500
> > qsig: Request invalid for state of job 3500.manager.schrodinger.com
> > [root@manager /root]# qsig -s 9 3503
> > qsig: Request invalid for state of job 3503.manager.schrodinger.com
> > [root@manager /root]# qsig -s 9 3504
> > qsig: Request invalid for state of job 3504.manager.schrodinger.com
> > [root@manager /root]# qstat
> > Job id           Name             User             Time Use S Queue
> > ---------------- ---------------- ---------------- -------- - -----
> > 3486.manager     T0122-2          day              44:42:51 R dque
> > 3487.manager     T0122-3          day              44:42:00 R dque
> > 3500.manager     STDIN            volker           00:00:03 E dque
> > 3503.manager     STDIN            volker                  0 E dque
> > 3504.manager     STDIN            volker                  0 E dque
> > [root@manager /root]# su - volker
> > [manager] ~> qsig -s 9 3500
> > qsig: Request invalid for state of job 3500.manager.schrodinger.com
> > [manager] ~> qstat
> > Job id           Name             User             Time Use S Queue
> > ---------------- ---------------- ---------------- -------- - -----
> > 3486.manager     T0122-2          day              44:42:51 R dque
> > 3487.manager     T0122-3          day              44:42:00 R dque
> > 3500.manager     STDIN            volker           00:00:03 E dque
> > 3503.manager     STDIN            volker                  0 E dque
> > 3504.manager     STDIN            volker                  0 E dque
> > [manager] ~> exit
> > logout
> > [root@manager /root]# 
> > 
> > i tried it as both root and the user running the job(s).  i
> > believe the qsig man page said this only works on running jobs.
> > 
> > any more suggestions?
> > 
> > -mike
> > 
> > On Sun, 14 Apr 2002, Dr. Jason Hogan-O'Neill wrote:
> > 
> > > 
> > > Try 'qsig'. There should be a man page. I had a similar problem and 
> > > qsig did it for me. 
> > > 
> > > qsig -s 9 idnumber
> > > 
> > > 
> > > On Sun, 14 Apr 2002 07:29:59 -0400 (EDT) "Michael R. Hanulec" 
> > > <hanulec@schrodinger.com> wrote:
> > > 
> > > > Hello all..
> > > > 
> > > > Could someone give me some pointers on killing some jobs in a queue which
> > > > won't die?  The last three jobs below by the user 'volker' have been in
> > > > this state for over 12 hours.  These jobs were supposed to be 2 min tests
> > > > of our new cluster writing some data to an NFS partition.
> > > > 
> > > > 
> > > > [root@manager /root]# qstat
> > > > Job id           Name             User             Time Use S Queue
> > > > ---------------- ---------------- ---------------- -------- - -----
> > > > 3486.manager     T0122-2          day              33:07:08 R dque            
> > > > 3487.manager     T0122-3          day              33:07:14 R dque            
> > > > 3500.manager     STDIN            volker           00:00:03 E dque            
> > > > 3503.manager     STDIN            volker                  0 E dque            
> > > > 3504.manager     STDIN            volker                  0 E dque            
> > > > [root@manager /root]# qdel 3500
> > > > qdel: Request invalid for state of job 3500.manager.schrodinger.com
> > > > [root@manager /root]# qdel 3503
> > > > qdel: Request invalid for state of job 3503.manager.schrodinger.com
> > > > [root@manager /root]# qdel 3504
> > > > qdel: Request invalid for state of job 3504.manager.schrodinger.com
> > > > [root@manager /root]# which qdel
> > > > /usr/local/pbs/bin/qdel
> > > > [root@manager /root]# 
> > > > 
> > > > 
> > > > If anyone has any suggestions I would really like to hear them.
> > > > 
> > > > Thanks in advance!
> > > > 
> > > > -Mike
> > > > 
> > > > 
> > > > _______________________________________________
> > > > mauiusers mailing list
> > > > mauiusers@supercluster.org
> > > > http://supercluster.org/mailman/listinfo/mauiusers