[torqueusers] maui diagnose reports no nodes - Scheduler does not start jobs

Anne Hammond hammond at txcorp.com
Mon Feb 5 23:48:31 MST 2007


torque 2.1.6
maui client version 3.2.6p18

The maui scheduler is not starting jobs.  They start if I run
"qrun nn" by hand, but otherwise they remain queued.

I've searched for the cause and found a couple of symptoms:

[hammond at storage3 bin]$ sudo ./diagnose -n --host=storage3.xx.xxxxxx.com -v
diagnosing node table (5120 slots)
Name                    State  Procs     Memory         Disk          Swap Speed  Opsys   Arch Par   Load Res Classes                        Network Features

-----                     ---   0:0        0:0           0:0           0:0

Total Nodes: 0  (Active: 0  Idle: 0  Down: 0)


--------------------------------
However, pbsnodes -a lists all nodes as free.
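
To put the two views side by side, here is a minimal sketch of two
filters. The function names maui_node_total and torque_free_count are
made up for illustration. The first assumes the "Total Nodes:" summary
line shown in the diagnose output above; the second assumes the usual
pbsnodes -a layout, where each node name is followed by an indented
"state = ..." line.

```shell
#!/bin/sh
# Hypothetical helpers; both read command output on stdin.

# Print the node total that maui reports in `diagnose -n` output.
# Assumes a summary line of the form "Total Nodes: N  (Active: ...)".
maui_node_total() {
    sed -n 's/^Total Nodes: *\([0-9][0-9]*\).*/\1/p'
}

# Print how many nodes torque reports as exactly "free" in `pbsnodes -a`
# output. Assumes one indented "state = <value>" line per node.
torque_free_count() {
    grep -c '^[[:space:]]*state = free$'
}
```

Piping "sudo ./diagnose -n" into maui_node_total and "pbsnodes -a" into
torque_free_count would print the two counts; a zero from maui next to
a non-zero count from torque is the mismatch described here.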

This also fails:

[hammond at storage3 bin]$ sudo ./checkjob 51.storage3.xx.xxxxxx.com
ERROR:    'checkjob' failed
ERROR:  cannot locate job '51.storage3.xx.xxxxxx.com'
-------------------------------
[root at storage3 sbin]# qstat -a

storage3.xx.xxxxxx.com:
                                                                   Req'd  Req'd   Elap
Job ID               Username Queue    Jobname    SessID NDS   TSK Memory Time  S Time
-------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
51.storage3.xx.xxxxx swsides  s3opt12  a036193256  23470     5   2 4000mb 24:00 R 03:18
53.storage3.xx.xxxxx nzuonkwe s3opt12  STDIN         --      1   2  100mb 24:00 Q   --

-------------------

Any pointers appreciated.

Thanks, Anne


