[torqueusers] maui diagnose reports no nodes - Scheduler does not start jobs
Anne Hammond
hammond at txcorp.com
Mon Feb 5 23:48:31 MST 2007
torque 2.1.6
maui client version 3.2.6p18
The maui scheduler is not starting jobs. They will start if you
do a "qrun nn", but otherwise they remain queued.
I've searched for the cause and found a couple of symptoms:
[hammond at storage3 bin]$ sudo ./diagnose -n --host=storage3.xx.xxxxxx.com -v
diagnosing node table (5120 slots)
Name State Procs Memory Disk Swap Speed Opsys Arch Par Load Res Classes Network Features
----- --- 0:0 0:0 0:0 0:0
Total Nodes: 0 (Active: 0 Idle: 0 Down: 0)
--------------------------------
However, pbsnodes -a lists all nodes as free.
This also fails:
[hammond at storage3 bin]$ sudo ./checkjob 51.storage3.xx.xxxxxx.com
ERROR: 'checkjob' failed
ERROR: cannot locate job '51.storage3.xx.xxxxxx.com'
-------------------------------
[root at storage3 sbin]# qstat -a
storage3.xx.xxxxxx.com:
                                                                   Req'd  Req'd   Elap
Job ID               Username Queue    Jobname    SessID NDS   TSK Memory Time  S Time
-------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
51.storage3.xx.xxxxx swsides  s3opt12  a036193256  23470     5   2 4000mb 24:00 R 03:18
53.storage3.xx.xxxxx nzuonkwe s3opt12  STDIN          --     1   2  100mb 24:00 Q    --
-------------------
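To put a number on the mismatch described above (diagnose -n reports 0 nodes while pbsnodes -a lists them all as free), the states that pbsnodes reports could be tallied with a short script. This is only a sketch: it assumes the usual pbsnodes -a layout of a node name followed by indented "state = ..." lines. The function name count_node_states is hypothetical, and the sample text is made up for illustration, not captured from this cluster.

```python
from collections import Counter


def count_node_states(pbsnodes_text):
    """Tally node states found in `pbsnodes -a` style text (assumed layout)."""
    counts = Counter()
    for line in pbsnodes_text.splitlines():
        stripped = line.strip()
        # Attribute lines are indented and look like "state = free".
        if stripped.startswith("state = "):
            # A node may report several comma-separated states,
            # so each state is counted separately.
            for state in stripped[len("state = "):].split(","):
                counts[state.strip()] += 1
    return counts


# Hypothetical sample input, not output from the cluster in this report.
sample = """\
node01
     state = free
     np = 2

node02
     state = down,offline
     np = 2
"""

if __name__ == "__main__":
    print(dict(count_node_states(sample)))
```

In practice the text would come from the real command, for example by saving the output of pbsnodes -a to a file and passing its contents to the function. A non-zero "free" count next to "Total Nodes: 0" from diagnose would confirm that the scheduler is not seeing the nodes the server knows about.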
Any pointers appreciated.
Thanks, Anne