[torqueusers] maui diagnose reports no nodes - Scheduler does not start jobs

Paul Gray gray at cs.uni.edu
Tue Feb 6 06:34:14 MST 2007


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Mon, Feb 05, 2007 at 11:48:31PM -0700, Anne Hammond wrote:
> torque 2.1.6
> maui client version 3.2.6p18
> 
> The maui scheduler is not starting jobs.  They will start if you
> do a "qrun nn", but otherwise they remain queued.
> 
> I've searched for the cause and found a couple of symptoms:
> 
> [hammond@storage3 bin]$ sudo ./diagnose -n --host=storage3.xx.xxxxxx.com -v
> diagnosing node table (5120 slots)
> Name                    State  Procs     Memory         Disk          Swap  Speed  Opsys   Arch Par   Load Res Classes                        Network  Features
> 
> -----                     ---   0:0        0:0           0:0           0:0
> 
> Total Nodes: 0  (Active: 0  Idle: 0  Down: 0)
> 
> 
> --------------------------------
> However, pbsnodes -a lists all nodes as free.
> 
> This also fails:
> 
> [hammond@storage3 bin]$ sudo ./checkjob 51.storage3.xx.xxxxxx.com
> ERROR:    'checkjob' failed
> ERROR:  cannot locate job '51.storage3.xx.xxxxxx.com'
> -------------------------------

These symptoms resemble what I see when configuring Maui on Debian boxes:
Maui starts and torque is running fine, but the two do not communicate.
Your issue might be caused by the same maui.cfg configuration problem that
was recently discussed on the mauiusers list:
   http://www.supercluster.org/pipermail/mauiusers/2007-February/thread.html

See whether Lawrence's suggestion in that thread, adjusting the resource
manager definition in maui.cfg and then restarting maui, addresses the issue.
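
For reference, the resource manager definition lives in maui.cfg.  The
fragment below is only a sketch of what a torque-backed definition commonly
looks like, not Lawrence's actual suggestion (see the thread for that).  The
hostname is taken from the output quoted above, and the label "base" is an
arbitrary name chosen for this example; adjust both for your site.

```
# maui.cfg (illustrative fragment; values are assumptions, not a verified fix)

# Host on which the maui daemon runs
SERVERHOST   storage3.xx.xxxxxx.com

# Resource manager definition: tell Maui to talk to a PBS/torque server.
# "base" is just a label chosen for this example.
RMCFG[base]  TYPE=PBS
```

After editing the file, restart maui and rerun "diagnose -n" to see whether
the node table is now populated.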

- -- 
Paul Gray                                         -o)
314 East Gym, Dept. of Computer Science           /\\
University of Northern Iowa                      _\_V
Message void if penguin violated ...  Don't mess with the penguin
No one says, "Hey, I can't read that ASCII attachment ya sent me."
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFFyIPWOH45TZW7mh4RAnUaAJwPQa39Or/ns2demmi5tGF34KwxZgCg7TCR
5TKNheBj/1e69atXuav9rrE=
=Lc2O
-----END PGP SIGNATURE-----

