[torqueusers] Possible bug with auto_node_np

J.A. Magallón jamagallon at ono.com
Thu Dec 2 18:14:07 MST 2010


On Fri, 3 Dec 2010 02:04:35 +0100, "J.A. Magallón" <jamagallon at ono.com> wrote:

> Hi...
> 
> A demo is better than any explanation:
> 
> bran:~/mpi> qmgr -c 'p n n0.mpi' | grep "np ="
> set node n0.mpi np = 2
> bran:~/mpi> qmgr -c 'p n n1.mpi' | grep "np ="
> set node n1.mpi np = 2
> bran:~/mpi> qsub -l nodes=2:ppn=2 k
> qsub: Job exceeds queue resource limits MSG=cannot locate feasible nodes
> bran:~/mpi> qmgr -c 's n n0.mpi np = 2'
> bran:~/mpi> qmgr -c 'p n n0.mpi' | grep "np ="
> set node n0.mpi np = 2
> bran:~/mpi> qsub -l nodes=2:ppn=2 k
> 2.master.mpi
> bran:~/mpi> qstat -n
> 
> annwn.cps.unizar.es:
>                                                                          Req'd  Req'd   Elap
> Job ID               Username Queue    Jobname          SessID NDS   TSK Memory Time  S Time
> -------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - -----
> 2.master.mpi         magallon std      x                   --      2   4    --    --  R   --
>    n1+n1+n0+n0
> 
> The first submission fails; I re-set the same value (for only one node)
> and then it works. Weird...
> 
> Server was set with auto_node_np.
> 
> Any ideas?
> 

Note: if I manually write 'np=2' in the nodes file, or restart the server after
it has written the np= values itself, it works. So this hints that when
the server gets the values from the nodes, something is missing...
Should it re-read them after auto-detection?
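
For the first workaround (explicit np values in the nodes file), the entries
could be generated along these lines. This is only a minimal sketch:
make_nodes_entries is a hypothetical helper name, not part of TORQUE, and it
assumes the usual "hostname np=N" form of a nodes-file line. It only prints
to stdout and does not touch the server.

```shell
#!/bin/sh
# Hypothetical helper (not part of TORQUE): print one nodes-file entry per
# host with an explicit np value, in the assumed "hostname np=N" form.
make_nodes_entries() {
    np="$1"     # first argument: processor count to assign to every host
    shift       # remaining arguments: host names
    for host in "$@"; do
        printf '%s np=%s\n' "$host" "$np"
    done
}

# Entries for the two nodes from the demo above.
make_nodes_entries 2 n0.mpi n1.mpi
```

The output would then have to be placed in the server's nodes file (its
location depends on the installation) and the server restarted so that it
picks the values up.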

Hope this helps.

-- 
J.A. Magallon <jamagallon()ono!com>     \               Software is like sex:
                                         \         It's better when it's free

