[torqueusers] Possible bug with auto_node_np
J.A. Magallón
jamagallon at ono.com
Thu Dec 2 18:14:07 MST 2010
On Fri, 3 Dec 2010 02:04:35 +0100, "J.A. Magallón" <jamagallon at ono.com> wrote:
> Hi...
>
> A demo is better than any explanation:
>
> bran:~/mpi> qmgr -c 'p n n0.mpi' | grep "np ="
> set node n0.mpi np = 2
> bran:~/mpi> qmgr -c 'p n n1.mpi' | grep "np ="
> set node n1.mpi np = 2
> bran:~/mpi> qsub -l nodes=2:ppn=2 k
> qsub: Job exceeds queue resource limits MSG=cannot locate feasible nodes
> bran:~/mpi> qmgr -c 's n n0.mpi np = 2'
> bran:~/mpi> qmgr -c 'p n n0.mpi' | grep "np ="
> set node n0.mpi np = 2
> bran:~/mpi> qsub -l nodes=2:ppn=2 k
> 2.master.mpi
> bran:~/mpi> qstat -n
>
> annwn.cps.unizar.es:
>                                                                          Req'd  Req'd   Elap
> Job ID               Username Queue    Jobname          SessID NDS   TSK Memory Time  S Time
> -------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - -----
> 2.master.mpi         magallon std      x                    --     2   4     --    -- R    --
>    n1+n1+n0+n0
>
> The first submission fails; I re-set the same value (and only for one node),
> and then it works. Weird...
>
> The server has auto_node_np enabled.
>
> Any ideas?
>
Note: if I manually write 'np=2' in the nodes file, or restart the server after
it has written the np= values itself, it works. This suggests that something
is missing when the server picks up the values reported by the nodes...
Should the server re-read them after auto-detection?
Hope this helps.
--
J.A. Magallon <jamagallon()ono!com> \ Software is like sex:
\ It's better when it's free