[torqueusers] Spread config to submit host
fernando.campos at uam.es
Tue Apr 20 02:27:37 MDT 2010
I have just realized that when I submit jobs from the master node, they are
dispatched to the proper nodes, but when I submit them from another
submit host, they are queued and run without regard to the node type:
41507-04/19/2010 18:16:04;0040;PBS_Server;Req;set_nodes;allocating nodes
for job 1209.master.node.com with node expression 'COREDUO'
not locate requested resources 'COREDUO' (node_spec failed) cannot allocate
node '06.node.com' to job - node not currently available (nps
needed/free: 1/0, joblist: 1029.master.node.com:0,1208.
Obviously, master.node.com is a fake name, but the point is that when I
submit a job to the *short* queue, TORQUE sees that it needs nodes of type
COREDUO ("neednodes"), finds none available, and so does not allocate any
node. As I said in my previous mail, I would like jobs to use the other type
of node, XEON, when all the COREDUO nodes are busy. At least I can see that
the queue is filtering node allocation by node type.
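The desired fallback can be sketched as a tiny model. Each queue has an ordered list of node properties, and the first node with enough free slots wins. This is only a hypothetical illustration of the policy being asked for. It is not how pbs_server or pbs_sched actually allocate nodes. The `Node` class, the `QUEUE_PREFERENCES` table and the node names are made up for the example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Node:
    name: str
    prop: str  # node property from the nodes file, e.g. "COREDUO" or "XEON"
    free: int  # free processor slots on the node (hypothetical bookkeeping)


# Hypothetical policy table: preferred property first, fallback second.
QUEUE_PREFERENCES = {
    "short": ["COREDUO", "XEON"],
    "long": ["XEON", "COREDUO"],
}


def pick_node(queue: str, nodes: list[Node], needed: int = 1) -> Node | None:
    """Return the first node with enough free slots, trying the queue's
    preferred property before falling back to the other one."""
    for prop in QUEUE_PREFERENCES[queue]:
        for node in nodes:
            if node.prop == prop and node.free >= needed:
                return node
    return None  # no node of either type is free: the job stays queued
```

Consider a case where node01 and node06 (COREDUO) are both full and node07 (XEON) has free slots. Here `pick_node("short", nodes)` returns node07. It returns `None` when no node of either type has a free slot.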
Any idea why this doesn't work when the jobs are submitted from the submit host?
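The difference between the master and the submit host can be made explicit by parsing the two `qmgr -c "p s"` dumps, like the ones quoted below, and listing settings that appear on only one side. This is a minimal sketch. It assumes the dumps have been saved as plain text. It only looks at `set queue <name> <attr> = <value>` lines and ignores comments, `create queue` lines and server-level settings. The function names are hypothetical.

```python
from __future__ import annotations

import re

# Matches lines such as: set queue short resources_default.neednodes = COREDUO
SET_QUEUE = re.compile(r"^set queue (\S+) (\S+) = (.+)$")


def parse_qmgr(dump: str) -> dict[tuple[str, str], str]:
    """Map (queue, attribute) -> value for every 'set queue' line in a dump."""
    attrs: dict[tuple[str, str], str] = {}
    for line in dump.splitlines():
        match = SET_QUEUE.match(line.strip())
        if match:
            queue, attr, value = match.groups()
            attrs[(queue, attr)] = value.strip()
    return attrs


def only_in_first(first: str, second: str) -> list[str]:
    """Return the 'set queue' settings present in `first` but missing from `second`."""
    a, b = parse_qmgr(first), parse_qmgr(second)
    return [
        f"set queue {queue} {attr} = {a[(queue, attr)]}"
        for queue, attr in sorted(a)
        if (queue, attr) not in b
    ]
```

Pass the master's dump as `first` and the submit host's dump as `second`. With the two outputs quoted below, it would list the two `resources_default.neednodes` lines.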
Thanks a lot again!
2010/4/19 Fernando Campos <fernando.campos at uam.es>
> Hi all!!
> I'm having trouble configuring the TORQUE server. The setup is, let's say:
> 10 nodes running pbs_mom; 1 master node running pbs_server and pbs_sched
> (plus the NFS server and other services); and 1 submit host with
> torque-client installed to submit jobs and check the queues.
> The *nodes* file defines two sets of nodes by processor type: COREDUO and
> XEON.
> I've added the resources_default.neednodes lines to my queue configuration,
> so when I run $ qmgr -c "p s" on the master node (running pbs_server) I get:
> # Create queues and set their attributes.
> # Create and define queue long
> create queue long
> set queue long queue_type = Execution
> set queue long resources_default.neednodes = XEON
> set queue long enabled = True
> set queue long started = True
> # Create and define queue short
> create queue short
> set queue short queue_type = Execution
> set queue short resources_max.cput = 24:00:00
> set queue short resources_max.walltime = 25:00:00
> set queue short resources_default.neednodes = COREDUO
> set queue short enabled = True
> set queue short started = True
> So a job submitted to the *short* queue is supposed to run on a COREDUO
> node, and a job submitted to the *long* queue on a XEON node. Obviously it's
> not working like that, and I noticed that when I run $ qmgr -c "p s" from
> the submit machine I get a different answer:
> # Create queues and set their attributes.
> # Create and define queue long
> create queue long
> set queue long queue_type = Execution
> set queue long enabled = True
> set queue long started = True
> # Create and define queue short
> create queue short
> set queue short queue_type = Execution
> set queue short resources_max.cput = 24:00:00
> set queue short resources_max.walltime = 25:00:00
> set queue short enabled = True
> set queue short started = True
> NO set queue <queue> resources_default.neednodes = <NODE_GROUP> LINES AT
> I've already checked that I can use the submit host to submit jobs to the
> master node, and they do run on the compute nodes. I have also checked the
> node status with pbsnodes, and everything seems to work fine except this:
> the jobs ignore "neednodes".
> Does anybody have any idea why it works this way?
> BTW, I would also like jobs in the short queue to go to XEON nodes when all
> the COREDUO nodes are busy, and jobs in the long queue to go to COREDUO
> nodes when all the XEON nodes are busy. Any hints?
> Thank you very much.
Fernando Campos Del Pozo
Departamento de Física Teórica
Facultad de Ciencias / Módulo 15 (C-XI) / Despacho 512
Universidad Autónoma de Madrid
e-mail: fernando.campos at uam.es