[torqueusers] queue to node mapping is wrong when using '-l procs' option

Sreedhar Manchu sm4082 at nyu.edu
Tue Feb 7 06:40:45 MST 2012


Hi,

Instead of using

> set queue fluent acl_host_enable = False
> set queue fluent acl_hosts = cnode01

I assigned a feature to the nodes I wanted my jobs to run on (or wanted to be under a special queue) and requested it in the PBS script like this:

#PBS -l feature=<feature name>

Moab places the jobs on the nodes with those features; I'm not sure how Maui handles it. I have a qsub wrapper that adds this feature line depending on the user's request.
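For what it's worth, here is a rough sketch of what such a wrapper can look like. It is only a simplified illustration under a few assumptions (the path to the real qsub binary, the queue-to-feature mapping, and the -q flag being passed on the command line are all made up here), not our actual script:

#!/bin/sh
# Hypothetical qsub wrapper: inspect the -q argument and append a matching
# -l feature=... request before handing the job off to the real qsub.

REAL_QSUB=/usr/bin/qsub.orig    # assumed location of the renamed real qsub
FEATURE=""

prev=""
for arg in "$@"; do
    # remember the value that follows a -q flag
    if [ "$prev" = "-q" ]; then
        case "$arg" in
            fluent) FEATURE=chassis0 ;;    # example queue -> feature mapping
        esac
    fi
    prev="$arg"
done

if [ -n "$FEATURE" ]; then
    exec "$REAL_QSUB" -l feature="$FEATURE" "$@"
else
    exec "$REAL_QSUB" "$@"
fi

A real wrapper would also have to look for a #PBS -q line inside the job script itself, but the idea is the same.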

To give features to nodes, I used 

qmgr -c 'set node <node name> properties += <feature name>'

For example, our p48 nodes have features like chassis0, chassis1, etc. to indicate the chassis they belong to. Since we ask for a specific queue with specific features, jobs always land on the right nodes with the right feature.
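Concretely, the setup looks something like this (the node names and queue name below are placeholders, not our real host names):

# tag each node with the chassis it sits in
qmgr -c 'set node p48-001 properties += chassis0'
qmgr -c 'set node p48-017 properties += chassis1'

# the properties show up in the pbsnodes output, which is an easy way to verify
pbsnodes p48-001 | grep properties

# a job that must land in chassis0 then simply requests the feature
#PBS -q p48
#PBS -l feature=chassis0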

Sreedhar.

On Feb 7, 2012, at 4:33 AM, Xiangqian Wang wrote:

> I failed to get the queue-to-node mapping feature of the torque/maui system to work. I use torque 2.5.8 and maui 3.2.6p21. The simple job script contains a procs option:
> 
> #!/bin/sh
> #PBS -N simple-job
> #PBS -l procs=3
> #PBS -q fluent
> #PBS -d /opt/share/job
> cd $PBS_O_WORKDIR
> date
> sleep 30
> date
> 
> The 'fluent' queue is mapped to a node 'cnode01' with 4 processors; the settings are shown below:
> 
> # Create and define queue batch
> #
> create queue batch
> set queue batch queue_type = Execution
> set queue batch resources_default.nodes = 1
> set queue batch resources_default.walltime = 01:00:00
> set queue batch enabled = True
> set queue batch started = True
> #
> # Create and define queue fluent
> #
> create queue fluent
> set queue fluent queue_type = Execution
> set queue fluent acl_host_enable = False
> set queue fluent acl_hosts = cnode01
> set queue fluent enabled = True
> set queue fluent started = True
> #
> # Set server attributes.
> #
> set server scheduling = True
> set server acl_hosts = snode01
> set server acl_roots = root@*
> set server managers = root@snode01
> set server operators = root@snode01
> set server default_queue = batch
> set server log_events = 511
> set server mail_from = adm
> set server scheduler_iteration = 600
> set server node_check_rate = 150
> set server tcp_timeout = 6
> set server mom_job_sync = True
> set server keep_completed = 300
> set server auto_node_np = True
> set server next_job_number = 94
> set server display_job_server_suffix = False
> 
> The job should use only the single node 'cnode01', but the allocated hosts include another node. See part of the 'qstat -f' output:
> 
>     exec_host = snode01/1+snode01/0+cnode01/0
>     ...
>     Resource_List.neednodes = cnode01
>     Resource_List.procs = 3
> 
> Can anyone give me some suggestions? It would be greatly appreciated.
> 
> Xiangqian

---
Sreedhar Manchu
HPC Support Specialist
New York University
251 Mercer Street
New York, NY 10012-1110

