[torqueusers] acl_hosts oddity

Steven A. DuChene linux-clusters at mindspring.com
Wed Feb 1 12:32:12 MST 2006


Yes, if acl_hosts_enable is set to False and there is a list of hosts
in the acl_hosts setting then it acts to constrain the hosts available
to a particular queue for job execution. On the advice of the expert CRI
support folks, we are using that functionality in Torque, together with
standing reservations within Moab, to effectively partition our cluster
for separate business units.
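
For anyone wanting to try this, here is a rough sketch of the queue settings
involved. It is a dry run only: the shell function prints qmgr directives for
review and never contacts pbs_server. The queue name bu1 and the node names
are made up for illustration. The attribute is written as acl_host_enable,
which is the spelling I believe qmgr expects (this thread writes it as
acl_hosts_enable), so check the pbs_queue_attributes man page for your
version. The scheduler still has to be set up to honor the host list.

```shell
#!/bin/sh
# Dry-run helper: prints qmgr directives, does not contact pbs_server.
# The queue and node names passed in are placeholders, not from this thread.
queue_partition_cmds() {
    queue=$1
    shift
    # Leave ACL checking off so the list is not used as a submit-host filter.
    printf 'set queue %s acl_host_enable = False\n' "$queue"
    op='='
    for host in "$@"; do
        printf 'set queue %s acl_hosts %s %s\n' "$queue" "$op" "$host"
        op='+='   # the first host assigns the list, later hosts append to it
    done
}

queue_partition_cmds bu1 node01 node02
```

Once the printed lines look right, they could be piped into qmgr on the
server host.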

-----Original Message-----
>From: nathaniel.x.woody at gsk.com
>Sent: Feb 1, 2006 11:05 AM
>To: "Steven A. DuChene" <linux-clusters at mindspring.com>
>Cc: torqueusers at supercluster.org
>Subject: Re: [torqueusers] acl_hosts oddity
>
>Thanks for the info.  I'll submit the bug later today.
>
>I actually had no idea that acl_hosts had any effect if acl_hosts_enable
>is set to false. Is that behavior documented somewhere?  The
>pbs_server_attributes man page seems to suggest that acl_hosts is only
>used if acl_hosts_enable is set to true, though a re-reading shows that it
>is not explicit about that.
>
>I started messing with acl_hosts because we have two classes of clients of
>our cluster:
>
>1) submission hosts - we don't encourage local submission of jobs and 
>instead have several other servers that we have designated as submission 
>hosts.  We also do this as we have some web services that use the cluster, 
>so we run these services on submission host servers.
>
>2) client (for lack of a better name) hosts - we also want to specify
>a number of machines that have only client (status, etc.) access to the
>cluster.
>
>My understanding was that both submission hosts and client hosts would 
>need to be in the acl_hosts list, while submission hosts would also need 
>to be identified as submission hosts.  After playing around, I'm not sure 
>if that's true or not.  Does providing a machine in the acl_hosts list 
>allow job submission from that host?
>
>On the wildcard point, should nate*.gsk.com work (or 
>node*-hivemind.gsk.com)?  It does not appear to.  If it should, I'll play 
>around a bit more as I probably got something wrong.
>
>Thanks for the help,
>Nate
>
>"Steven A. DuChene" <linux-clusters at mindspring.com>
>Sent by: torqueusers-bounces at supercluster.org
>31-Jan-2006 21:10
>Please respond to "Steven A. DuChene" <linux-clusters at mindspring.com>
>
>To: nathaniel.x.woody at gsk.com, torqueusers at supercluster.org
>cc:
>Subject: Re: [torqueusers] acl_hosts oddity
>
>Can you explain what exactly you are trying to accomplish with the
>acl_hosts settings? There are two distinctly different ways to use this,
>depending on what you set acl_hosts_enable to (true or false). That is,
>it is possible to use the functionality of the acl_hosts lists for a
>different function if acl_hosts_enable is set to false.
>
>-----Original Message-----
>>From: nathaniel.x.woody at gsk.com
>>Sent: Jan 31, 2006 3:17 PM
>>To: torqueusers at supercluster.org
>>Subject: [torqueusers] acl_hosts oddity
>>
>>First of all, thank you for your previous assistance on figuring out
>>$tmpdir.  For anyone else who struggles with that, the three pieces we
>>needed were 1) running configure with "--enable-wordexp", 2) setting
>>$tmpdir /localscratch in the mom_priv/config file, and 3) setting the
>>TMPDIR environment variable to $PBS_JOBID in the job request.  Now,
>>torque happily creates a directory for each job that wants it and keeps
>>all the jobs separate.  The job script just cd's to the $TMPDIR
>>directory.  Thanks, it works quite nicely now!
>>
>>I have noticed something of an oddity (I think) using torque 2.0.0p5 and
>>am curious if what I'm seeing is the expected behavior.  When I enable
>>acl_hosts (qmgr -c "s s acl_hosts_enable=true"), this breaks torque in
>>kind of a bizarre way.  It looks like this prevents the moms from
>>returning completed job information.  I have to add compute nodes to the
>>acl_hosts list (qmgr -c "s s acl_hosts += node1") in order to get the job
>>to return.  I suppose this means that returning the job info requires
>>server services that are blocked by enabling acl_hosts?
>>
>>Eventually, after several minutes, the job gets reported as exceeding the
>>wallclock time.  I get a weird "MOAB_INFO: job exceeded wallclock limit"
>>error and the job gets deleted.  I think this is just the scheduler
>>stepping in at some statjob polling interval and killing the job?
>>
>>On a lark, I checked and specifying "ALLOWCOMPUTEHOSTSUBMIT true" in a 
>>torque.cfg file didn't appear to have any effect on this, which it seems 
>>like it should.  At this point it appears that setting that parameter 
>>allows a compute node to do any operation except return a job result?
>>
>>If the above is the expected behavior, what kind of wildcard matching is 
>>allowed in the acl_hosts list?
>>
>>Best,
>>Nate