[torqueusers] acl_hosts oddity

Steven A. DuChene linux-clusters at mindspring.com
Tue Jan 31 19:10:34 MST 2006


Can you explain what exactly you are trying to accomplish with the acl_hosts
settings? The list can be used in two distinctly different ways, depending on
whether acl_hosts_enable is set to true or false. That is, when
acl_hosts_enable is set to false, the acl_hosts list can still be used, but
for a different function.
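
For reference, the two qmgr commands in the quoted message below ("s s" is
qmgr shorthand for "set server") can be written out in long form. The sketch
below only prints the commands rather than running them, so it can be checked
before anything is applied to a live server. The function name and the
ACL_ENABLE_ATTR variable are made up for illustration. The attribute is spelled
acl_hosts_enable as in this thread; some TORQUE documentation spells it
acl_host_enable, so treat the name as an assumption and check what your own
server reports.

```shell
#!/bin/sh
# Dry-run sketch: print the qmgr commands from this thread in long form.
# Nothing is sent to pbs_server; pipe the output to sh to actually apply it.
# ASSUMPTION: the enable attribute is named as written in this thread. Some
# TORQUE documentation spells it acl_host_enable; verify against the output
# of: qmgr -c "print server"
ACL_ENABLE_ATTR="${ACL_ENABLE_ATTR:-acl_hosts_enable}"

# Hypothetical helper: takes compute node names as arguments.
print_acl_commands() {
    # Turn on host ACL checking at the server.
    printf 'qmgr -c "set server %s = true"\n' "$ACL_ENABLE_ATTR"
    # Add each node to the acl_hosts list, one command per node.
    for node in "$@"; do
        printf 'qmgr -c "set server acl_hosts += %s"\n' "$node"
    done
}

print_acl_commands node1 node2
```

Generating one "+=" command per node keeps each addition independent, so a
typo in one hostname does not affect the others.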

-----Original Message-----
>From: nathaniel.x.woody at gsk.com
>Sent: Jan 31, 2006 3:17 PM
>To: torqueusers at supercluster.org
>Subject: [torqueusers] acl_hosts oddity
>
>First of all, thank you for your previous assistance on figuring out
>$tmpdir.  For anyone else who struggles with that, the three pieces we
>needed were 1) running configure with "--enable-wordexp", 2) setting
>"$tmpdir /localscratch" in the mom_priv/config file, and 3) setting the
>TMPDIR environment variable to $PBS_JOBID in the job request.  Now torque
>happily creates a directory for each job that wants it and keeps all the
>jobs separate.  The job script just cd's to the $TMPDIR directory. Thanks,
>it works quite nicely now!
>
>I have noticed something of an oddity (I think) using torque 2.0.0p5 and
>am curious if what I'm seeing is the expected behavior.  When I enable
>acl_hosts (qmgr -c "s s acl_hosts_enable=true"), this breaks torque in kind
>of a bizarre way.  It looks like this prevents moms from returning
>completed job information.  I have to add compute nodes to the acl_hosts
>list (qmgr -c "s s acl_hosts += node1") in order to get the job to return.
>I suppose this means that returning the job info requires server services
>that are blocked by enabling acl_hosts?
>
>Eventually, after several minutes, the job gets reported as exceeding the
>wallclock time.  I get a weird "MOAB_INFO: job exceeded wallclock limit"
>error and the job gets deleted.  I think this is just the scheduler
>stepping in at some statjob polling interval and killing the job?
>
>On a lark, I checked, and specifying "ALLOWCOMPUTEHOSTSUBMIT true" in a
>torque.cfg file didn't appear to have any effect on this, though it seems
>like it should.  At this point it appears that setting that parameter
>allows a compute node to do any operation except return a job result?
>
>If the above is the expected behavior, what kind of wildcard matching is 
>allowed in the acl_hosts list?
>
>Best,
>Nate
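
The $tmpdir recipe at the top of the quoted message has three parts: building
with "--enable-wordexp", putting "$tmpdir /localscratch" in mom_priv/config,
and having TMPDIR set in the job's environment. The job-script side ("just
cd's to the $TMPDIR directory") might look roughly like the sketch below. The
function name enter_scratch is hypothetical, and the check for an unset or
missing TMPDIR is an added safety assumption, not something described in the
message.

```shell
#!/bin/sh
# Hypothetical job-script fragment: move into the per-job scratch directory.
# ASSUMPTION: pbs_mom has "$tmpdir /localscratch" in mom_priv/config and
# TMPDIR reaches the job environment, as described in the quoted message.

enter_scratch() {
    # Stop early if TMPDIR is unset or is not a directory, so the job does
    # not silently run in whatever directory it happened to start in.
    if [ -z "${TMPDIR:-}" ] || [ ! -d "$TMPDIR" ]; then
        echo "TMPDIR is not set or is not a directory" >&2
        return 1
    fi
    cd "$TMPDIR"
}

# Typical use at the top of a job script:
#   enter_scratch || exit 1
```

Failing loudly here makes a misconfigured node show up as an immediate job
error instead of as output scattered in an unexpected directory.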


