[torqueusers] acl_hosts oddity
Steven A. DuChene
linux-clusters at mindspring.com
Tue Jan 31 19:10:34 MST 2006
Can you explain what exactly you are trying to accomplish with the acl_hosts
settings? There are two distinctly different ways to use them, depending on
whether acl_hosts_enable is set to true or false. That is, if acl_hosts_enable
is set to false, the acl_hosts list can be used for a different function.
>From: nathaniel.x.woody at gsk.com
>Sent: Jan 31, 2006 3:17 PM
>To: torqueusers at supercluster.org
>Subject: [torqueusers] acl_hosts oddity
>First of all, thank you for your previous assistance on figuring out
>$tmpdir. For anyone else who struggles with that, we needed three pieces:
>1) running configure with "--enable-wordexp", 2) setting
>$tmpdir /localscratch in the mom_priv/config file, and 3) setting the
>TMPDIR environment variable to $PBS_JOBID in the job request. Now, Torque
>happily creates a directory for each job that wants it and keeps all the
>jobs separate. The job script just cd's to the $TMPDIR directory. Thanks,
>it works quite nicely now!
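The per-job scratch layout described above can be modelled with a short Python sketch. It assumes the MOM creates one subdirectory per job, named after the job ID, under the `$tmpdir` root from mom_priv/config, and that the job then works inside it. The function `make_job_tmpdir` and the sample job IDs are hypothetical; this illustrates the idea and is not Torque's code.

```python
import os
import tempfile


def make_job_tmpdir(tmpdir_root: str, job_id: str) -> str:
    """Create (if needed) and return a per-job scratch directory.

    Hypothetical model: one subdirectory per job, named after the job ID,
    under the root configured with $tmpdir in mom_priv/config.
    """
    job_dir = os.path.join(tmpdir_root, job_id)
    os.makedirs(job_dir, exist_ok=True)
    return job_dir


if __name__ == "__main__":
    # A throwaway directory stands in for /localscratch so this runs anywhere.
    with tempfile.TemporaryDirectory() as root:
        original_cwd = os.getcwd()
        try:
            for job_id in ("101.server", "102.server"):
                job_dir = make_job_tmpdir(root, job_id)
                os.chdir(job_dir)  # like "cd $TMPDIR" in the job script
                print(job_id, "->", job_dir)
        finally:
            os.chdir(original_cwd)  # leave the directory before it is removed
```

Because each job ID maps to its own directory, two jobs on the same node never write into the same scratch area.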
>I have noticed something of an oddity (I think) using torque 2.0.0p5 and
>am curious whether what I'm seeing is the expected behavior. When I enable
>acl_hosts (qmgr -c "s s acl_hosts_enable=true"), Torque breaks in a rather
>bizarre way: it appears to prevent the MOMs from returning
>completed job information. I have to add the compute nodes to the acl_hosts
>list (qmgr -c "s s acl_hosts += node1") in order to get the job to return.
>I suppose this means that returning the job info requires server services
>that are blocked by enabling acl_hosts?
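One way to picture the behavior described above is a server that, once acl_hosts_enable is true, checks every incoming connection against acl_hosts, including the MOM's report that a job has finished. The Python sketch below models only that assumption. The class `ServerAcl` and its `allows` method are hypothetical, with attribute names borrowed from the qmgr settings.

```python
from dataclasses import dataclass, field


@dataclass
class ServerAcl:
    """Hypothetical model of a server-side host ACL.

    Attribute names mirror the qmgr settings; the checking logic is an
    illustrative assumption, not Torque's source code.
    """

    acl_hosts_enable: bool = False
    acl_hosts: set[str] = field(default_factory=set)

    def allows(self, host: str) -> bool:
        # ACL disabled: every host may contact the server.
        if not self.acl_hosts_enable:
            return True
        # ACL enabled: only listed hosts are accepted. If the check applies
        # to every request, a compute node's "job finished" report is
        # rejected as well, unless that node is in the list.
        return host in self.acl_hosts


if __name__ == "__main__":
    acl = ServerAcl(acl_hosts_enable=True, acl_hosts={"headnode"})
    print(acl.allows("node1"))  # False: node1's completion report is refused
    acl.acl_hosts.add("node1")  # analogous to: qmgr -c "s s acl_hosts += node1"
    print(acl.allows("node1"))  # True
```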
>Eventually, after several minutes, the job gets reported as exceeding the
>wallclock time. I get a weird "MOAB_INFO: job exceeded wallclock limit"
>error and the job gets deleted. I think this is just the scheduler
>stepping in at some statjob polling interval and killing the job?
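The wallclock symptom is consistent with that guess: if the completion report never reaches the server, the job still looks like it is running, so a scheduler that compares elapsed time with the limit at each poll will eventually flag it, and only at the first poll after the limit has passed. The Python sketch below models that hypothesis. `Job`, `jobs_over_limit`, `first_detection` and the example numbers are hypothetical and do not describe Moab's actual logic.

```python
from dataclasses import dataclass


@dataclass
class Job:
    job_id: str
    start: int            # start time, in seconds
    wallclock_limit: int  # requested walltime, in seconds


def jobs_over_limit(jobs: list[Job], now: int) -> list[str]:
    """Return the IDs of jobs whose elapsed time exceeds their limit."""
    return [j.job_id for j in jobs if now - j.start > j.wallclock_limit]


def first_detection(job: Job, poll_interval: int) -> int:
    """Time of the first poll (at 0, p, 2p, ...) that sees the overrun.

    Because the check only runs at poll times, the job is flagged up to one
    polling interval after its limit actually expires.
    """
    if poll_interval <= 0:
        raise ValueError("poll_interval must be positive")
    t = 0
    while not jobs_over_limit([job], t):
        t += poll_interval
    return t


if __name__ == "__main__":
    job = Job("42.server", start=0, wallclock_limit=600)
    print(first_detection(job, poll_interval=180))  # 720
```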
>On a lark, I checked whether specifying "ALLOWCOMPUTEHOSTSUBMIT true" in a
>torque.cfg file would help. It didn't appear to have any effect, although it
>seems like it should. At this point it appears that setting that parameter
>allows a compute node to do any operation except return a job result?
>If the above is the expected behavior, what kind of wildcard matching is
>allowed in the acl_hosts list?
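On the wildcard question, host ACL entries in PBS-derived servers are commonly described as either an exact host name or a name with a leading `*` standing for any host in a domain (for example `*.example.com`). Whether torque 2.0.0p5 implements exactly that should be checked against its documentation. The Python sketch below only models this assumed convention, and `host_matches` and `host_allowed` are hypothetical helpers.

```python
def host_matches(pattern: str, host: str) -> bool:
    """Match one ACL entry against a host name (case-insensitive).

    Assumed convention: an exact name, or a leading '*' that matches any
    host ending in the rest of the pattern ("*.example.com").
    """
    pattern, host = pattern.lower(), host.lower()
    if pattern.startswith("*"):
        return host.endswith(pattern[1:])
    return host == pattern


def host_allowed(acl_hosts: list[str], host: str) -> bool:
    """True if any entry in the (hypothetical) ACL list matches the host."""
    return any(host_matches(p, host) for p in acl_hosts)


if __name__ == "__main__":
    acl = ["headnode", "*.cluster.example.com"]
    print(host_allowed(acl, "node1.cluster.example.com"))  # True
    print(host_allowed(acl, "node1"))  # False: short name has no domain
```

Under this model, a domain wildcard would only help if the server sees the compute nodes under fully qualified names; bare short names such as node1 would still need to be listed individually.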