[torqueusers] acl_hosts oddity
Garrick Staples
garrick at usc.edu
Tue Jan 31 17:07:31 MST 2006
On Tue, Jan 31, 2006 at 03:17:43PM -0500, nathaniel.x.woody at gsk.com alleged:
> First of all, thank you for your previous assistance on figuring out
> $tmpdir. For anyone else who struggles with that, the three pieces we
> needed were 1) running configure with "--enable-wordexp", 2) setting
> $tmpdir /localscratch in the mom_priv/config file, and 3) setting the
> TMPDIR environment variable to $PBS_JOBID in the job request. Now, torque
> happily creates a directory for each job that wants it and keeps all the
> jobs separate. The job script just cd's to the $TMPDIR directory. Thanks,
> it works quite nicely now!
>
1) is no longer necessary with p6 and p7.
2) yup.
3) is only true if the job is overriding $TMPDIR. MOM sets $TMPDIR for
the job, but allows the job to override it.
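A minimal Python sketch of how a job step might pick its scratch directory under that arrangement. It assumes MOM has already exported TMPDIR (a per-job directory under the $tmpdir set in mom_priv/config) and simply honors whatever TMPDIR is set, whether by MOM or by the job overriding it. The fallback to /localscratch/<PBS_JOBID> and the function name job_scratch_dir are hypothetical illustrations mirroring the setup described above, not Torque code.

```python
import os


def job_scratch_dir(env=None):
    """Return the directory a job should use for scratch files.

    Assumption: MOM exports TMPDIR for the job, and the job may override it,
    so whatever TMPDIR holds wins. If TMPDIR is absent, fall back to a
    hypothetical /localscratch/<PBS_JOBID> path (illustration only).
    """
    if env is None:
        env = os.environ
    tmpdir = env.get("TMPDIR")
    if tmpdir:
        return tmpdir
    job_id = env.get("PBS_JOBID", "unknown")
    return os.path.join("/localscratch", job_id)


if __name__ == "__main__":
    print(job_scratch_dir())
```

Passing an explicit mapping instead of os.environ makes the lookup easy to test without touching the real environment.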
> I have noticed something of an oddity (I think) using torque 2.0.0p5, and
> am curious whether what I'm seeing is the expected behavior. When I enable
> acl_hosts (qmgr -c "s s acl_hosts_enable=true"), torque breaks in kind
> of a bizarre way: it looks like this prevents the moms from returning
> completed job information. I have to add the compute nodes to the acl_hosts
> list (qmgr -c "s s acl_hosts += node1") in order to get the job to return.
> I suppose this means that returning the job info requires server services
> that are blocked by enabling acl_hosts?
That does sound odd. I've never used server acl_hosts, so I'm not
familiar with the behaviour. But this sounds like something we can
change.
I have a bunch of stuff on my plate right now and will likely forget
this. Can you file a bug report? You can assign the bug to me.
> Eventually, after several minutes, the job gets reported as exceeding the
> wallclock time. I get a weird "MOAB_INFO: job exceeded wallclock limit"
> error and the job gets deleted. I think this is just the scheduler
> stepping in at some statjob polling interval and killing the job?
And this happens in advance of the actual walltime limit of the job?
> On a lark, I checked, and specifying "ALLOWCOMPUTEHOSTSUBMIT true" in a
> torque.cfg file didn't appear to have any effect on this, though it seems
> like it should. At this point it appears that setting that parameter
> allows a compute node to do any operation except return a job result?
ALLOWCOMPUTEHOSTSUBMIT is for accepting new job submissions from nodes
(i.e., running qsub on the nodes).
> If the above is the expected behavior, what kind of wildcard matching is
> allowed in the acl_hosts list?
You can use * as a glob, e.g. *.gsk.com, node*.gsk.com, etc.
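To sanity-check a pattern list before loading it with qmgr, here is a rough Python approximation using the standard fnmatch module. It assumes Torque's `*` behaves like a shell-style glob and that hostnames compare case-insensitively; Torque's own matcher may differ (for example in where `*` may appear, and fnmatch additionally treats `?` and `[...]` specially), so treat host_allowed as a hypothetical helper, not Torque's actual logic.

```python
from fnmatch import fnmatchcase


def host_allowed(host, acl_hosts):
    """Return True if host matches any pattern in acl_hosts.

    Approximation only: '*' is treated as a shell-style glob via fnmatchcase,
    and both sides are lowercased so the comparison is case-insensitive.
    """
    host = host.lower()
    return any(fnmatchcase(host, pattern.lower()) for pattern in acl_hosts)


acl = ["*.gsk.com", "node*.gsk.com"]
print(host_allowed("node1.gsk.com", acl))      # True
print(host_allowed("login.example.org", acl))  # False
print(host_allowed("node1", acl))              # False: short name has no domain
```

The last line shows that under this approximation a short name like node1 does not match *.gsk.com, so it may be worth checking which form of a node's hostname the server actually sees before relying on a domain wildcard.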
--
Garrick Staples, Linux/HPCC Administrator
University of Southern California