[torqueusers] Torque Mac OS leopard (10.5.2)
barman at lowell.edu
Fri Feb 22 08:40:02 MST 2008
An update to the problem described below. If I do not use
qdel -p to remove the stuck job, then pbs_server stays alive
and the job exits (ungracefully) after the 900 seconds (see
mom "time out" error below).
So, other than having the resources clogged up for 900 seconds
(which is a pain), the queuing system appears to more or less work.
Also, "host 168453161" (aka 10.10.100.41) in the error message below
is the node that runs the server and, in this case, is also the
I hope somebody out there has had some leopard experience!
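
For anyone puzzled by the "host 168453161" form in the mom log: it appears to be the IPv4 address packed into a single 32-bit integer, most significant byte first. A minimal sketch of the conversion in Python, assuming that byte order (the function name host_int_to_ipv4 is made up for illustration and is not part of torque):

```python
import ipaddress


def host_int_to_ipv4(host: int) -> str:
    """Convert a 32-bit integer host ID (most significant byte first)
    into dotted-quad notation."""
    # IPv4Address accepts an int in the range 0..2**32-1
    return str(ipaddress.IPv4Address(host))


if __name__ == "__main__":
    # The host number taken from the pbs_mom log message
    print(host_int_to_ipv4(168453161))  # prints 10.10.100.41
```

By hand: 10*2^24 + 10*2^16 + 100*2^8 + 41 = 168453161, which matches the server node's address.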
On Feb 21, 2008, at 7:42 PM, Travis Barman wrote:
> I have recently obtained 4 new (8-core) xserves running leopard.
> I had previously installed torque on my tiger-based (non-server OS)
> intel Macs without too much trouble. I'm trying the same installation
> procedure that I had used before with Tiger (10.4), along with the
> latest torque build (v2.2.1), and am encountering the following problems:
> Jobs are queued and run just fine. However, they do not exit the
> queue but simply remain with status 'E'. I am forced to remove the job
> manually by executing 'qdel -p'. This problem happens for ALL jobs,
> including the simple echo 'sleep 10' test.
> In addition (and far more severe), pbs_server simply dies a few
> minutes after a job is deleted.
> Mom log has this rather bad sounding error message in it:
> pbs_mom;Svr;pbs_mom;wait_request, connection 9 to host 168453161
> has timed out out after 900 seconds - closing stale connection
> Has anybody out there been able to get torque running on leopard server?
> thanks in advance,