[torqueusers] torque-1.2.0p1-snap.1107893767 will not compile on
AIX 5.2 with XLC (and how I worked around it)
garrick at usc.edu
Wed Feb 16 10:09:19 MST 2005
On Wed, Feb 16, 2005 at 12:26:23PM +0100, Bas van der Vlies alleged:
> First of all great to see these patches. I have tested some values
> > There's a new pbs_mom config parameter called $jobstartblocktime that
> > defines how long pbs_server will initially block while waiting for a
> > job to start. It defaults to 5 seconds, but we'd like people to test
> > lower values like 1 or 0. The lower the value, the better pbs_server
> > should respond to client requests (like qstat) while starting up jobs.
> > If 0 doesn't cause any problems, it will be the default in future
> > releases. Please test!
> I have set various values for $jobstartblocktime (0 --> 20), but I did
> not see any slowdown in qstat. The load on this test system is not huge.
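One way to put numbers on the qstat responsiveness described above is to time repeated qstat calls while jobs are starting, once for each $jobstartblocktime value under test. The following is a minimal Python sketch and not part of TORQUE: the helper names time_command and summarize are hypothetical, and it assumes qstat is on the PATH of the host where it runs.

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd, runs=5, pause=0.0):
    """Run cmd `runs` times and return each run's wall-clock duration in seconds.

    cmd is an argument list such as ["qstat"]; its output is discarded
    because only the response time is of interest here.
    """
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,  # a non-zero exit status is still a timed response
        )
        durations.append(time.perf_counter() - start)
        time.sleep(pause)  # optional gap between samples
    return durations


def summarize(durations):
    """Reduce a list of durations to min, median, and max."""
    return {
        "min": min(durations),
        "median": statistics.median(durations),
        "max": max(durations),
    }
```

For example, summarize(time_command(["qstat"], runs=20, pause=0.5)) run while a batch of jobs is being started gives a rough picture of how long clients wait. Comparing the max values across settings is likely more telling than the median, since a blocked server would show up as occasional long calls.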
If your pro/epilogues aren't more than 3 or 4 seconds, you'll never notice a
> > Another shaky area is with restarting pbs_mom daemons. It should now
> > be possible to restart any daemon at any time without breaking jobs.
> > pbsdsh has been enhanced to live in this world of restarting moms. I
> > can already tell you that mpiexec won't deal with it properly. I'm
> > worried about these changes affecting the recoverability of failing
> > jobs. Please test!
> Must I specify an option to pbs_mom to enable restarting the jobs, like
> '-p', or should it work out of the box? I have tried it without options, and
> the jobs get restarted and an interactive job is killed.
> With the '-p' option:
> - The interactive job is killed
> - A job is not restarted
It should now work as advertised in the pbs_mom manpage. -p would be used to
recover jobs after restarting mom.
If you run 'pbs_mom -p' under PBSDEBUG you'll see messages about recovering and
saving stderr, stdout, nodeid, and taskid numbers.
Garrick Staples, Linux/HPCC Administrator
University of Southern California