[torqueusers] torque-1.2.0p1-snap.1107893767 will not compile on AIX 5.2 with XLC (and how I worked around it)

Garrick Staples garrick at usc.edu
Wed Feb 16 10:09:19 MST 2005


On Wed, Feb 16, 2005 at 12:26:23PM +0100, Bas van der Vlies alleged:
> First of all, great to see these patches. I have tested some values.
> 
> > There's a new pbs_mom config parameter called $jobstartblocktime that
> > defines how long pbs_server will initially block while waiting for a
> > job to start.  It defaults to 5 seconds, but we'd like people to test
> > lower values like 1 or 0. The lower the value, the better pbs_server
> > should respond to client requests (like qstat) while starting up jobs.
> > If 0 doesn't cause any problems, it will be the default in future
> > releases.  Please test!
> 
> I have set various values for $jobstartblocktime (0 --> 20), but I did
> not see any slowdown in qstat. The load on this test system is not huge.

If your prologue and epilogue scripts don't take more than 3 or 4 seconds,
you'll never notice a difference.
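
One rough way to check this is to time qstat repeatedly while a batch of jobs
is starting, once for each $jobstartblocktime value. The sketch below is only
an illustration and is not part of TORQUE: time_command and summarize are
hypothetical helper names, and it assumes the command you pass in (for
example qstat) is on your PATH.

```python
import statistics
import subprocess
import time


def time_command(cmd, runs=5):
    """Run cmd `runs` times; return the wall-clock duration of each run in seconds.

    cmd is an argument list such as ["qstat"] (assumed to be on PATH;
    subprocess raises FileNotFoundError if it is not).
    """
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # Discard the output; only the response time matters here.
        subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Return (median, worst) of the measured durations."""
    return statistics.median(durations), max(durations)
```

Calling summarize(time_command(["qstat"], runs=20)) while jobs are being
started, and comparing the worst-case time across settings, should show
whether the block time matters on a given system.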

 
> > Another shaky area is with restarting pbs_mom daemons.  It should now
> > be possible to restart any daemon at any time without breaking jobs.
> > pbsdsh has been enhanced to live in this world of restarting moms.  I 
> > can already tell you that mpiexec won't deal with it properly.  I'm
> > worried about these changes affecting the recoverability of failing
> > jobs.  Please test!
> 
> Must I specify an option to pbs_mom, like '-p', to enable restarting the
> jobs, or must it work out of the box? I have tried it without options and
> the jobs get restarted and an interactive job is killed.
> 
> With the '-p' option:
>  - Interactive job will be killed
>  - a job is not restarted

It should now work as advertised in the pbs_mom manpage. -p would be used to
recover jobs after restarting mom.

If you run 'pbs_mom -p' under PBSDEBUG you'll see messages about recovering and
saving stderr, stdout, nodeid, and taskid numbers.

-- 
Garrick Staples, Linux/HPCC Administrator
University of Southern California