[torqueusers] Force a job to rerun after mom has crashed

Steve Crusan scrusan at ur.rochester.edu
Wed Aug 24 16:32:21 MDT 2011



On Aug 24, 2011, at 6:03 PM, Ken Nielson wrote:

> 
> 
> ----- Original Message -----
>> From: "David Sheen" <sheen at usc.edu>
>> To: "Ken Nielson" <knielson at adaptivecomputing.com>
>> Cc: "Mahmood Naderan" <nt_mahmood at yahoo.com>, "Torque Users Mailing List" <torqueusers at supercluster.org>
>> Sent: Wednesday, August 24, 2011 2:53:25 PM
>> Subject: Re: [torqueusers] Force a job to rerun after mom has crashed
>> Ken,
>> 
>> The node has been taken offline by the administrator for testing.
>> 
>> David
>> 
>> 
> 
> Not good practice with MOMs running jobs. However, you can still run pbs_mom -q when it restarts. I am not sure whether the job will still be at the server, though. If it is not, the job is lost.
> 
> Ken


What we usually do is set a reservation on the node that covers all of its resources, starts immediately and lasts indefinitely. Once the jobs already running on the node finish, no new ones can start, and THEN you can take the node's pbs_mom offline.

~Steve


 ----------------------
 Steve Crusan
 System Administrator
 Center for Research Computing
 University of Rochester
 https://www.crc.rochester.edu/



