[torquedev] BLCR changes to be put in Torque 2.5.3

Al Taufer ataufer at adaptivecomputing.com
Tue Oct 5 13:40:03 MDT 2010


We would like to put the following changes into the Torque 2.5.3 release.

1) Add a --with-servchkptdir configure option, which allows specifying a different path for the server's checkpoint files. To do this we need to change the current behaviour of the pbs_mom. Currently, when the pbs_mom creates checkpoint images in the default location, it creates them in a subdirectory based on the job ID (e.g. 200216.molo.CK).  But when the job has a checkpoint_dir specified, the checkpoint images are created directly in the checkpoint_dir path without any job ID subdirectory.  The pbs_mom will now always create checkpoint images in a job ID subdirectory.

2) Change all checkpoint file transfers to occur as the user instead of as root.  This also changes the permissions on the $TORQUEHOME/checkpoint directory to be world-writable with the sticky bit set.
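
To make the two changes concrete, here is a rough Python sketch of the intended behaviour. It is not the pbs_mom code, which is written in C. The function names, the always_subdir flag and the example paths are made up for illustration. Only the ".CK" suffix and the "world writable with the sticky bit" mode (1777) come from the description above.

```python
import os
import posixpath
import stat
import tempfile

CK_SUFFIX = ".CK"  # suffix taken from the example "200216.molo.CK"

# rwx for user, group and other, plus the sticky bit: mode 1777, as on /tmp.
STICKY_WORLD_WRITABLE = stat.S_IRWXU | stat.S_IRWXG | stat.S_IRWXO | stat.S_ISVTX


def checkpoint_image_dir(job_id, default_dir, checkpoint_dir=None, always_subdir=True):
    """Return the directory that holds a job's checkpoint images.

    always_subdir=False models the current behaviour, where a job-specific
    checkpoint_dir is used directly. True models the proposed behaviour,
    where a job ID subdirectory is always appended.
    """
    if checkpoint_dir and not always_subdir:
        return checkpoint_dir
    base = checkpoint_dir or default_dir
    return posixpath.join(base, job_id + CK_SUFFIX)


def make_shared_checkpoint_dir(path):
    """Create path if needed, set it to mode 1777, and return the resulting mode."""
    os.makedirs(path, exist_ok=True)
    # chmod explicitly: a mode passed to makedirs would be filtered by the umask.
    os.chmod(path, STICKY_WORLD_WRITABLE)
    return stat.S_IMODE(os.stat(path).st_mode)


# Hypothetical paths, for illustration only.
print(checkpoint_image_dir("200216.molo", "/torque/checkpoint"))
print(checkpoint_image_dir("200216.molo", "/torque/checkpoint", "/home/alice/ckpt"))

# Try the permission change in a throwaway directory, not a real $TORQUEHOME.
with tempfile.TemporaryDirectory() as scratch:
    print(oct(make_shared_checkpoint_dir(os.path.join(scratch, "checkpoint"))))
```

With the sticky bit set, any user can create files in the directory, but only a file's owner (or root) can remove or rename it, which is what lets transfers run as the user without exposing other users' checkpoint images to deletion.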

There have been a few requests to have the pbs_mom invoke the restart_script as the user; it currently invokes it as root.  We don't think this change is needed, since the restart_script in the /contrib/blcr directory already runs the actual cr_restart command as the user, so there should not be any access issues on filesystems with root squash turned on.

Please let me know if you have any concerns.

Al Taufer
Adaptive Computing


