[torqueusers] epilogue.parallel?

Clifton Kirby ckirby3 at colsa.com
Thu Oct 6 14:41:21 MDT 2005


I am also trying to get epilogue.parallel working and have had the same
experiences as you.  My version of Torque is 1.2.0p5.  I would like to avoid
using ssh in my epilogue script to run processes on all the nodes.  Here is
my epilogue,

------------------------------------------------------------------
#!/bin/sh
# Cleanup user jobs from compute nodes
# 1 -- jobid
# 2 -- userid
# 3 -- grpid
echo "---------------------------"

# Set key variables
USER=$2
echo "Killing processes of user $USER on the batch nodes"
# $PBS_NODEFILE lists a node once per allocated processor,
# so visit each node only once
for node in `sort -u "$PBS_NODEFILE"`
  do
        echo "Doing node $node"
        ssh -a -f -k -n -x "$node" killall -u "$USER" -KILL &
  done
wait
echo "Done."

-------------------------------------------------------------------

It would be a lot cleaner if I could use epilogue.parallel and simply run the
following script, followed by epilogue.  Assuming epilogue.parallel does
exist, which actually runs first: the epilogue or epilogue.parallel?

---------------------------
#!/bin/sh
USER=$2
killall -u $USER -KILL &
---------------------------
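
A slightly more defensive variant of the per-node script above might look like
the sketch below. It assumes the same argument order as the epilogue (2 is the
userid). The function names and the UID cutoff of 500 are hypothetical, chosen
only to illustrate refusing to kill root or system accounts. It does not
address whether the MOM runs epilogue.parallel at all.

```shell
#!/bin/sh
# Hypothetical per-node cleanup sketch; argument order follows the
# comments in the epilogue above (1 = jobid, 2 = userid, 3 = grpid).

# Return 0 only if it looks safe to kill every process owned by the user.
# The UID cutoff of 500 is an assumption; adjust it for the local site.
safe_to_kill() {
    target=$1
    [ -n "$target" ] || return 1
    uid=$(id -u "$target" 2>/dev/null) || return 1
    [ "$uid" -ge 500 ]
}

cleanup_user() {
    if safe_to_kill "$1"; then
        echo "Killing processes of user $1 on $(hostname)"
        # Run in the foreground so the kill finishes before the script returns.
        killall -KILL -u "$1" || true
    else
        echo "Refusing to kill processes of '$1'"
    fi
}

# Only act when a user name was passed in as the second argument.
if [ -n "${2:-}" ]; then
    cleanup_user "$2"
fi
```

Running killall in the foreground rather than with a trailing & is a design
choice here, so that the script does not return while the kill is still
pending.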

It's just that the darn script won't run.  The prologue.parallel runs great
and my permissions are as they should be.  I bumped the logging level on the
MOMs to 7 and clearly saw in the logs where it was looking for
epilogue.precancel, epilogue.user, epilogue, prologue, prologue.parallel,
etc... but never saw where it checked for epilogue.parallel.
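
To repeat that log check quickly, something like the following sketch could
tally how many log lines mention each script name. It assumes only that the
MOM writes the script name somewhere on the line; the exact log format and the
log file location are not taken from the Torque documentation, and
count_script_mentions is a hypothetical helper name.

```shell
#!/bin/sh
# Count how many lines of a MOM log mention each prologue/epilogue script.
# The list of names comes from this thread; the only assumption about the
# log format is that the script name appears somewhere on the line.
count_script_mentions() {
    logfile=$1
    # A name must not be preceded by a name character, and must not be
    # followed by ".suffix", so "epilogue" does not count "epilogue.parallel".
    pre='(^|[^[:alnum:]_.])'
    post='(\.?$|[^[:alnum:]_.]|\.[^[:alnum:]_])'
    for name in prologue prologue.parallel epilogue epilogue.parallel \
                epilogue.user epilogue.precancel; do
        # Escape the dots so they match literally in the regular expression.
        pat=$(printf '%s' "$name" | sed 's/\./\\./g')
        # grep -c exits non-zero on zero matches but still prints 0.
        n=$(grep -c -E "${pre}${pat}${post}" "$logfile" || true)
        echo "$name $n"
    done
}
```

Calling count_script_mentions with the path of a MOM log file prints one
"name count" pair per script; a count of 0 for epilogue.parallel would match
what I saw in my logs.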

Thanks!

- Cliff


-----Original Message-----
From: torqueusers-bounces at supercluster.org
[mailto:torqueusers-bounces at supercluster.org]On Behalf Of Troy P Chuang
Sent: Thursday, October 06, 2005 1:16 PM
To: Maestas, Christopher Daniel
Cc: torqueusers at supercluster.org
Subject: Re: [torqueusers] epilogue.parallel?


Hi,

As I mentioned, I have created the following script on the sister
pbs_mom nodes

/usr/spool/PBS/mom_priv/epilogue.parallel
(file mode 700, file owner: root since root is the torque/maui admin
account)

However, it never gets executed after the job is finished whereas the
prologue.parallel script

/usr/spool/PBS/mom_priv/prologue.parallel
(file mode 700, file owner: root)

DOES get executed correctly. In addition, my prologue/epilogue are working
fine on the pbs_mom mother superior nodes as well.

To sum up, the prologue, epilogue and prologue.parallel of my torque
installation are working well. The only problem I have is that
epilogue.parallel is not being executed.

From your reply, it seems you have gotten the epilogue.parallel script
working.
Where do you put your file, and what file mode do you use?
I appreciate any help in debugging my setup.
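
For comparing setups, a small check like the sketch below could confirm the
owner and mode of each script. It assumes the usual requirement that the
script is owned by root, readable and executable by its owner, and not
writable by group or others; check_script is a hypothetical helper, and
stat -c is the GNU coreutils form, which may differ on other systems.

```shell
#!/bin/sh
# check_script FILE [EXPECTED_UID]
# Prints "ok" or a short reason, and returns non-zero on any problem.
# EXPECTED_UID defaults to 0 (root).
check_script() {
    file=$1
    want_uid=${2:-0}
    [ -f "$file" ] || { echo "missing"; return 1; }
    owner=$(stat -c '%u' "$file")   # numeric owner (GNU stat syntax)
    mode=$(stat -c '%a' "$file")    # octal permission bits, e.g. 700
    [ "$owner" = "$want_uid" ] || { echo "wrong-owner"; return 1; }
    # Neither group nor others may have the write bit (octal 022).
    if [ $(( 0$mode & 022 )) -ne 0 ]; then
        echo "writable-by-others"; return 1
    fi
    # The owner needs both read and execute (octal 0500).
    if [ $(( 0$mode & 0500 )) -ne $(( 0500 )) ]; then
        echo "not-read-exec-by-owner"; return 1
    fi
    echo "ok"
}
```

For example, check_script /usr/spool/PBS/mom_priv/epilogue.parallel should
print "ok" for a root-owned file with mode 700.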

Thanks
-Troy


Maestas, Christopher Daniel wrote:

>The prologue and epilogue .parallel scripts only get executed on the
>pbs_mom sisters.
>You essentially have to do the same thing on the pbs_mom mother superior
>node of your job as well.
>I found that out by checking /etc/security/access.conf on a set of
>nodes in a running job.
>These scripts help out tremendously in running on > 1024 node
>environments! Thanks for this method! :-)
>
>-----Original Message-----
>From: torqueusers-bounces at supercluster.org
>[mailto:torqueusers-bounces at supercluster.org] On Behalf Of Troy P Chuang
>Sent: Monday, October 03, 2005 9:15 AM
>To: torqueusers at supercluster.org
>Subject: [torqueusers] epilogue.parallel?
>
>Hi all,
>Does epilogue.parallel really exist?
>It is mentioned in both
>http://www.clusterresources.com/products/torque/docs20/a.gprologueepilogue.shtml
>http://www.clusterresources.com/products/torque/docs/4.3prologueepilogue.shtml
>
>However, the script /usr/spool/PBS/mom_priv/epilogue.parallel that I
>created never gets executed.
>
>Or is there any epilogue-script mechanism that will be performed on all
>sister nodes AFTER a parallel job has finished (not necessarily an MPI job)?
>Specifically, I would like to reverse the changes I made by
>prologue.parallel (e.g. reverse the changes made to /etc/authuser or
>/etc/security/access.conf).
>
>Thanks
>-Troy
>
>_______________________________________________
>torqueusers mailing list
>torqueusers at supercluster.org
>http://www.supercluster.org/mailman/listinfo/torqueusers