[torqueusers] Torque module for pdsh?

Ole Holm Nielsen Ole.H.Nielsen at fysik.dtu.dk
Thu May 14 13:41:59 MDT 2009


It was recommended that I use the Parallel Distributed Shell (pdsh),
http://sourceforge.net/projects/pdsh/, for running parallel commands
on our cluster.  The pdsh command has a very useful option
that executes a command on the nodes belonging to
a given jobid, but only if you use the Slurm resource manager.
From the pdsh(1) man page:

slurm module options
        The slurm module allows pdsh to target nodes based on currently
        running SLURM jobs. The slurm module is typically called after all
        other node selection options have been processed, and if no nodes
        have been selected, the module will attempt to read a running jobid
        from the SLURM_JOBID environment variable (which is set when running
        under a SLURM allocation). If SLURM_JOBID references an invalid job,
        it will be silently ignored.

        -j jobid[,jobid,...]
               Target list of nodes allocated to the SLURM job jobid. This
               option may be used multiple times to target multiple SLURM
               jobs. The special argument "all" can be used to target all
               nodes running SLURM jobs, e.g. -j all.

Question: Has anyone already written a Torque module for pdsh?  IMHO this
would be a very useful thing to have.
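
Until such a module exists, a rough workaround might be to look up a job's
nodes with qstat and hand them to pdsh via its -w option.  The Python sketch
below is only an illustration and not part of pdsh or Torque.  It assumes
that "qstat -f <jobid>" prints an exec_host attribute of the form
node01/0+node01/1+node02/0, and that long values are folded onto
continuation lines that contain no " = " separator.  The function names
(exec_host_from_qstat, hosts_from_exec_host, pdsh_argv, run_on_job) are
hypothetical.

```python
import subprocess


def exec_host_from_qstat(qstat_output):
    """Extract the exec_host value from `qstat -f` text, joining folded lines."""
    value = None
    for line in qstat_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("exec_host ="):
            value = stripped.split("=", 1)[1].strip()
        elif value is not None:
            # Assumption: a new attribute line contains " = "; a folded
            # continuation of exec_host does not.
            if " = " in line or not stripped:
                break
            value += stripped
    return value or ""


def hosts_from_exec_host(exec_host):
    """Return the unique hostnames in an exec_host value, in first-seen order."""
    hosts = []
    for entry in exec_host.split("+"):
        # Assumption: each entry looks like "hostname/slot".
        host = entry.split("/", 1)[0].strip()
        if host and host not in hosts:
            hosts.append(host)
    return hosts


def pdsh_argv(hosts, command):
    """Build the argument list for pdsh, targeting hosts with -w."""
    return ["pdsh", "-w", ",".join(hosts)] + list(command)


def run_on_job(jobid, command):
    """Run command (a list of strings) on all nodes of a Torque job."""
    out = subprocess.run(["qstat", "-f", jobid], capture_output=True,
                         text=True, check=True).stdout
    hosts = hosts_from_exec_host(exec_host_from_qstat(out))
    if not hosts:
        raise RuntimeError("no exec_host found for job " + jobid)
    return subprocess.run(pdsh_argv(hosts, command), check=True)
```

For example, run_on_job("1234", ["uptime"]) would run uptime on every node
assigned to job 1234, provided the job is running and the assumptions above
hold for the installed Torque version.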

Thanks,
Ole Holm Nielsen
Technical University of Denmark

