Moab-BProc Integration Guide

Overview

Moab can be used as an external scheduler for BProc. In this configuration, Moab keeps track of where jobs are running and therefore which nodes are available. This interaction is enabled by a number of scripts that take advantage of Moab's Native Resource Manager. Note that in this configuration Moab must be run with root privileges in order to start each job as the user who submitted it. Because of this, the scripts are run as root and must be appropriately safeguarded against tampering.

Moab Configuration

Moab drives BProc via Perl scripts that interact with the cluster, mostly using the bpsh command. Moab must be given the location of these scripts in the moab.cfg file. An example entry in moab.cfg is:
RMCFG[bproc]            TYPE=NATIVE FLAGS=FULLCP
RMCFG[bproc]            CLUSTERQUERYURL=exec:///$TOOLSDIR/node.query.bproc.pl
RMCFG[bproc]            WORKLOADQUERYURL=exec:///$TOOLSDIR/job.query.bproc.pl
RMCFG[bproc]            JOBSUBMITURL=exec:///$TOOLSDIR/job.submit.bproc.pl
RMCFG[bproc]            JOBCANCELURL=exec:///$TOOLSDIR/job.cancel.bproc.pl
RMCFG[bproc]            JOBSTARTURL=exec:///$TOOLSDIR/job.start.bproc.pl
RMCFG[bproc]            JOBSUSPENDURL=exec:///$TOOLSDIR/job.suspend.bproc.pl
RMCFG[bproc]            JOBRESUMEURL=exec:///$TOOLSDIR/job.resume.bproc.pl
These lines configure Moab to use the Native resource manager. FLAGS=FULLCP indicates that Moab will fully checkpoint each job if it is shut down. This is necessary because Moab itself is keeping track of each job. In each line, "exec:" indicates that the source is a program as opposed to a file. $TOOLSDIR refers to the directory called "tools" in the Moab home directory (not the user's home directory), so the scripts should be located there.
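To illustrate how the repeated RMCFG[bproc] lines combine into one set of attributes, here is a minimal Python sketch. It is not part of Moab; the function names parse_rmcfg and split_url are hypothetical, and the parsing rules are assumptions drawn only from the example entry above, not from Moab's own configuration parser.

```python
import re

def parse_rmcfg(text):
    """Merge RMCFG[<name>] lines into {name: {ATTR: value}} (illustrative only)."""
    managers = {}
    for line in text.splitlines():
        match = re.match(r"\s*RMCFG\[(\w+)\]\s+(.*)", line)
        if not match:
            continue  # skip anything that is not an RMCFG entry
        name, rest = match.groups()
        attrs = managers.setdefault(name, {})
        for pair in rest.split():
            key, _, value = pair.partition("=")
            attrs[key] = value
    return managers

def split_url(url):
    """Split 'exec:///path' into its scheme and path: ('exec', '/path')."""
    scheme, _, path = url.partition("://")
    return scheme, path

example = """\
RMCFG[bproc]            TYPE=NATIVE FLAGS=FULLCP
RMCFG[bproc]            CLUSTERQUERYURL=exec:///$TOOLSDIR/node.query.bproc.pl
RMCFG[bproc]            JOBSTARTURL=exec:///$TOOLSDIR/job.start.bproc.pl
"""

config = parse_rmcfg(example)
for key, value in sorted(config["bproc"].items()):
    print(key, "=", value)
```

Each URL attribute ends up next to TYPE and FLAGS in the same table, which mirrors the way the separate lines all describe the one resource manager named bproc.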

Scripts

node.query.bproc.pl
This script monitors node status. Moab itself keeps track of which nodes are busy; the function of this script is to determine which nodes are up or down from the output of "bpstat -l". The script also attempts to determine node specifications via the commands free and uname and the output of "cat /proc/cpuinfo".
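The specification gathering described above can be sketched in Python as follows. This is an illustration, not the shipped Perl script. The helper names are hypothetical, the memory parser assumes the output of "free -m", the up/down state is passed in rather than parsed because the exact "bpstat -l" layout is not shown here, and the STATE, CPROC, and CMEMORY attribute names in the report line are an assumption about Moab's native node format that should be checked against the Moab documentation.

```python
def count_processors(cpuinfo_text):
    """Count the 'processor' entries in /proc/cpuinfo output."""
    return sum(
        1
        for line in cpuinfo_text.splitlines()
        if line.split(":")[0].strip() == "processor"
    )

def total_memory_mb(free_text):
    """Read total memory from the 'Mem:' row of `free -m` output (assumed format)."""
    for line in free_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "Mem:":
            return int(fields[1])
    return None

def node_report(node, is_up, cpuinfo_text, free_text):
    """Build one report line; the attribute names are assumed, not confirmed."""
    state = "Idle" if is_up else "Down"
    return "{} STATE={} CPROC={} CMEMORY={}".format(
        node, state, count_processors(cpuinfo_text), total_memory_mb(free_text)
    )

sample_cpuinfo = "processor\t: 0\nmodel name\t: Example CPU\nprocessor\t: 1\nmodel name\t: Example CPU\n"
sample_free = (
    "              total        used        free\n"
    "Mem:           2048         512        1536\n"
    "Swap:          1024           0        1024\n"
)

print(node_report("n0", True, sample_cpuinfo, sample_free))
```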

job.submit.bproc.pl
When Moab is ready to run a job, it first submits the job to this script, which writes the job specifications to the file /tmp/.idlejobs.
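The layout of /tmp/.idlejobs is not described in this guide. Purely as an illustration of the record-then-look-up pattern, the Python sketch below stores one job per line as tab-separated fields. The field order, the function names, and the use of a temporary file in place of /tmp/.idlejobs are all assumptions.

```python
import os
import tempfile

def record_idle_job(path, job_id, executable, arguments, workdir):
    """Append one job as a tab-separated line (hypothetical layout)."""
    fields = [job_id, executable, " ".join(arguments), workdir]
    with open(path, "a") as handle:
        handle.write("\t".join(fields) + "\n")

def lookup_idle_job(path, job_id):
    """Return (executable, arguments, workdir) for job_id, or None if absent."""
    with open(path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) == 4 and fields[0] == job_id:
                return fields[1], fields[2], fields[3]
    return None

# Demonstration on a temporary file rather than the real /tmp/.idlejobs.
fd, demo_path = tempfile.mkstemp()
os.close(fd)
record_idle_job(demo_path, "42", "/bin/hostname", ["-f"], "/home/alice")
print(lookup_idle_job(demo_path, "42"))
os.remove(demo_path)
```

A file in /tmp that a root-run script later reads back is exactly the kind of resource that needs the safeguarding mentioned in the overview, so a real deployment would want restrictive ownership and permissions on it.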

job.start.bproc.pl
After the job specifications have been submitted to job.submit.bproc.pl and written to /tmp/.idlejobs, the job is sent to this script, which extracts that information, starts the job, and writes important tracking information, such as the process ID, to /tmp/.runningjobs. The steps of this interaction are as follows:
  1. job.start.bproc.pl is given the job id, host nodes, and user id from Moab.
  2. Using the job id, the script extracts the executable, arguments, and path from /tmp/.idlejobs.
  3. The job is started via the command:
    sudo -u $job_user sh -c "cd ~$job_user ; bpsh $newhost $job_exec$job_args 1> $job_id.out 2> $job_id.err"
    
    which starts the job as the specified user on the specified hostlist and redirects stdout and stderr to the files $job_id.out and $job_id.err in the user's home directory.
  4. The child process of the above command is located via a call to "ps --ppid $ppid", where $ppid is the process ID returned by the command given above. The process ID obtained in this way is written to /tmp/.runningjobs and is used to monitor the status of the job.
    Note: Because the script may search for the child ID before the child has actually been created, the script may write the ppid to /tmp/.runningjobs instead, leaving the search for the child to a later script.
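Steps 3 and 4 above can be sketched in Python as follows. This is an illustration under stated assumptions, not the Perl script itself: the function names are hypothetical, the arguments are shell-quoted with shlex (the original script may not do this), and the "ps --ppid" parser assumes the default ps layout with a header row and the PID in the first column.

```python
import shlex

def build_start_command(job_user, hosts, job_exec, job_args, job_id):
    """Build the sudo/bpsh argument list for launching a job as job_user."""
    program = " ".join(shlex.quote(part) for part in [job_exec] + list(job_args))
    inner = "cd ~{user} ; bpsh {hosts} {program} 1> {job}.out 2> {job}.err".format(
        user=job_user, hosts=hosts, program=program, job=job_id
    )
    return ["sudo", "-u", job_user, "sh", "-c", inner]

def child_pids(ps_output):
    """Extract PIDs from `ps --ppid <pid>` output, skipping the header row."""
    pids = []
    for line in ps_output.splitlines()[1:]:
        fields = line.split()
        if fields and fields[0].isdigit():
            pids.append(int(fields[0]))
    return pids

def pid_to_record(parent_pid, ps_output):
    """Prefer the child PID; fall back to the parent if no child exists yet."""
    children = child_pids(ps_output)
    return children[0] if children else parent_pid

command = build_start_command("alice", "0,1", "/bin/hostname", ["-f"], "42")
print(command[-1])

sample_ps = "  PID TTY          TIME CMD\n 1235 ?        00:00:00 bpsh\n"
print(pid_to_record(1234, sample_ps))
```

The fallback in pid_to_record corresponds to the note above: when ps is queried before the child exists, the parent PID is recorded so the job is not lost.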


job.query.bproc.pl
This script is used to monitor the status of each job recorded in /tmp/.runningjobs. Additionally, the script will report any unrecorded processes to Moab using the program's process ID as the job id. The script operates as follows:
  1. All nodes are queried with the command:
    bpsh -A -p ps -eo state,pid,ppid,user,comm,etime
    
  2. PIDs are read from /tmp/.runningjobs and searched for in the output of the above command. If a process is not found, it is assumed to have completed and is removed from /tmp/.runningjobs.
    Note: The above command may return user IDs instead of user names. In such instances the script will attempt to determine the user name by looking up the UID in /etc/passwd.
  3. Any processes which were not correlated to jobs are then reported to Moab using their ppid as job id.
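The reconciliation performed in the steps above can be sketched in Python as follows. It is an illustration only. The function names are hypothetical, the parser assumes that any per-node prefix added by bpsh has already been stripped from each line, and the passwd lookup works on text passed in rather than reading /etc/passwd directly.

```python
def parse_ps(ps_output):
    """Parse `ps -eo state,pid,ppid,user,comm,etime` lines into dictionaries."""
    processes = []
    for line in ps_output.splitlines():
        fields = line.split()
        # Skip the header row and any line that does not carry numeric IDs.
        if len(fields) < 6 or not (fields[1].isdigit() and fields[2].isdigit()):
            continue
        processes.append({
            "state": fields[0],
            "pid": int(fields[1]),
            "ppid": int(fields[2]),
            "user": fields[3],
            "comm": " ".join(fields[4:-1]),
            "etime": fields[-1],
        })
    return processes

def reconcile(recorded_pids, processes):
    """Split recorded PIDs into running and completed; list unrecorded processes."""
    live = {proc["pid"] for proc in processes}
    recorded = set(recorded_pids)
    running = [pid for pid in recorded_pids if pid in live]
    completed = [pid for pid in recorded_pids if pid not in live]
    unrecorded = [proc for proc in processes if proc["pid"] not in recorded]
    return running, completed, unrecorded

def resolve_user(user, passwd_text):
    """Map a numeric UID to a user name using passwd-format text."""
    if not user.isdigit():
        return user
    for line in passwd_text.splitlines():
        fields = line.split(":")
        if len(fields) > 2 and fields[2] == user:
            return fields[0]
    return user  # leave the UID as-is when no entry matches

sample = (
    "S   PID  PPID USER     COMMAND         ELAPSED\n"
    "R   101     1 alice    sim               01:02\n"
    "S   202     1 1001     tool              00:10\n"
)
running, completed, unrecorded = reconcile([101, 303], parse_ps(sample))
print(running, completed, [proc["pid"] for proc in unrecorded])
```

In this sample, PID 101 is still running, PID 303 is treated as completed because it no longer appears, and PID 202 is an unrecorded process that would be reported to Moab.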


job.cancel.bproc.pl
Sends a kill signal (SIGKILL, kill -9) to the indicated job.

job.suspend.bproc.pl
Sends a stop signal (SIGSTOP, kill -19 on most Linux platforms) to the indicated job.

job.resume.bproc.pl
Sends a continue signal (SIGCONT, kill -18 on most Linux platforms) to the indicated job.
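The three signal scripts differ only in the signal they send. A minimal Python sketch of that mapping follows. The names ACTIONS and signal_job are hypothetical, and the sketch uses symbolic signal names because the numeric values of SIGSTOP and SIGCONT vary between platforms.

```python
import os
import signal

# One entry per script: cancel, suspend, resume.
ACTIONS = {
    "cancel": signal.SIGKILL,
    "suspend": signal.SIGSTOP,
    "resume": signal.SIGCONT,
}

def signal_job(pid, action):
    """Send the signal associated with the given action to the process."""
    os.kill(pid, ACTIONS[action])

for action, sig in ACTIONS.items():
    print(action, sig.name, int(sig))
```

Note that SIGKILL and SIGSTOP cannot be caught or ignored, so a cancelled job gets no chance to clean up after itself.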