[Mauiusers] A question regarding user proxies with Torque 2.4.2 and Maui 3.2.6p21

Douglas Wade Needham dneedham at cmu.edu
Tue Dec 8 14:10:39 MST 2009

Greetings all,

I have recently been asking questions on torqueusers, having been
tasked with installing Torque and Maui on a cluster we have, with no
prior experience of either.  I have traced an issue I am having with
user proxies to Maui, and I wanted to ask a couple of questions (at the
very end) of folks far more familiar with Maui than I am.  For the
details...

The professors in charge of our cluster want our end-users to submit
jobs with qsub's '-u' option so that the jobs run as a general cloud
user ID, without having to su/sudo.  In my reading, I found that I have
to set the server attribute
        allow_proxy_user = True

for Torque.  However, when I tried to submit a job, it got stuck in the
queue in BatchHold.  Creating a .rhosts file as suggested in various
places (even though we use ssh, not rsh/rcp) did not help, nor did
creating a .ssh/authorized_keys file.  Jobs still get stuck like this:

        root@cloudhead# checkjob -v 41
        checking job 41 (RM job '41.cloudhead.cloudA')
        State: Idle
        Creds:  user:dneedham  group:clouduser  class:batch  qos:DEFAULT
        WallTime: 00:00:00 of 1:00:00
        SubmitTime: Tue Dec  8 10:29:14
          (Time Queued  Total: 5:26:36  Eligible: 00:00:00)
        Total Tasks: 16
        Req[0]  TaskCount: 16  Partition: ALL
        Network: [NONE]  Memory >= 0  Disk >= 0  Swap >= 0
        Opsys: [NONE]  Arch: [NONE]  Features: [NONE]
        Exec:  ''  ExecSize: 0  ImageSize: 0
        Dedicated Resources Per Task: PROCS: 1
        NodeAccess: SHARED
        TasksPerNode: 8  NodeCount: 2
        IWD: [NONE]  Executable:  [NONE]
        Bypass: 0  StartCount: 0
        PartitionMask: [ALL]
        SystemQueueTime: Tue Dec  8 15:27:07
        Flags:       RESTARTABLE
        Holds:    Batch  (hold reason:  (null))
        Messages:  job not authorized to use proxy credentials
        PE:  16.00  StartPriority:  28
        cannot select job 41 for partition DEFAULT (job hold active)
        root@cloudhead# diagnose -j 41
        Name                  State Par Proc QOS     WCLimit R  Min     User    Group  Account  QueuedTime  Network  Opsys   Arch    Mem   Disk  Procs       Class Features
        41                     Idle ALL   16 DEF     1:00:00 0   16 dneedham clouduse        -    00:29:14   [NONE] [NONE] [NONE]    >=0    >=0    NC0   [batch:1] [NONE]

These jobs run as I would expect when I force them with runjob, and
work fine when I submit them as the target user ID.
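
For anyone checking several held jobs at once, the two lines that matter
in the checkjob output above are "Holds:" and "Messages:".  A minimal
sketch for pulling them out follows.  The function name summarize_hold
is made up for illustration, and the sketch assumes the checkjob -v
layout is exactly as shown in this message.

```shell
#!/bin/sh
# Hypothetical helper (not part of Maui): reads `checkjob -v` output on
# stdin and prints the hold type plus the accompanying message, if any.
summarize_hold() {
    awk '
        # "Holds:    Batch  (hold reason:  (null))" -> second field is the hold type
        /^[ \t]*Holds:/    { hold = $2 }
        # "Messages:  <text>" -> keep everything after the label
        /^[ \t]*Messages:/ { sub(/^[ \t]*Messages:[ \t]*/, ""); msg = $0 }
        END { if (hold != "") printf "hold=%s message=%s\n", hold, msg }
    '
}

# Intended use on the head node (not executed here):
#   checkjob -v 41 | summarize_hold
```

It prints nothing for a job that has no hold, so it can be run in a loop
over job IDs.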

Doing some digging, I found pointers from the Maui docs to the Moab
docs, which mention setting something like:

        USERCFG[DEFAULT]        PROXYLIST=validate
        RMCFG[cloudhead] TYPE=PBS JOBVALIDATEURL=exec:/var/spool/maui/tools/job.validate.proxy.sh

However, even after doing so and restarting maui, I find that showconfig
does not show those changes, and the jobs still get stuck.

Here is the job submission script:

        #PBS -N collect_info
        #PBS -l nodes=2:ppn=8
        #PBS -d /mnt/scratch/dneedham/work
        #PBS -o /mnt/scratch/dneedham/work/collect.${PBS_JOBID}.log
        #PBS -e /mnt/scratch/dneedham/work/collect.${PBS_JOBID}.errs
        #PBS -u clouduser
        cd /mnt/scratch/dneedham/work
        NP=$(wc -l < "${PBS_NODEFILE}")
        /usr/bin/mpirun --verbose -np ${NP} -machinefile ${PBS_NODEFILE} /mnt/scratch/dneedham/test_collect/collect_info

As I said, when I run it as "clouduser", it works fine, but when I try
to submit this from dneedham to run as clouduser, it blocks.
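
To make the failing sequence easy to reproduce, here is a rough sketch
of the steps.  It assumes allow_proxy_user is set through qmgr as a
server attribute and that the job script is saved under the hypothetical
name collect_info.pbs.  The run wrapper and the DRY_RUN variable are
also just illustrative; by default the commands are only printed, not
executed.

```shell
#!/bin/sh
# Sketch only: prints the commands unless DRY_RUN=0 is set in the environment.
DRY_RUN=${DRY_RUN:-1}

# Hypothetical wrapper: echo the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

reproduce_proxy_hold() {
    script=$1   # path to the job script (file name is an assumption)

    # As a Torque manager: allow users to request another user ID.
    run qmgr -c "set server allow_proxy_user = True"

    # As the submitting user (dneedham here): ask for the job to run as clouduser.
    # The script above already carries "#PBS -u clouduser"; the flag is repeated
    # on the command line only to make the request explicit.
    run qsub -u clouduser "$script"
}

# Intended use (not executed here):
#   reproduce_proxy_hold collect_info.pbs
#   checkjob -v <jobid>     # look for the Batch hold and the proxy message
```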

With this said, is this even doable with Torque+Maui, or does it
require Moab instead?  And if it can be done with Torque+Maui, can
someone point me towards what I am doing wrong?


- Doug

