[torqueusers] torque problem: submitting jobs from nodes

James A. Peltier jpeltier at sfu.ca
Tue Sep 6 18:53:33 MDT 2011


----- Original Message -----
| Hi All:
| 
| I've got a user who's trying to have his jobs checkpoint and re-queue
| themselves at the end of their runtime so that they can run with
| shorter walltime limits (and thus help balance cluster usage and fair
| share, etc.). Of course, for this to work, he needs to be able to
| submit jobs (qsub) from the compute nodes. I figured this should be
| no big deal, and checked my qmgr settings:
| 
| Qmgr: print server
| #
| # Create queues and set their attributes.
| #
| #
| # Create and define queue default
| #
| create queue default
| set queue default queue_type = Execution
| set queue default resources_max.walltime = 24:00:00
| set queue default resources_default.nodes = 1
| set queue default resources_default.walltime = 01:00:00
| set queue default enabled = True
| set queue default started = True
| #
| # Create and define queue long
| #
| create queue long
| set queue long queue_type = Execution
| set queue long enabled = True
| set queue long started = True
| #
| # Set server attributes.
| #
| set server scheduling = True
| set server acl_host_enable = False
| set server acl_user_enable = False
| set server managers = kusznir at aeolus.wsu.edu
| set server managers += maui at aeolus.wsu.edu
| set server managers += root at aeolus.wsu.edu
| set server default_queue = default
| set server log_events = 511
| set server mail_from = adm
| set server query_other_jobs = True
| set server resources_available.nodect = 288
| set server scheduler_iteration = 600
| set server node_check_rate = 150
| set server tcp_timeout = 6
| set server next_job_number = 304175
| 
| 
| Unfortunately, when one tries to submit a job from a compute node, one
| gets:
| [kusznir at compute-0-20 ~]$ qsub -I -l nodes=1
| qsub: Bad UID for job execution MSG=ruserok failed validating
| kusznir/kusznir from compute-0-20.local
| 
| What's going on here? As far as I can read, all the settings are set
| to allow this to work. What's wrong?
| 
| Thanks!
| --Jim

set server allow_node_submit = True
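
The "print server" output quoted above has no allow_node_submit line, and
the attribute defaults to False, so pbs_server rejects qsub requests that
come from compute nodes. A minimal sketch of applying and checking the
change non-interactively, assuming it is run as a TORQUE manager (for
example root) on the pbs_server host:

```shell
# Allow jobs to be submitted from compute nodes (assumes manager rights
# on the host running pbs_server).
qmgr -c "set server allow_node_submit = True"

# Confirm that the attribute now appears in the server configuration.
qmgr -c "print server" | grep allow_node_submit
```

After that, the same "qsub -I -l nodes=1" from compute-0-20 is worth
retrying. If the ruserok error persists, the next thing I would check is
whether the hostname the node reports (compute-0-20.local here) resolves
consistently on the server, but that is a guess on my part and not
something the output above shows.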


-- 
James A. Peltier
IT Services - Research Computing Group
Simon Fraser University - Burnaby Campus
Phone   : 778-782-6573
Fax     : 778-782-3045
E-Mail  : jpeltier at sfu.ca
Website : http://www.sfu.ca/itservices
          http://blogs.sfu.ca/people/jpeltier
I will do the best I can with the talent I have


