[torqueusers] Problem: qsub fails to submit jobs

Raymond Page pagerc at ufl.edu
Thu Sep 2 15:58:31 MDT 2004


I tried what Joseph suggested, but still no luck.  I corrected 
mom_priv/config so that $ideal_load and $max_load carry the leading 
'$'.  Could someone explain what I'm seeing in my server and 
scheduler logs?  qsub itself reports no error and returns a job ID; 
the scheduler then simply deletes the new job.  Could this be caused 
by running a 2.6 kernel with load balancing enabled?
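
A check along these lines would have caught the missing '$' before 
restarting pbs_mom.  It is only a sketch: the directive list is 
limited to the names that appear in the config quoted in this 
thread (not the full set pbs_mom accepts), and the helper name 
find_missing_dollar is made up for illustration.

```python
# Hypothetical helper: flag mom_priv/config lines that look like
# directives but are missing the leading '$'.
# ASSUMPTION: this list only covers directives seen in this thread.
KNOWN_DIRECTIVES = {
    "serverhost", "clienthost", "restricted", "usecp",
    "logevent", "ideal_load", "max_load",
}


def find_missing_dollar(config_text):
    """Return (line_number, line) pairs for directives written without '$'."""
    problems = []
    for lineno, raw in enumerate(config_text.splitlines(), start=1):
        line = raw.strip()
        # Blank lines, comments, and proper '$' directives are fine.
        if not line or line.startswith("#") or line.startswith("$"):
            continue
        name = line.split()[0]
        if name in KNOWN_DIRECTIVES:
            problems.append((lineno, line))
    return problems


# Sample taken from the mom_priv/config in the earlier post (abridged).
SAMPLE = """\
$serverhost     localhost
$clienthost     osgmon.cns.ufl.edu
$logevent       255
ideal_load      5
max_load        8
"""

if __name__ == "__main__":
    for lineno, line in find_missing_dollar(SAMPLE):
        print(f"line {lineno}: missing '$' -> {line}")
```

Run against the config from the earlier post, this flags the 
ideal_load and max_load lines and nothing else.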


fidget $ qsub  qsub.sh
503.osgmon.cns.ufl.edu


*** /usr/spool/PBS/sched_logs//20040902 ***
09/02/2004 17:53:05;0008; pbs_sched;Job;503.osgmon.cns.ufl.edu;Job Deleted because it would never run
09/02/2004 17:53:05;0040; pbs_sched;Job;503.osgmon.cns.ufl.edu;Not enough of the right type of nodes available


*** /usr/spool/PBS/server_logs//20040902 ***
...
09/02/2004 17:53:05;0100;PBS_Server;Req;;Type commit request received from osg@fidget.cns.ufl.edu, sock=9
09/02/2004 17:53:05;0100;PBS_Server;Job;503.osgmon.cns.ufl.edu;enqueuing into test, state 1 hop 1
09/02/2004 17:53:05;0008;PBS_Server;Job;503.osgmon.cns.ufl.edu;Job Queued at request of osg@fidget.cns.ufl.edu, owner = osg@fidget.cns.ufl.edu, job name = qsub.sh, queue = test
09/02/2004 17:53:05;0040;PBS_Server;Svr;osgmon.cns.ufl.edu;Scheduler sent command new
...
09/02/2004 17:53:05;0100;PBS_Server;Req;;Type deletejob request received from Scheduler@osgmon.cns.ufl.edu, sock=10
09/02/2004 17:53:05;0008;PBS_Server;Job;503.osgmon.cns.ufl.edu;Job deleted at request of Scheduler@osgmon.cns.ufl.edu
09/02/2004 17:53:05;0100;PBS_Server;Job;503.osgmon.cns.ufl.edu;dequeuing from test, state 5
09/02/2004 17:53:05;0100;PBS_Server;Req;;Type modifyjob request received from Scheduler@osgmon.cns.ufl.edu, sock=10
09/02/2004 17:53:05;0080;PBS_Server;Job;;Unknown Job Id
09/02/2004 17:53:05;0080;PBS_Server;Req;req_reject;Reject reply code=15001, aux=0, type=11, from Scheduler@osgmon.cns.ufl.edu
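
To follow a single job through these logs, something like the 
sketch below may help.  It assumes each record sits on one line 
(the mail client wrapped them here) and has six ';'-separated 
fields; the field names and the helpers parse_record and 
records_for_job are my own labels, not anything PBS defines.

```python
from collections import namedtuple

# ASSUMPTION: six ';'-separated fields per record; names are illustrative.
LogRecord = namedtuple("LogRecord", "timestamp event daemon kind obj message")


def parse_record(line):
    """Split one unwrapped log line into fields, or return None."""
    # maxsplit=5 keeps any ';' inside the message text intact.
    parts = line.rstrip("\n").split(";", 5)
    if len(parts) != 6:
        return None  # e.g. the "..." elision lines
    return LogRecord(*(p.strip() for p in parts))


def records_for_job(lines, job_id):
    """Return the parsed records whose object field equals job_id."""
    parsed = (parse_record(line) for line in lines)
    return [r for r in parsed if r is not None and r.obj == job_id]


# Lines copied from the logs in this message, with the wrapping undone.
SAMPLE = [
    "09/02/2004 17:53:05;0008; pbs_sched;Job;503.osgmon.cns.ufl.edu;"
    "Job Deleted because it would never run",
    "09/02/2004 17:53:05;0040; pbs_sched;Job;503.osgmon.cns.ufl.edu;"
    "Not enough of the right type of nodes available",
    "...",
    "09/02/2004 17:53:05;0080;PBS_Server;Job;;Unknown Job Id",
]

if __name__ == "__main__":
    for rec in records_for_job(SAMPLE, "503.osgmon.cns.ufl.edu"):
        print(rec.timestamp, rec.daemon, rec.message)
```

Records with an empty object field, such as the "Unknown Job Id" 
line, are not matched by job ID and have to be correlated by 
timestamp instead.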



On Thu Sep 02 15:12:37 EDT 2004, Joseph Spadavecchia 
<j.spadavecchia at ed.ac.uk> wrote:

> On Thu, 2004-09-02 at 16:12, Raymond Page wrote:
>> fidget $ qsub -l nodes=rah.cns.ufl.edu qsub.sh
>> qsub: Job exceeds queue resource limits
>> 
>> I'm attempting to use the below setup and cannot understand why 
>> I am receiving the qsub failure above.  I had shared clusters 
>> working, and now I want the ability to load balance with 
>> timeshared nodes.  I'd appreciate being informed of what I'm 
>> missing in my setup to let jobs execute on a timeshared host.
> 
> With timeshared nodes you cannot request exclusive resources with -l.
> 
> qsub qsub.sh will work.
> 
> Also, make sure you've added $ideal_load and $max_load to
> mom_priv/config.
> 
>> 
>> --
>> Raymond Page
>> 
>> 
>> 
>> server_priv/nodes:
>> osgmon.cns.ufl.edu:ts local server production osgmon
>> fidget.cns.ufl.edu:ts np=2 remote workstation fidget
>> rah.cns.ufl.edu:ts remote server rah
>> 
>> mom_priv/config:
>> $serverhost     localhost
>> $clienthost     osgmon.cns.ufl.edu
>> $restricted     *.cns.ufl.edu
>> $usecp          *.cns.ufl.edu:/home /home
>> $usecp          *.cns.ufl.edu:/u /u
>> $logevent       255
>> ideal_load      5
>> max_load        8
>> 
>> 
>> $ qmgr -c "p s"
>> # Create queues and set their attributes.
>> create queue test
>> set queue test queue_type = Execution
>> set queue test resources_default.nodect = 1
>> set queue test resources_default.nodes = 1
>> set queue test resources_default.walltime = 01:00:00
>> set queue test enabled = True
>> set queue test started = True
>> # Set server attributes.
>> set server scheduling = True
>> set server default_queue = test
>> set server log_events = 511
>> set server mail_from = adm
>> set server scheduler_iteration = 600
>> set server node_ping_rate = 300
>> set server node_check_rate = 600