[torqueusers] Problem: qsub fails to submit jobs
Raymond Page
pagerc at ufl.edu
Thu Sep 2 15:58:31 MDT 2004
I tried what Joseph suggested, but still no luck. I corrected
mom_priv/config so that $ideal_load and $max_load now have the
leading '$'. Could someone explain what my server and scheduler
logs are showing? qsub no longer reports an error; the scheduler
simply deletes the new job. Could this be caused by a 2.6 kernel
and enabling load balancing?
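
For anyone hitting the same mom_priv/config mistake, here is a minimal sketch of a check for directives written without the leading '$'. The function name and the directive list are assumptions made for illustration (the list only covers the directives that appear in this thread), and this is not part of TORQUE itself. As far as I know, pbs_mom accepts a line without '$' as a static resource definition rather than rejecting it, which would explain why the typo goes unnoticed.

```python
# Hypothetical helper, not part of TORQUE: flag mom_priv/config lines whose
# directive is missing the leading '$'. The set below only lists the
# directives seen in this thread and is an assumption, not a complete list.
KNOWN_DIRECTIVES = {
    "serverhost", "clienthost", "restricted", "usecp",
    "logevent", "ideal_load", "max_load",
}


def find_missing_dollar(config_text):
    """Return (line_number, directive) pairs for directives written without '$'."""
    problems = []
    for lineno, raw in enumerate(config_text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        first = line.split()[0]
        if first in KNOWN_DIRECTIVES:
            problems.append((lineno, first))
    return problems


# Sample taken from the config quoted in this thread (shortened).
SAMPLE = """\
$serverhost localhost
$clienthost osgmon.cns.ufl.edu
$logevent 255
ideal_load 5
max_load 8
"""

if __name__ == "__main__":
    for lineno, name in find_missing_dollar(SAMPLE):
        print(f"line {lineno}: '{name}' should be '${name}'")
```

Run against the original config, this would point at the ideal_load and max_load lines; against the corrected file it should report nothing.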
fidget $ qsub qsub.sh
503.osgmon.cns.ufl.edu
*** /usr/spool/PBS/sched_logs//20040902 ***
09/02/2004 17:53:05;0008; pbs_sched;Job;503.osgmon.cns.ufl.edu;Job Deleted because it would never run
09/02/2004 17:53:05;0040; pbs_sched;Job;503.osgmon.cns.ufl.edu;Not enough of the right type of nodes available
*** /usr/spool/PBS/server_logs//20040902 ***
...
09/02/2004 17:53:05;0100;PBS_Server;Req;;Type commit request received from osg at fidget.cns.ufl.edu, sock=9
09/02/2004 17:53:05;0100;PBS_Server;Job;503.osgmon.cns.ufl.edu;enqueuing into test, state 1 hop 1
09/02/2004 17:53:05;0008;PBS_Server;Job;503.osgmon.cns.ufl.edu;Job Queued at request of osg at fidget.cns.ufl.edu, owner = osg at fidget.cns.ufl.edu, job name = qsub.sh, queue = test
09/02/2004 17:53:05;0040;PBS_Server;Svr;osgmon.cns.ufl.edu;Scheduler sent command new
...
09/02/2004 17:53:05;0100;PBS_Server;Req;;Type deletejob request received from Scheduler at osgmon.cns.ufl.edu, sock=10
09/02/2004 17:53:05;0008;PBS_Server;Job;503.osgmon.cns.ufl.edu;Job deleted at request of Scheduler at osgmon.cns.ufl.edu
09/02/2004 17:53:05;0100;PBS_Server;Job;503.osgmon.cns.ufl.edu;dequeuing from test, state 5
09/02/2004 17:53:05;0100;PBS_Server;Req;;Type modifyjob request received from Scheduler at osgmon.cns.ufl.edu, sock=10
09/02/2004 17:53:05;0080;PBS_Server;Job;;Unknown Job Id
09/02/2004 17:53:05;0080;PBS_Server;Req;req_reject;Reject reply code=15001, aux=0, type=11, from Scheduler at osgmon.cns.ufl.edu
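
To follow a single job through logs like these, it can help to split each record on its semicolon-separated fields and filter by job id. The sketch below assumes each record is on one line (mail wrapping undone) and has the six fields seen above: timestamp, event code, daemon, object type, object name, message. The names LogRecord, parse_line and job_history are made up for illustration and are not part of TORQUE.

```python
from typing import List, NamedTuple, Optional


class LogRecord(NamedTuple):
    timestamp: str
    event: str
    daemon: str
    obj_type: str
    obj_name: str
    message: str


def parse_line(line: str) -> Optional[LogRecord]:
    """Split one log line into its six fields; return None if it does not fit."""
    # maxsplit=5 keeps any ';' inside the message text intact.
    parts = [p.strip() for p in line.split(";", 5)]
    if len(parts) != 6:
        return None
    return LogRecord(*parts)


def job_history(lines: List[str], job_id: str) -> List[LogRecord]:
    """Return the records whose object name is the given job id, in input order."""
    records = (parse_line(line) for line in lines)
    return [r for r in records if r is not None and r.obj_name == job_id]


# Lines copied from the scheduler and server logs in this post.
SAMPLE_LINES = [
    "09/02/2004 17:53:05;0008; pbs_sched;Job;503.osgmon.cns.ufl.edu;Job Deleted because it would never run",
    "09/02/2004 17:53:05;0040; pbs_sched;Job;503.osgmon.cns.ufl.edu;Not enough of the right type of nodes available",
    "...",
    "09/02/2004 17:53:05;0080;PBS_Server;Job;;Unknown Job Id",
]

if __name__ == "__main__":
    for rec in job_history(SAMPLE_LINES, "503.osgmon.cns.ufl.edu"):
        print(rec.daemon, "|", rec.message)
```

Filtering this way puts the scheduler's stated reason ("Not enough of the right type of nodes available") next to the deletion it triggered, which is the pair of lines that matters here.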
On Thu Sep 02 15:12:37 EDT 2004, Joseph Spadavecchia
<j.spadavecchia at ed.ac.uk> wrote:
> On Thu, 2004-09-02 at 16:12, Raymond Page wrote:
>> fidget $ qsub -l nodes=rah.cns.ufl.edu qsub.sh
>> qsub: Job exceeds queue resource limits
>>
>> I'm attempting to use the below setup and cannot understand why
>> I am receiving the qsub failure above. I had shared clusters
>> working, and now I want the ability to load balance with
>> timeshared nodes. I'd appreciate being informed of what I'm
>> missing in my setup to let jobs execute on a timeshared host.
>
> With timeshared nodes you cannot request exclusive resources with
> -l.
>
> qsub qsub.sh will work.
>
> Also, make sure you've added $ideal_load and $max_load to
> mom_priv/config.
>
>>
>> --
>> Raymond Page
>>
>>
>>
>> server_priv/nodes:
>> osgmon.cns.ufl.edu:ts local server production osgmon
>> fidget.cns.ufl.edu:ts np=2 remote workstation fidget
>> rah.cns.ufl.edu:ts remote server rah
>>
>> mom_priv/config:
>> $serverhost localhost
>> $clienthost osgmon.cns.ufl.edu
>> $restricted *.cns.ufl.edu
>> $usecp *.cns.ufl.edu:/home /home
>> $usecp *.cns.ufl.edu:/u /u
>> $logevent 255
>> ideal_load 5
>> max_load 8
>>
>>
>> $ qmgr -c "p s"
>> # Create queues and set their attributes.
>> create queue test
>> set queue test queue_type = Execution
>> set queue test resources_default.nodect = 1
>> set queue test resources_default.nodes = 1
>> set queue test resources_default.walltime = 01:00:00
>> set queue test enabled = True
>> set queue test started = True
>> # Set server attributes.
>> set server scheduling = True
>> set server default_queue = test
>> set server log_events = 511
>> set server mail_from = adm
>> set server scheduler_iteration = 600
>> set server node_ping_rate = 300
>> set server node_check_rate = 600