[torqueusers] Problem: qsub fails to submit jobs

Raymond Page pagerc at ufl.edu
Fri Sep 3 10:40:24 MDT 2004


Joseph Spadavecchia wrote:

> What are the contents of qsub.sh?
> 

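#!/bin/bash
# Print basic node information when running as a PBS batch job;
# otherwise explain how to submit the script.
case ${PBS_ENVIRONMENT} in
    PBS_BATCH)
        echo 'Date is: '`date`
        echo 'Node is: '`hostname`
        pwd
        uname -m -r -s
        ls -ltr
        sleep 60
        ;;
    *)
        echo PBS_ENVIRONMENT not set
        echo to run job use: pbs $0
        ;;
esac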
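
Both branches of the script can be checked without going through qsub.
The sketch below is only an illustration: pbs_branch is a hypothetical
helper, not part of PBS or of the script above, and it mirrors the
script's case statement rather than running it. PBS sets
PBS_ENVIRONMENT to PBS_INTERACTIVE for interactive jobs (qsub -I), so
the fallback branch is also taken there, not only when the variable is
unset.

```shell
#!/bin/sh
# pbs_branch is a hypothetical helper for illustration only.
# It reports which branch of the case statement a given value selects.
pbs_branch() {
    case "$1" in
        PBS_BATCH) echo "batch branch" ;;
        *)         echo "fallback branch" ;;
    esac
}

pbs_branch "PBS_BATCH"         # value PBS sets for batch jobs
pbs_branch "PBS_INTERACTIVE"   # value PBS sets for qsub -I sessions
pbs_branch ""                  # variable unset, e.g. run from a login shell
```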



> On Thu, 2004-09-02 at 22:58, Raymond Page wrote:
> 
>>I tried what Joseph suggested, but still no luck.  I corrected
>>mom_priv/config so that $ideal_load and $max_load carry the
>>leading '$'.  Could someone explain what I'm seeing in my server
>>and scheduler logs?  qsub reports no error; the scheduler simply
>>deletes the new job.  Could this be caused by running a 2.6 kernel
>>with load balancing enabled?
>>
>>
>>fidget $ qsub  qsub.sh
>>503.osgmon.cns.ufl.edu
>>
>>
>>*** /usr/spool/PBS/sched_logs//20040902 ***
>>09/02/2004 17:53:05;0008; pbs_sched;Job;503.osgmon.cns.ufl.edu;Job Deleted because it would never run
>>09/02/2004 17:53:05;0040; pbs_sched;Job;503.osgmon.cns.ufl.edu;Not enough of the right type of nodes available
>>
>>
>>*** /usr/spool/PBS/server_logs//20040902 ***
>>...
>>09/02/2004 17:53:05;0100;PBS_Server;Req;;Type commit request received from osg@fidget.cns.ufl.edu, sock=9
>>09/02/2004 17:53:05;0100;PBS_Server;Job;503.osgmon.cns.ufl.edu;enqueuing into test, state 1 hop 1
>>09/02/2004 17:53:05;0008;PBS_Server;Job;503.osgmon.cns.ufl.edu;Job Queued at request of osg@fidget.cns.ufl.edu, owner = osg@fidget.cns.ufl.edu, job name = qsub.sh, queue = test
>>09/02/2004 17:53:05;0040;PBS_Server;Svr;osgmon.cns.ufl.edu;Scheduler sent command new
>>...
>>09/02/2004 17:53:05;0100;PBS_Server;Req;;Type deletejob request received from Scheduler@osgmon.cns.ufl.edu, sock=10
>>09/02/2004 17:53:05;0008;PBS_Server;Job;503.osgmon.cns.ufl.edu;Job deleted at request of Scheduler@osgmon.cns.ufl.edu
>>09/02/2004 17:53:05;0100;PBS_Server;Job;503.osgmon.cns.ufl.edu;dequeuing from test, state 5
>>09/02/2004 17:53:05;0100;PBS_Server;Req;;Type modifyjob request received from Scheduler@osgmon.cns.ufl.edu, sock=10
>>09/02/2004 17:53:05;0080;PBS_Server;Job;;Unknown Job Id
>>09/02/2004 17:53:05;0080;PBS_Server;Req;req_reject;Reject reply code=15001, aux=0, type=11, from Scheduler@osgmon.cns.ufl.edu
>>
>>
>>
>>On Thu Sep 02 15:12:37 EDT 2004, Joseph Spadavecchia 
>><j.spadavecchia at ed.ac.uk> wrote:
>>
>>
>>>On Thu, 2004-09-02 at 16:12, Raymond Page wrote:
>>>
>>>>fidget $ qsub -l nodes=rah.cns.ufl.edu qsub.sh
>>>>qsub: Job exceeds queue resource limits
>>>>
>>>>I'm attempting to use the below setup and cannot understand why 
>>>>I am receiving the qsub failure above.  I had shared clusters 
>>>>working, and now I want the ability to load balance with 
>>>>timeshared nodes.  I'd appreciate being informed of what I'm 
>>>>missing in my setup to let jobs execute on a timeshared host.
>>>
>>>With timeshared nodes you cannot request exclusive resources with 
>>>-l.
>>>
>>>qsub qsub.sh will work.
>>>
>>>Also, make sure you've added $ideal_load and $max_load to
>>>mom_priv/config.
>>>
>>>
>>>>--
>>>>Raymond Page
>>>>
>>>>
>>>>
>>>>server_priv/nodes:
>>>>osgmon.cns.ufl.edu:ts local server production osgmon
>>>>fidget.cns.ufl.edu:ts np=2 remote workstation fidget
>>>>rah.cns.ufl.edu:ts remote server rah
>>>>
>>>>mom_priv/config:
>>>>$serverhost     localhost
>>>>$clienthost     osgmon.cns.ufl.edu
>>>>$restricted     *.cns.ufl.edu
>>>>$usecp          *.cns.ufl.edu:/home /home
>>>>$usecp          *.cns.ufl.edu:/u /u
>>>>$logevent       255
>>>>ideal_load      5
>>>>max_load        8
>>>>
>>>>
>>>>$ qmgr -c "p s"
>>>># Create queues and set their attributes.
>>>>create queue test
>>>>set queue test queue_type = Execution
>>>>set queue test resources_default.nodect = 1
>>>>set queue test resources_default.nodes = 1
>>>>set queue test resources_default.walltime = 01:00:00
>>>>set queue test enabled = True
>>>>set queue test started = True
>>>># Set server attributes.
>>>>set server scheduling = True
>>>>set server default_queue = test
>>>>set server log_events = 511
>>>>set server mail_from = adm
>>>>set server scheduler_iteration = 600
>>>>set server node_ping_rate = 300
>>>>set server node_check_rate = 600
>>>>_______________________________________________
>>>>torqueusers mailing list
>>>>torqueusers at supercluster.org
>>>>http://supercluster.org/mailman/listinfo/torqueusers
> 
> 
> 
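
The scheduler and server log lines quoted above are semicolon-separated
records, which makes it easy to pull out just the job ID and message
when following one job across both logs. The sketch below is a rough
illustration under stated assumptions: parse_pbs_line is a hypothetical
helper, and the field names (stamp, event, daemon, class, object,
message) are labels chosen here from the layout of the quoted lines,
not official PBS terms.

```shell
#!/bin/sh
# parse_pbs_line is a hypothetical helper for illustration only.
# It splits one log line on ';' and prints timestamp, daemon, object
# (the job ID for Job records) and message, separated by '|'.
parse_pbs_line() (
    # The function body is a subshell, so these variables stay local.
    IFS=';' read -r stamp event daemon class object message <<EOF
$1
EOF
    daemon="${daemon# }"   # the scheduler log pads this field with a space
    printf '%s|%s|%s|%s\n' "$stamp" "$daemon" "$object" "$message"
)

parse_pbs_line '09/02/2004 17:53:05;0008; pbs_sched;Job;503.osgmon.cns.ufl.edu;Job Deleted because it would never run'
```

Because message is the last variable given to read, any further
semicolons inside the message text stay in that field.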


