[torqueusers] PBS Scheduling Weirdness

Jerry Smith jdsmit at sandia.gov
Wed May 20 11:28:37 MDT 2009


Try:
echo "sleep 10" | qsub -l nodes=node4:ppn=4
or
echo "sleep 10" | qsub -l nodes=1:ppn=4

Does this change anything?
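
To compare the results, it may help to count the slots in the exec_host
value that qstat -f reports for each job. The following is only a rough
sketch: count_slots and list_nodes are hypothetical helper names (not part
of TORQUE), and it assumes the exec_host value has been joined onto a single
line first, since qstat -f wraps long values.

```shell
# Count the CPU slots in an exec_host value such as
# "node14/3+node14/2+node13/0". Entries are separated by "+",
# and each entry has the form <node>/<slot>.
count_slots() {
    printf '%s\n' "$1" | tr '+' '\n' | grep -c '/'
}

# Print each distinct node name that appears in an exec_host value.
list_nodes() {
    printf '%s\n' "$1" | tr '+' '\n' | cut -d/ -f1 | sort -u
}

# The failing job from this thread: one slot on node2.
count_slots "node2/0"

# The value from the working cluster: eight slots across two nodes.
working="node14/3+node14/2+node14/1+node14/0+node13/3+node13/2+node13/1+node13/0"
count_slots "$working"
list_nodes "$working"

# On a live system the value would come from the job itself, e.g.:
#   jobid=$(echo "sleep 10" | qsub -l nodes=1:ppn=4)
#   qstat -f "$jobid"
```

With nodes=1:ppn=4 a correctly scheduled job should show four slots on a
single node; one slot would point to the same problem as job 1059.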

--Jerry

Edsall, William (WJ) wrote:
> I usually test with a STDIN command such as this.
>  
> > echo "sleep 10" | qsub -l nodes=1:node4:ppn=4
>  
> My job runs, but as you can see I only get one CPU, on the wrong 
> resource. The result is the same when requesting multiple nodes. This 
> was working, and still works on our other clusters, but as of Monday 
> this week it fails.
>  
> > qstat -f 1059
> Job Id: 1059
>     Job_Name = STDIN
>     Job_Owner =  <deleted>
>     job_state = R
>     queue = batch
>     server = <deleted>com
>     Checkpoint = u
>     ctime = Wed May 20 11:46:18 2009
>     Error_Path = <deleted>
> *    exec_host = node2/0*
>     Hold_Types = n
>     Join_Path = n
>     Keep_Files = n
>     Mail_Points = a
>     mtime = Wed May 20 11:46:26 2009
>     Output_Path = <deleted>/STDIN.o1059
>     Priority = 0
>     qtime = Wed May 20 11:46:18 2009
>     Rerunable = True
>     Resource_List.neednodes = 1
>     Resource_List.nodect = 1
>     Resource_List.nodes = 1
>     Resource_List.walltime = 01:00:00
>     session_id = 12814
>     substate = 42
>     Variable_List = PBS_O_HOME=/home/<deleted>,PBS_O_LANG=POSIX,
>         PBS_O_LOGNAME=<deleted>,
>         PBS_O_PATH=/usr/local/torque/sbin:/usr/local/torque/bin:/usr/bin:/bin
>         :/usr/sbin:/sbin:/usr/local/bin:/usr/bin/X11:/usr/X11R6/bin:/usr/games
>         :/opt/kde3/bin:/usr/lib/mit/bin:/usr/lib/mit/sbin:/usr/lib/qt3/bin,
>         PBS_O_MAIL=/var/mail/<deleted>,PBS_O_SHELL=/bin/tcsh,
>         PBS_SERVER=txmerig.nam.dow.com,PBS_O_HOST=txmerig.nam.dow.com,
>         PBS_O_WORKDIR=/home/<deleted>,PBS_O_QUEUE=batch
>     euser = <deleted>
>     egroup = users
>     hashname = 1059.<deleted>.com
>     queue_rank = 996
>     queue_type = E
>     comment = Job started on Wed May 20 at 11:46
>     etime = Wed May 20 11:46:18 2009
>     submit_args = -l nodes=1:node4:ppn=4
>     start_time = Wed May 20 11:46:26 2009
>     start_count = 1
>  
> On other known working clusters, requesting resources in the same 
> fashion works fine, as seen here:
>     exec_host = node14/3+node14/2+node14/1+node14/0+node13/3+node13/2+node13/1
>         +node13/0
>
>     ------------------------------------------------------------------------
>     *From:* Jerry Smith [mailto:jdsmit at sandia.gov]
>     *Sent:* Wednesday, May 20, 2009 12:02 PM
>     *To:* Edsall, William (WJ)
>     *Cc:* torqueusers at supercluster.org
>     *Subject:* Re: [torqueusers] PBS Scheduling Weirdness
>
>     Sorry, I forgot to ask this as well: can we get a copy of the
>     script you are submitting and the qsub command you are using?
>
>     Jerry
>
>     Edsall, William (WJ) wrote:
>>     Hello,
>>      Here is the output. I'm using the TORQUE scheduler; Maui is on
>>     the system but not running.
>>      
>>     # qmgr -c "p s"
>>     #
>>     # Create queues and set their attributes.
>>     #
>>     #
>>     # Create and define queue batch
>>     #
>>     create queue batch
>>     set queue batch queue_type = Execution
>>     set queue batch resources_default.nodes = 1
>>     set queue batch resources_default.walltime = 01:00:00
>>     set queue batch enabled = True
>>     set queue batch started = True
>>     #
>>     # Set server attributes.
>>     #
>>     set server scheduling = True
>>     set server acl_hosts = txmerig
>>     _//stripped out the list of managers and operators_
>>     set server default_queue = batch
>>     set server log_events = 511
>>     set server mail_from = adm
>>     set server scheduler_iteration = 600
>>     set server node_check_rate = 150
>>     set server tcp_timeout = 6
>>     set server next_job_number = 1054
>>
>>         ------------------------------------------------------------------------
>>         *From:* Jerry Smith [mailto:jdsmit at sandia.gov]
>>         *Sent:* Tuesday, May 19, 2009 4:05 PM
>>         *To:* Edsall, William (WJ)
>>         *Cc:* torqueusers at supercluster.org
>>         *Subject:* Re: [torqueusers] PBS Scheduling Weirdness
>>
>>         Can you give us the output from:
>>
>>         qmgr -c "p s"
>>
>>         and are you using any external scheduler, Maui or Moab or the
>>         like?
>>
>>         Thanks,
>>
>>         --Jerry
>>
>>         Edsall, William (WJ) wrote:
>>>
>>>         Hello list,
>>>          I'm having a strange problem with TORQUE version 2.4.0b1.
>>>
>>>         It seems that no matter how many resources I request, I only
>>>         get one CPU on the first available node.
>>>
>>>         Please help me brainstorm the possible causes.
>>>
>>>         *_______________________________________*
>>>         William J. Edsall
>>>
>>>