[torqueusers] PBS Scheduling Weirdness
Jerry Smith
jdsmit at sandia.gov
Wed May 20 11:28:37 MDT 2009
Try:
echo "sleep 10" | qsub -l nodes=node4:ppn=4
or
echo "sleep 10" | qsub -l nodes=1:ppn=4
Does this change anything?
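
To compare the two forms, it can help to tally how many slots each node contributes to the job's exec_host value. The following is a minimal sketch, not part of TORQUE: count_slots is a hypothetical helper name, and it assumes the exec_host string has the node/slot+node/slot form shown in the qstat -f output quoted below.

```shell
# Hypothetical helper (not a TORQUE command): tally slots per node
# from an exec_host string such as "node14/3+node14/2+node13/0".
count_slots() {
    printf '%s\n' "$1" |
        tr '+' '\n' |          # one node/slot entry per line
        cut -d/ -f1 |          # keep only the node name
        sort | uniq -c |       # count entries per node
        awk '{print $2, $1}'   # print "node count"
}

# Example with the single-slot allocation reported in the thread:
count_slots "node2/0"
```

A request for ppn=4 on one node would be expected to show a count of 4 for that node; a count of 1, as here, matches the symptom described. Note that qstat -f may wrap a long exec_host value across several lines, so the value has to be joined into one string before it is passed in.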
--Jerry
Edsall, William (WJ) wrote:
> I usually test with a STDIN command such as this.
>
> > echo "sleep 10" | qsub -l nodes=1:node4:ppn=4
>
> My job runs, but as you can see I only get one CPU, on the wrong
> resource. The same thing happens when requesting multiple nodes. This
> was working, and still works on our other clusters, but as of Monday
> this week it fails.
>
> > qstat -f 1059
> Job Id: 1059
> Job_Name = STDIN
> Job_Owner = <deleted>
> job_state = R
> queue = batch
> server = <deleted>com
> Checkpoint = u
> ctime = Wed May 20 11:46:18 2009
> Error_Path = <deleted>
> * exec_host = node2/0*
> Hold_Types = n
> Join_Path = n
> Keep_Files = n
> Mail_Points = a
> mtime = Wed May 20 11:46:26 2009
> Output_Path = <deleted>/STDIN.o1059
> Priority = 0
> qtime = Wed May 20 11:46:18 2009
> Rerunable = True
> Resource_List.neednodes = 1
> Resource_List.nodect = 1
> Resource_List.nodes = 1
> Resource_List.walltime = 01:00:00
> session_id = 12814
> substate = 42
> Variable_List = PBS_O_HOME=/home/<deleted>,PBS_O_LANG=POSIX,
> PBS_O_LOGNAME=<deleted>,
> PBS_O_PATH=/usr/local/torque/sbin:/usr/local/torque/bin:/usr/bin:/bin
> :/usr/sbin:/sbin:/usr/local/bin:/usr/bin/X11:/usr/X11R6/bin:/usr/games
> :/opt/kde3/bin:/usr/lib/mit/bin:/usr/lib/mit/sbin:/usr/lib/qt3/bin,
> PBS_O_MAIL=/var/mail/<deleted>,PBS_O_SHELL=/bin/tcsh,
> PBS_SERVER=txmerig.nam.dow.com,PBS_O_HOST=txmerig.nam.dow.com,
> PBS_O_WORKDIR=/home/<deleted>,PBS_O_QUEUE=batch
> euser = <deleted>
> egroup = users
> hashname = 1059.<deleted>.com
> queue_rank = 996
> queue_type = E
> comment = Job started on Wed May 20 at 11:46
> etime = Wed May 20 11:46:18 2009
> submit_args = -l nodes=1:node4:ppn=4
> start_time = Wed May 20 11:46:26 2009
> start_count = 1
>
> On other known working clusters, requesting resources in the same
> fashion works fine, as seen here:
> exec_host =
> node14/3+node14/2+node14/1+node14/0+node13/3+node13/2+node13/1
> +node13/0
>
> ------------------------------------------------------------------------
> *From:* Jerry Smith [mailto:jdsmit at sandia.gov]
> *Sent:* Wednesday, May 20, 2009 12:02 PM
> *To:* Edsall, William (WJ)
> *Cc:* torqueusers at supercluster.org
> *Subject:* Re: [torqueusers] PBS Scheduling Weirdness
>
> Sorry, I forgot to ask this as well: can we get a copy of the
> script you are submitting and the qsub command you are using?
>
> Jerry
>
> Edsall, William (WJ) wrote:
>> Hello,
>> Here is the output. I'm using the TORQUE scheduler; Maui is
>> installed on the system but not running.
>>
>> # qmgr -c "p s"
>> #
>> # Create queues and set their attributes.
>> #
>> #
>> # Create and define queue batch
>> #
>> create queue batch
>> set queue batch queue_type = Execution
>> set queue batch resources_default.nodes = 1
>> set queue batch resources_default.walltime = 01:00:00
>> set queue batch enabled = True
>> set queue batch started = True
>> #
>> # Set server attributes.
>> #
>> set server scheduling = True
>> set server acl_hosts = txmerig
>> # (list of managers and operators stripped out)
>> set server default_queue = batch
>> set server log_events = 511
>> set server mail_from = adm
>> set server scheduler_iteration = 600
>> set server node_check_rate = 150
>> set server tcp_timeout = 6
>> set server next_job_number = 1054
>>
>> ------------------------------------------------------------------------
>> *From:* Jerry Smith [mailto:jdsmit at sandia.gov]
>> *Sent:* Tuesday, May 19, 2009 4:05 PM
>> *To:* Edsall, William (WJ)
>> *Cc:* torqueusers at supercluster.org
>> *Subject:* Re: [torqueusers] PBS Scheduling Weirdness
>>
>> Can you give us the output from:
>>
>> qmgr -c "p s"
>>
>> Also, are you using an external scheduler such as Maui or Moab?
>>
>> Thanks,
>>
>> --Jerry
>>
>> Edsall, William (WJ) wrote:
>>>
>>> Hello list,
>>> I'm having a strange problem with TORQUE version 2.4.0b1.
>>>
>>> It seems that no matter how many resources I request, I only
>>> get one CPU on the first available node.
>>>
>>> Please help me brainstorm the possible causes.
>>>
>>> *_______________________________________*
>>> William J. Edsall
>>>
>>>