[torqueusers] Jobs won't start on free time-shared nodes

Nobuyuki Yamaguchi nyama at opentech.co.jp
Sun Nov 1 14:43:03 MST 2009


Hi Ken,

I submit all jobs, both single and parallel, with 'qsub'.
But the problem I reported previously happened while running
single-task jobs.  I have not yet tried to run parallel jobs.
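
For reference, the load values in the pbsnodes output quoted below can
be checked against the $ideal_load and $max_load settings from
mom_priv/config.  The following is only a minimal Python sketch.  It
assumes the documented meaning of the two settings (a node counts as
busy above $max_load and as free again below $ideal_load), and the
function names are hypothetical, not part of Torque.  It does not
explain why the jobs stay queued; it only shows that, by this rule,
both nodes should count as free.

```python
def parse_status(status):
    """Split a pbsnodes 'status = ...' value into a dict of strings."""
    fields = {}
    for item in status.split(","):
        key, sep, value = item.partition("=")
        if sep:  # skip fragments that carry no key=value pair
            fields[key.strip()] = value.strip()
    return fields


def next_state(current, loadave, ideal_load=1.0, max_load=1.2):
    """Assumed load rule: busy above max_load, free below ideal_load.

    Between the two thresholds the node keeps its current state.
    The defaults mirror the mom_priv/config values quoted in this thread.
    """
    if loadave > max_load:
        return "busy"
    if loadave < ideal_load:
        return "free"
    return current


# Abridged copy of the status string reported for linux-q2c1.
sample = ("opsys=linux,nsessions=8,nusers=1,idletime=8066,ncpus=1,"
          "loadave=0.05,state=free,jobs=,varattr=,rectime=1256801528")

fields = parse_status(sample)
print(next_state(fields["state"], float(fields["loadave"])))
```

With a load average of 0.05 the sketch reports the node as free, which
matches what pbsnodes shows.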

Anyway, there are big limitations on running jobs on time-shared
nodes, aren't there?  I'll give up on using time-shared nodes in my
cluster.

Thank you.

Nobu
--

From: Ken Nielson <knielson at adaptivecomputing.com>
Subject: Re: [torqueusers] Jobs won't start on free time-shared nodes
Date: Thu, 29 Oct 2009 09:07:51 -0600
Message-ID: <4AE9AFC7.1000601 at adaptivecomputing.com>

> How are you starting the job?
> 
> If you are using qrun you must use the -H option and designate the
> name of the host where the job will run. Also note that time-shared
> nodes will not run parallel jobs. They will only execute single node
> jobs.
> 
> Ken Nielson
> Adaptive Computing
> 
> 
> Nobuyuki Yamaguchi wrote:
>> Hi,
>>
>> I'm trying to execute jobs on time-shared nodes while keeping the
>> load average equal across them, using pbs_sched on Torque v.2.3.7.
>> But at first the jobs won't start on the nodes even though their
>> status is shown as free and the load averages are apparently low.
>> And then the queued jobs run occasionally... I cannot figure out
>> what makes them executable.
>>
>> Can anyone give me a clue?
>>
>> I have only two computing nodes and a server.
>> My config files are as follows:
>>
>> ---
>> > cat server_priv/nodes
>> linux-q2c1:ts
>> linux-ht0b:ts
>>
>> ---
>> > cat mom_priv/config
>> $pbsserver  linux-ek2n
>> $logevent   255
>> $max_load   1.2
>> $ideal_load 1.0
>>
>> ---
>> > qmgr -c 'p s'
>> #
>> # Create queues and set their attributes.
>> #
>> #
>> # Create and define queue test_que
>> #
>> create queue test_que
>> set queue test_que queue_type = Execution
>> set queue test_que enabled = True
>> set queue test_que started = True
>> #
>> # Set server attributes.
>> #
>> set server scheduling = True
>> set server acl_hosts = linux-ek2n
>> set server default_queue = test_que
>> set server log_events = 511
>> set server mail_from = adm
>> set server scheduler_iteration = 30
>> set server node_check_rate = 150
>> set server tcp_timeout = 6
>> set server next_job_number = 627
>>
>> ---
>> > cat sched_priv/sched_config
>> round_robin: False	all
>>
>> by_queue: True		prime
>> by_queue: True		non_prime
>>
>> strict_fifo: false	ALL
>>
>> fair_share: false	ALL
>>
>> help_starving_jobs	true	ALL
>>
>> sort_queues	true	ALL
>>
>> load_balancing: true	ALL
>>
>> sort_by: no_sort 	ALL
>>
>> log_filter: 256
>>
>> dedicated_prefix: ded
>>
>> max_starve: 24:00:00
>>
>> half_life: 24:00:00
>>
>> unknown_shares: 10
>>
>> sync_time: 1:00:00
>>
>> ---
>> > pbsnodes -a
>> linux-q2c1
>>      state = free
>>      np = 1
>>      ntype = time-shared
>>      status = opsys=linux,uname=Linux linux-q2c1 2.6.26.8-denx #4 Thu Dec
>>      25 00:02:37 JST 2008 ppc,sessions=2252 2387 2452 2455 2462 2471 2515
>>      2522,nsessions=8,nusers=1,idletime=8066,totmem=2876892kb,availmem=2744104kb,physmem=772388kb,ncpus=1,loadave=0.05,netload=121743595,state=free,jobs=,varattr=,rectime=1256801528
>>
>> linux-ht0b
>>      state = free
>>      np = 1
>>      ntype = time-shared
>>      status = opsys=linux,uname=Linux linux-ht0b 2.6.26.8 #1 Tue Jun 16
>>      16:55:24 JST 2009 ppc,sessions=18837 18972 19007 19042 19046 19053
>>      19060 19067 19113 27713
>>      32105,nsessions=11,nusers=1,idletime=60,totmem=2876892kb,availmem=2732068kb,physmem=772388kb,ncpus=1,loadave=0.08,netload=162169974,state=free,jobs=,varattr=,rectime=1256801529
>>
>>
>> Thank you.
>>
>> Nobuyuki Yamaguchi
>> --
>> _______________________________________________
>> torqueusers mailing list
>> torqueusers at supercluster.org
>> http://www.supercluster.org/mailman/listinfo/torqueusers
>>   
> 

