[torqueusers] Jobs won't start on free time-shared nodes
Nobuyuki Yamaguchi
nyama at opentech.co.jp
Sun Nov 1 14:43:03 MST 2009
Hi Ken,
I use 'qsub' to submit both single-task and parallel jobs.
But the problem I reported previously happened when running
single-task jobs. I have not yet tried to run parallel jobs.
Anyway, there are big limitations on running jobs on time-shared
nodes, aren't there? I'll give up using time-shared nodes in my
cluster.
Thank you.
Nobu
--
From: Ken Nielson <knielson at adaptivecomputing.com>
Subject: Re: [torqueusers] Jobs won't start on free time-shared nodes
Date: Thu, 29 Oct 2009 09:07:51 -0600
Message-ID: <4AE9AFC7.1000601 at adaptivecomputing.com>
> How are you starting the job?
>
> If you are using qrun you must use the -H option and designate the
> name of the host where the job will run. Also note that time-shared
> nodes will not run parallel jobs. They will only execute single node
> jobs.
>
> Ken Nielson
> Adaptive Computing
>
>
> Nobuyuki Yamaguchi wrote:
>> Hi,
>>
>> I'm trying to run jobs on time-shared nodes while keeping their load
>> averages equal with pbs_sched on Torque v.2.3.7. At first the jobs won't
>> start on the nodes, even though their state is shown as free and their
>> load averages are clearly low. Then the queued jobs start running
>> occasionally... I cannot figure out what makes them executable.
>>
>> Can anyone give me a clue?
>>
>> I have only two compute nodes and a server.
>> My config files are as follows:
>>
>> ---
>>
>>> cat server_priv/nodes
>>>
>> linux-q2c1:ts
>> linux-ht0b:ts
>>
>> ---
>>
>>> cat mom_priv/config
>>>
>> $pbsserver linux-ek2n
>> $logevent 255
>> $max_load 1.2
>> $ideal_load 1.0
>>
>> ---
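The `$max_load` / `$ideal_load` pair in `mom_priv/config` above works as a hysteresis band, as the TORQUE documentation describes these parameters: pbs_mom reports the node busy once its load average rises above `$max_load` and reports it free again only after the load drops below `$ideal_load`. The sketch below models that rule under stated assumptions; the function name `mom_load_state` is hypothetical, and the exact comparison operators pbs_mom uses are an assumption.

```shell
#!/bin/sh
# Hypothetical model of the pbs_mom load rule (not taken from pbs_mom source).
# Usage: mom_load_state PREV_STATE LOADAVE IDEAL_LOAD MAX_LOAD
#   PREV_STATE is "free" or "busy"; the resulting state is printed.
mom_load_state() {
    awk -v prev="$1" -v load="$2" -v ideal="$3" -v max="$4" 'BEGIN {
        load += 0; ideal += 0; max += 0          # force numeric comparison
        if (load > max)
            state = "busy"                       # above $max_load: report busy
        else if (prev == "busy" && load >= ideal)
            state = "busy"                       # inside the band: stay busy
        else
            state = "free"                       # not held busy and not above $max_load
        print state
    }'
}

# Values from the mom_priv/config above ($ideal_load 1.0, $max_load 1.2):
mom_load_state free 0.05 1.0 1.2   # prints: free
mom_load_state free 1.30 1.0 1.2   # prints: busy
mom_load_state busy 1.10 1.0 1.2   # prints: busy
mom_load_state busy 0.90 1.0 1.2   # prints: free
```

With the load averages pbsnodes reports for these two nodes (0.05 and 0.08), the model yields free, which matches the state shown in the pbsnodes listing.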
>>
>>> qmgr -c 'p s'
>>>
>> #
>> # Create queues and set their attributes.
>> #
>> #
>> # Create and define queue test_que
>> #
>> create queue test_que
>> set queue test_que queue_type = Execution
>> set queue test_que enabled = True
>> set queue test_que started = True
>> #
>> # Set server attributes.
>> #
>> set server scheduling = True
>> set server acl_hosts = linux-ek2n
>> set server default_queue = test_que
>> set server log_events = 511
>> set server mail_from = adm
>> set server scheduler_iteration = 30
>> set server node_check_rate = 150
>> set server tcp_timeout = 6
>> set server next_job_number = 627
>>
>> ---
>>
>>> cat sched_priv/sched_config
>>>
>> round_robin: False all
>>
>> by_queue: True prime
>> by_queue: True non_prime
>>
>> strict_fifo: false ALL
>>
>> fair_share: false ALL
>>
>> help_starving_jobs true ALL
>>
>> sort_queues true ALL
>>
>> load_balancing: true ALL
>>
>> sort_by: no_sort ALL
>>
>> log_filter: 256
>>
>> dedicated_prefix: ded
>>
>> max_starve: 24:00:00
>>
>> half_life: 24:00:00
>>
>> unknown_shares: 10
>>
>> sync_time: 1:00:00
>>
>> ---
>>
>>> pbsnodes -a
>>>
>> linux-q2c1
>> state = free
>> np = 1
>> ntype = time-shared
>> status = opsys=linux,uname=Linux linux-q2c1 2.6.26.8-denx #4 Thu Dec 25 00:02:37 JST 2008 ppc,sessions=2252 2387 2452 2455 2462 2471 2515 2522,nsessions=8,nusers=1,idletime=8066,totmem=2876892kb,availmem=2744104kb,physmem=772388kb,ncpus=1,loadave=0.05,netload=121743595,state=free,jobs=,varattr=,rectime=1256801528
>>
>> linux-ht0b
>> state = free
>> np = 1
>> ntype = time-shared
>> status = opsys=linux,uname=Linux linux-ht0b 2.6.26.8 #1 Tue Jun 16 16:55:24 JST 2009 ppc,sessions=18837 18972 19007 19042 19046 19053 19060 19067 19113 27713 32105,nsessions=11,nusers=1,idletime=60,totmem=2876892kb,availmem=2732068kb,physmem=772388kb,ncpus=1,loadave=0.08,netload=162169974,state=free,jobs=,varattr=,rectime=1256801529
>>
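Ken's reply notes that a job started by hand with qrun has to be given a host with the -H option. As a sketch for choosing that host, the function below reads `pbsnodes -a` output like the listing above and prints the free time-shared node with the lowest `loadave`. The name `least_loaded_ts_node` is hypothetical; it assumes each `status =` line is on a single line, as pbsnodes prints it, and it is only a diagnostic aid, not a reimplementation of pbs_sched's load_balancing policy.

```shell
#!/bin/sh
# Hypothetical helper: reads `pbsnodes -a` output on stdin and prints the free
# time-shared node with the lowest loadave (prints nothing if none qualifies).
# Assumes every node block starts with the unindented node name and keeps its
# "status = ..." attributes on a single line.
least_loaded_ts_node() {
    awk '
        /^[^ \t]/ { node = $1; state = ""; ntype = ""; next }  # new node block
        $1 == "state" { state = $3 }
        $1 == "ntype" { ntype = $3 }
        $1 == "status" {
            load = -1
            n = split($0, attrs, ",")                  # comma-separated attributes
            for (i = 1; i <= n; i++)
                if (attrs[i] ~ /^loadave=/)
                    load = substr(attrs[i], 9) + 0     # text after "loadave="
            if (state == "free" && ntype == "time-shared" && load >= 0 &&
                (best == "" || load < bestload)) {
                best = node
                bestload = load
            }
        }
        END { if (best != "") print best }
    '
}
```

On the server it might be combined with Ken's advice as `qrun -H "$(pbsnodes -a | least_loaded_ts_node)" JOBID`, where JOBID stands for the job ID that qsub printed.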
>>
>> Thank you.
>>
>> Nobuyuki Yamaguchi
>> --
>> _______________________________________________
>> torqueusers mailing list
>> torqueusers at supercluster.org
>> http://www.supercluster.org/mailman/listinfo/torqueusers
>>
>