[torqueusers] $PBS_NODEFILE

rcbord at wm.edu rcbord at wm.edu
Wed Oct 24 17:33:26 MDT 2007


Hi Aaron,
   Something happened with the pbs_server, pbs_scheduler or pbs_mom.
We restarted all three and it started working again.

But to answer your question, this is the resource request we were using:
#PBS -l walltime=00:10:00
#PBS -l nodes=2:pinecone:ppn=2

It would run a two-processor job or a serial job, but only on the first
compute node, which in this case was pc02.  The $PBS_NODEFILE with the above
resource request would only have "pc02 pc02" in it.
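
For anyone wanting to check the same thing on their own cluster, here is a
minimal sketch of a job script that summarizes the node file. It assumes the
usual TORQUE behavior of writing one line per allocated processor slot to
$PBS_NODEFILE; the function name summarize_nodefile is made up for this
example and is not part of TORQUE.

```shell
#!/bin/sh
#PBS -l walltime=00:10:00
#PBS -l nodes=2:pinecone:ppn=2

# summarize_nodefile FILE (hypothetical helper, not a TORQUE command):
# print "<slots> <hostname>" for each distinct host listed in FILE,
# assuming FILE holds one hostname per allocated processor slot.
summarize_nodefile() {
    sort "$1" | uniq -c | awk '{ print $1, $2 }'
}

# Only run the check inside a job, where PBS_NODEFILE is set and readable.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    echo "Slots per host:"
    summarize_nodefile "$PBS_NODEFILE"
    echo "Total slots: $(wc -l < "$PBS_NODEFILE" | tr -d ' ')"
fi
```

With the request above, one would normally expect two different hosts with 2
slots each. In the broken state described here, the summary would show only
pc02 with 2 slots.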


Chris Bording
Application Analyst
High Performance Computing Group 
Information Technology
The College of William and Mary
(757)-221-3488
rcbord at wm.edu

On Wed, 24 Oct 2007, Aaron Knister wrote:

> So jobs aren't running? What syntax are you using to request resources for 
> submitted jobs?
>
> -Aaron
>
> On Oct 24, 2007, at 11:18 AM, rcbord at wm.edu wrote:
>
>> Hi all,
>>   OK, we installed torque-2.1.9 and had it running on Monday, but now it is 
>> not working correctly.  The $PBS_NODEFILE only has two processors in 
>> it.
>> 
>> The server_priv/nodes file has
>> 
>> pc02 np=2
>> pc03 np=2
>> .
>> .
>> pc13 np=2
>> 
>> All the nodes are "free" according to the pbsnodes -a output, for example:
>> 
>> pc12
>>      state = free
>>      np = 2
>>      properties = pinecone,v20z,score
>>      ntype = cluster
>>      status = opsys=linux,uname=Linux pc12 2.6.16.53-0.8-smp #1 SMP Fri Aug 31 13:07:27 UTC 2007 x86_64,sessions=4144,nsessions=1,nusers=1,idletime=167833,totmem=7938936kb,availmem=7732980kb,physmem=3737980kb,ncpus=2,loadave=0.00,netload=247513913,state=free,jobs=? 15201,rectime=1193238080
>> 
>> 
>> #
>> # Set server attributes.
>> #
>> set server scheduling = True
>> set server default_queue = submit
>> set server log_events = 127
>> set server mail_from = adm
>> set server max_running = 24
>> set server max_user_run = 24
>> set server max_group_run = 24
>> set server acl_host_enable = True
>> set server acl_hosts = pinecone.cwm.edu
>> set server acl_hosts += pc02
>> set server acl_hosts += pc03
>> set server acl_hosts += pc04
>> set server acl_hosts += pc05
>> set server acl_hosts += pc06
>> set server acl_hosts += pc07
>> set server acl_hosts += pc08
>> set server acl_hosts += pc09
>> set server acl_hosts += pc10
>> set server acl_hosts += pc11
>> set server acl_hosts += pc12
>> set server acl_hosts += pc13
>> set server query_other_jobs = True
>> set server acl_roots = root@pinecone.cwm.edu
>> set server managers = manager1@pinecone.cwm.edu
>> set server operators = manager2@pinecone.cwm.edu
>> set server operators += manager1@pinecone.cwm.edu
>> set server resources_available.ncpus = 24
>> set server resources_available.nodect = 12
>> set server resources_max.nodect = 12
>> set server scheduler_iteration = 30
>> set server node_check_rate = 150
>> set server tcp_timeout = 6
>> set server log_level = 5
>> set server pbs_version = 2.1.9
>> 
>> I don't think we have changed anything; it just quit!
>> 
>> 
>> Chris Bording
>> Application Analyst
>> High Performance Computing Group Information Technology
>> The College of William and Mary
>> (757)-221-3488
>> rcbord at wm.edu
>> _______________________________________________
>> torqueusers mailing list
>> torqueusers at supercluster.org
>> http://www.supercluster.org/mailman/listinfo/torqueusers
>
> Aaron Knister
> Associate Systems Administrator/Web Designer
> Center for Research on Environment and Water
>
> (301) 595-7001
> aaron at iges.org
>
>
>
>

