[torqueusers] wrong pbs server name

Gus Correa gus at ldeo.columbia.edu
Thu May 21 11:55:03 MDT 2009


PS- Samir, ... and make sure the queue is enabled and started:

qmgr -c 'set queue batch enabled = True'
qmgr -c 'set queue batch started = True'
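
To check the current state before changing anything, something like the
sketch below may help. It assumes that "qmgr -c 'list queue batch'" prints
one "attribute = value" pair per line, which is how it looks on the
installations I have seen; the function name queue_is_ready is made up
for this example.

```shell
#!/bin/sh
# Hypothetical helper (not part of Torque): reads the output of
# "qmgr -c 'list queue NAME'" on stdin and succeeds only if the queue
# reports both enabled = True and started = True.
queue_is_ready() {
    awk '
        $1 == "enabled" && $2 == "=" { enabled = $3 }
        $1 == "started" && $2 == "=" { started = $3 }
        END { exit !(enabled == "True" && started == "True") }
    '
}

# Intended use on the server (needs a running pbs_server, so it is only
# shown as a comment here):
#   qmgr -c 'list queue batch' | queue_is_ready || echo "queue batch is not ready"
```

If the function fails, the two "set queue" commands above are the fix.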


Gus Correa

Gus Correa wrote:
> Hi Samir
> 
> Besides Jerry's suggestion on the name resolution:
> 
> 1) Make sure the server has scheduling turned on
> ( ... well, sometimes it is not ...):
> 
> qmgr -c 'list server scheduling' (to check, and if it is not:)
> 
> qmgr -c 'set server scheduling = True'
> 
> 2) Make sure the scheduler daemon is running.
> Assuming you are using the standard pbs_sched:
> 
> service pbs_sched status
> 
> If it is not running:
> 
> service pbs_sched start
> 
> If you use the maui scheduler instead, make sure pbs_sched is NOT running:
> 
> service pbs_sched stop
> 
> and the Maui scheduler is working:
> 
> service maui status (to check and if not up:)
> 
> service maui start
> 
> 3) Can your nodes resolve the server name?
> I.e., does "ping rufian.perrera.local" work from a node?
> If not, you may have to add it to /etc/hosts on each node
> (assuming YellowDog for PPC keeps the hosts file there, as RHEL,
> CentOS, and Fedora do).
> 
> 4) Make sure your nodes are listed on the headnode
> file $PBS_HOME/server_priv/nodes, and have the right 
> np=number-of-processors.
> 
> 5) On the nodes, also check which $pbsserver is set in
> $PBS_HOME/mom_priv/config.
> 
> 6) Searching the system logs for errors, on the nodes and on the head,
> may help.  Here they are in the /var/log/messages file.
> I don't know about YDog.
> 
> 
> I hope this helps,
> Gus Correa
> ---------------------------------------------------------------------
> Gustavo Correa
> Lamont-Doherty Earth Observatory - Columbia University
> Palisades, NY, 10964-8000 - USA
> ---------------------------------------------------------------------
> 
> Jerry Smith wrote:
>> Samir,
>>
>> What do you have in $PBS_HOME/{server_name,default_server}?
>>
>> It should be the name that resolves to the address of the ethernet
>> interface that pbs should be listening on.
>>
>> --Jerry
>>
>>
>>
>> Samir Gartner wrote:
>>> Ok, I finally installed torque under yellowdog/ppc, but now I have
>>> another problem. I set up my pbs server as rufian.perrera.local, but
>>> when I submit a job it shows up under localhost.localdomain and it
>>> stays in the queued state forever. And if I try to qdel the job, it
>>> can't reach the server and the connection times out. Any ideas of
>>> what could be wrong?
>>> I'm not trying to set up anything complicated; it is just one machine
>>> that works as server and client.
>>>
>>> This is the shell output:
>>>
>>> [root at rufian bin]# /opt/pbs/bin/qstat -a
>>>
>>> rufian.perrera.local:
>>>                                                                          Req'd  Req'd   Elap
>>> Job ID               Username Queue    Jobname          SessID NDS   TSK Memory Time  S Time
>>> -------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - -----
>>> 7.localhost.loca     samir    batch    STDIN               --      1  --    --  01:00 Q   --
>>> 8.localhost.loca     samir    batch    STDIN               --      1  --    --  01:00 Q   --
>>> 9.localhost.loca     samir    batch    STDIN               --      1  --    --  01:00 Q   --
>>> 10.localhost.loc     samir    batch    STDIN               --      1  --    --  01:00 Q   --
>>> [root at rufian bin]# /opt/pbs/bin/qdel 7.localhost.localdomain
>>> Connection timed out
>>> qdel: cannot connect to server localhost.localdomain (errno=110) Connection timed out
>>> You have new mail in /var/spool/mail/root
>>> [root at rufian bin]# /opt/pbs/bin/qdel 7.rufian.perrera.local
>>> qdel: Unknown Job Id 7.rufian.perrera.local
>>> [root at rufian bin]# su - samir
>>> [samir at rufian ~]$ /opt/pbs/bin/qdel 7.localhost.localdomain
>>> Connection timed out
>>> qdel: cannot connect to server localhost.localdomain (errno=110) Connection timed out
>>> [samir at rufian ~]$
>>>
>> _______________________________________________
>> torqueusers mailing list
>> torqueusers at supercluster.org
>> http://www.supercluster.org/mailman/listinfo/torqueusers
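
The symptom in the qstat output quoted above (job IDs ending in
localhost.localdomain while the server was set up as rufian.perrera.local)
can be checked with a small script. This is only a sketch under
assumptions: the function names job_server and server_name_matches are
invented for this example, and it assumes the server_name file holds the
server's host name on its first line. Note that qstat -a truncates job
IDs, so pass the full ID.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Torque.

# Print the server part of a full job ID, e.g.
# 7.localhost.localdomain -> localhost.localdomain
job_server() {
    printf '%s\n' "${1#*.}"
}

# Succeed if the first line of a server_name file equals the expected name.
# $1 = path to the file, $2 = expected host name
# Returns 2 if the file cannot be read.
server_name_matches() {
    [ -r "$1" ] || return 2
    name=$(head -n 1 "$1")
    [ "$name" = "$2" ]
}

# Intended use on the server (paths depend on the installation, so this
# is only shown as a comment):
#   server_name_matches "$PBS_HOME/server_name" rufian.perrera.local \
#       || echo "server_name does not match the intended server"
#   job_server 7.localhost.localdomain
```

If the job IDs and the server_name file disagree with the name the server
is supposed to use, that mismatch is the first thing to fix.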