[torqueusers] wrong pbs server name
Gus Correa
gus at ldeo.columbia.edu
Thu May 21 11:55:03 MDT 2009
PS- Samir, ... and make sure the queue is enabled and started:
qmgr -c 'set queue batch enabled = True'
qmgr -c 'set queue batch started = True'
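
To check the queue flags before changing them, something like the following may help. It is only a sketch: the helper name queue_attr is made up for this example, and it assumes the "qmgr -c 'list queue batch'" output has one "name = value" pair per line.

```shell
#!/bin/sh
# Read "qmgr -c 'list queue <name>'" output on stdin and print the value
# of one attribute (for example: enabled, started).
# queue_attr is a hypothetical helper name, not part of TORQUE.
queue_attr() {
    # $1 = attribute name; lines are assumed to look like "    enabled = True"
    awk -v key="$1" '
        $1 == key && $2 == "=" { print $3; found = 1; exit }
        END { if (!found) exit 1 }   # non-zero status if the attribute is absent
    '
}

# Intended use on the head node:
#   qmgr -c 'list queue batch' | queue_attr enabled
#   qmgr -c 'list queue batch' | queue_attr started
```

If either one prints False, or nothing at all, the queue is not accepting or not starting jobs.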
Gus Correa
Gus Correa wrote:
> Hi Samir
>
> Besides Jerry's suggestion on the name resolution:
>
> 1) Make sure the server has scheduling turned on
> ( ... well, sometimes it is not ...):
>
> qmgr -c 'list server scheduling' (to check, and if it is not:)
>
> qmgr -c 'set server scheduling = True'
>
> 2) Make sure the scheduler daemon is running.
> Assuming you are using the standard pbs_sched:
>
> service pbs_sched status
>
> If it is not running:
>
> service pbs_sched start
>
> If you use the maui scheduler instead, make sure pbs_sched is NOT running:
>
> service pbs_sched stop
>
> and the Maui scheduler is working:
>
> service maui status (to check and if not up:)
>
> service maui start
>
> 3) Can your nodes resolve the server name?
> I.e., does "ping rufian.perrera.local" work from a node?
> If not, you may have to add it to /etc/hosts on each node
> (assuming YellowDog for PPC puts the hosts file there, as RHEL,
> CentOS, and Fedora do).
>
> 4) Make sure your nodes are listed on the headnode
> file $PBS_HOME/server_priv/nodes, and have the right
> np=number-of-processors.
>
> 5) On the nodes check also what $pbsserver you have in
> $PBS_HOME/mom_priv/config.
>
> 6) Searching the system logs for errors may help, on the nodes and on
> the head node. Here they are in the /var/log/messages file.
> I don't know about YDog.
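
For item 5, a rough way to compare the server name on the head node with what pbs_mom is told to use. This is a sketch under assumptions: the function names are made up, the server_name file is assumed to hold just the host name on its first line, and the comparison is a plain string match (so a short name and a fully qualified name will be reported as a mismatch even if both resolve to the same host).

```shell
#!/bin/sh
# Print the host named on the "$pbsserver" line of a pbs_mom config file.
# mom_server and same_server are hypothetical helper names.
mom_server() {
    # $1 = path to mom_priv/config
    awk '$1 == "$pbsserver" { print $2; exit }' "$1"
}

# Compare that host with the first line of the server_name file.
# $1 = server_name file, $2 = mom_priv/config file
same_server() {
    want=$(head -n 1 "$1" | tr -d '[:space:]')
    have=$(mom_server "$2")
    if [ -n "$want" ] && [ "$want" = "$have" ]; then
        echo "match: $want"
    else
        echo "mismatch: server_name='$want' pbsserver='$have'"
        return 1
    fi
}

# Intended use:
#   same_server "$PBS_HOME/server_name" "$PBS_HOME/mom_priv/config"
```

On a single machine acting as both server and client, the two files sit under the same $PBS_HOME, so one call covers it.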
>
>
> I hope this helps,
> Gus Correa
> ---------------------------------------------------------------------
> Gustavo Correa
> Lamont-Doherty Earth Observatory - Columbia University
> Palisades, NY, 10964-8000 - USA
> ---------------------------------------------------------------------
>
> Jerry Smith wrote:
>> Samir,
>>
>> What do you have in $PBS_HOME/{server_name,default_server}?
>>
>> It should be the name that resolves to the address of the ethernet
>> interface that pbs should be listening on.
>>
>> --Jerry
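
One way to test what Jerry describes is to see what the name actually resolves to on the machine itself. The sketch below assumes a glibc system where getent is available; the function names are made up for this example. A name that resolves to a loopback address is one possible explanation for job IDs ending in localhost.localdomain, though not the only one.

```shell
#!/bin/sh
# Succeed if the given address is an IPv4 or IPv6 loopback address.
# is_loopback and check_server_addr are hypothetical helper names.
is_loopback() {
    case "$1" in
        127.*|::1) return 0 ;;
        *)         return 1 ;;
    esac
}

# Report what a host name resolves to according to getent
# (which consults /etc/hosts and DNS per nsswitch.conf).
check_server_addr() {
    addr=$(getent hosts "$1" | awk '{ print $1; exit }')
    if [ -z "$addr" ]; then
        echo "$1 does not resolve"
        return 1
    elif is_loopback "$addr"; then
        echo "$1 resolves to loopback address $addr"
        return 1
    fi
    echo "$1 resolves to $addr"
}

# Intended use:
#   check_server_addr rufian.perrera.local
#   check_server_addr "$(hostname)"
```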
>>
>>
>>
>> Samir Gartner wrote:
>>> Ok, I finally installed torque under yellowdog/ppc but now I have
>>> another problem. I set up my pbs server as rufian.perrera.local but
>>> when I submit a job it shows up under localhost.localdomain and it
>>> stays in the queued state forever. And if I try to qdel the job it can't
>>> reach the server and the connection times out. Any ideas what could
>>> be wrong?
>>> I'm not trying to set up anything complicated; it is just one machine
>>> that works as server and client.
>>>
>>> this is the shell output
>>>
>>> [root at rufian bin]# /opt/pbs/bin/qstat -a
>>>
>>> rufian.perrera.local:
>>>
>>>                                                                          Req'd  Req'd   Elap
>>> Job ID               Username Queue    Jobname          SessID NDS   TSK Memory Time  S Time
>>> -------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - -----
>>> 7.localhost.loca     samir    batch    STDIN                --     1  --     -- 01:00 Q    --
>>> 8.localhost.loca     samir    batch    STDIN                --     1  --     -- 01:00 Q    --
>>> 9.localhost.loca     samir    batch    STDIN                --     1  --     -- 01:00 Q    --
>>> 10.localhost.loc     samir    batch    STDIN                --     1  --     -- 01:00 Q    --
>>> [root at rufian bin]# /opt/pbs/bin/qdel 7.localhost.localdomain
>>> Connection timed out
>>> qdel: cannot connect to server localhost.localdomain (errno=110)
>>> Connection timed out
>>> You have new mail in /var/spool/mail/root
>>> [root at rufian bin]# /opt/pbs/bin/qdel 7.rufian.perrera.local
>>> qdel: Unknown Job Id 7.rufian.perrera.local
>>> [root at rufian bin]# su - samir
>>> [samir at rufian ~]$ /opt/pbs/bin/qdel 7.localhost.localdomain
>>> Connection timed out
>>> qdel: cannot connect to server localhost.localdomain (errno=110)
>>> Connection timed out
>>> [samir at rufian ~]$
>>>
>> _______________________________________________
>> torqueusers mailing list
>> torqueusers at supercluster.org
>> http://www.supercluster.org/mailman/listinfo/torqueusers