[torqueusers] wrong pbs server name

Samir Gartner jigzat at gmail.com
Thu May 21 11:23:57 MDT 2009


OK, I don't see any file named default_server, but server_name contains the
correct server name, rufian.perrera.local, and there is another file with
the same content named server_name.new.
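
A minimal sketch of that check, in case it helps anyone reading along: it
prints the name stored in server_name and the address that name resolves to
on this machine. The PBS_HOME default below is an assumption (it varies
between builds), and the function names are made up for illustration.

```shell
#!/bin/sh
# Assumption: PBS_HOME points at the TORQUE spool directory. The default
# used here is a guess and may differ on your install.
PBS_HOME="${PBS_HOME:-/var/spool/torque}"

# Print the first line of a server_name file with all whitespace removed.
read_server_name() {
    head -n 1 "$1" | tr -d '[:space:]'
}

# Print "<name> -> <address>" for the name stored in the given file.
check_server_name() {
    name=$(read_server_name "$1")
    if [ -z "$name" ]; then
        echo "empty server name in $1"
        return 1
    fi
    # getent follows the lookup order in nsswitch.conf (/etc/hosts, DNS, ...).
    addr=$(getent hosts "$name" | awk '{print $1; exit}')
    echo "$name -> ${addr:-unresolved}"
}

# Only run the check when the file exists, so the sketch is safe to source.
if [ -f "$PBS_HOME/server_name" ]; then
    check_server_name "$PBS_HOME/server_name"
fi
```

If the name comes back as 127.0.0.1 or unresolved, that would point at
/etc/hosts or DNS rather than at the server_name file itself.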

Right now the PBS server name appears to be correct (after stopping the
server and manually deleting the zombie jobs), but the jobs still won't start.


[samir@rufian ~]$ echo "sleep 30;date" | /opt/pbs/bin/qsub
[samir@rufian ~]$ /opt/pbs/bin/qstat -a

rufian.perrera.local:
                                                                         Req'd  Req'd   Elap
Job ID               Username Queue    Jobname          SessID NDS   TSK Memory Time  S Time
-------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - -----
13.rufian.perrer     samir    batch    STDIN               --      1  --    --  01:00 Q   --
[samir@rufian ~]$
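
A small sketch for listing the jobs that are still queued, assuming qstat
prints one unwrapped row per job and that the state is the second-to-last
column, as in the output above. The function name and the QSTAT variable are
made up for illustration; the path is the one used in this thread.

```shell
#!/bin/sh
# Assumption: qstat lives where it does on this machine; override QSTAT
# if your install differs.
QSTAT="${QSTAT:-/opt/pbs/bin/qstat}"

# Read `qstat -a` output on stdin and print the IDs of jobs in state Q.
# Job rows are recognised by a leading "<number>." and the state is taken
# from the second-to-last whitespace-separated field.
queued_jobs() {
    awk '/^[0-9]+\./ && $(NF-1) == "Q" { print $1 }'
}

# Only call qstat when it is actually installed at that path.
if [ -x "$QSTAT" ]; then
    "$QSTAT" -a | queued_jobs
fi
```

Running qstat -f on one of the printed job IDs gives the full attribute list
for that job, which is usually more informative than the summary row.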


By the way, is top posting allowed?

2009/5/21 Jerry Smith <jdsmit at sandia.gov>

> Samir,
>
> What do you have in $PBS_HOME/{server_name,default_server}?
>
> It should be the host name that resolves to the address of the Ethernet
> interface that pbs should be listening on.
>
> --Jerry
>
>
>
>
> Samir Gartner wrote:
>
>> OK, I finally installed torque under yellowdog/ppc, but now I have another
>> problem. I set up my pbs server as rufian.perrera.local, but when I submit
>> a job it shows up under localhost.localdomain and stays in the queued state
>> forever. If I try to qdel the job, it can't reach the server and the
>> connection times out. Any ideas about what could be wrong?
>> I'm not trying to set up anything complicated; it's just one machine that
>> works as both server and client.
>>
>> This is the shell output:
>>
>> [root@rufian bin]# /opt/pbs/bin/qstat -a
>>
>> rufian.perrera.local:
>>                                                                          Req'd  Req'd   Elap
>> Job ID               Username Queue    Jobname          SessID NDS   TSK Memory Time  S Time
>> -------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - -----
>> 7.localhost.loca     samir    batch    STDIN               --      1  --    --  01:00 Q   --
>> 8.localhost.loca     samir    batch    STDIN               --      1  --    --  01:00 Q   --
>> 9.localhost.loca     samir    batch    STDIN               --      1  --    --  01:00 Q   --
>> 10.localhost.loc     samir    batch    STDIN               --      1  --    --  01:00 Q   --
>> [root@rufian bin]# /opt/pbs/bin/qdel 7.localhost.localdomain
>> Connection timed out
>> qdel: cannot connect to server localhost.localdomain (errno=110)
>> Connection timed out
>> You have new mail in /var/spool/mail/root
>> [root@rufian bin]# /opt/pbs/bin/qdel 7.rufian.perrera.local
>> qdel: Unknown Job Id 7.rufian.perrera.local
>> [root@rufian bin]# su - samir
>> [samir@rufian ~]$ /opt/pbs/bin/qdel 7.localhost.localdomain
>> Connection timed out
>> qdel: cannot connect to server localhost.localdomain (errno=110)
>> Connection timed out
>> [samir@rufian ~]$
>>
>>
>