[torqueusers] wrong pbs server name

Ling C. Ho ling at fnal.gov
Thu May 21 11:34:49 MDT 2009


Have you configured a scheduler?

What if you use qrun? Would any jobs start?

...
ling
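Trying qrun by hand requires the IDs of the stuck jobs. Below is a minimal Python sketch that pulls them out of `qstat -a` text like the listings quoted below. It assumes the default layout shown in this thread: each job row has 11 whitespace-separated columns, the state letter is the 10th, and the job ID is `<sequence>.<server>` truncated to 16 characters. It lists jobs in state `Q` and flags those whose ID suffix does not match the configured server name, which is how the `localhost.localdomain` jobs stand out. `queued_jobs` is a hypothetical helper, not a Torque tool.

```python
from typing import List, Tuple


def queued_jobs(qstat_text: str, server_name: str) -> List[Tuple[str, bool]]:
    """Return (job_id, suffix_matches_server) for every job in state "Q".

    Assumes the default `qstat -a` layout: job rows have 11 whitespace-
    separated columns, the state letter is the 10th, and the job ID is
    "<sequence>.<server>", possibly truncated to 16 characters.
    """
    results: List[Tuple[str, bool]] = []
    for line in qstat_text.splitlines():
        fields = line.split()
        if len(fields) != 11:
            continue  # banner, header, prompt or blank line
        seq, dot, suffix = fields[0].partition(".")
        if not dot or not seq.isdigit():
            continue  # e.g. the "-----" separator row
        if fields[9] != "Q":
            continue  # only queued jobs are of interest here
        # The ID may be truncated, so the visible suffix only has to be a
        # prefix of the configured server name.
        results.append((fields[0], server_name.startswith(suffix)))
    return results
```

A job flagged `False` was created under a different server name than the one given. Each ID could then be tried with `qrun <job id>`, but whether that works still depends on the server-name problem discussed in this thread.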

Samir Gartner wrote:

> OK, I don't see any file named default_server, but server_name has the
> right server name, rufian.perrera.local, and there is another file with
> the same content named server_name.new.
> 
> Right now the PBS server name appears to be correct (after stopping the
> server and manually deleting the zombie jobs), but the jobs still won't
> start.
> 
> 
> [samir at rufian ~]$ echo "sleep 30;date" | /opt/pbs/bin/qsub
> [samir at rufian ~]$ /opt/pbs/bin/qstat -a
> 
> rufian.perrera.local:
>                                                                          Req'd  Req'd   Elap
> Job ID               Username Queue    Jobname          SessID NDS   TSK Memory Time  S Time
> -------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - -----
> 13.rufian.perrer     samir    batch    STDIN               --      1  --    --  01:00 Q   --
> [samir at rufian ~]$
> 
> 
> By the way, is top posting allowed?
> 
> 2009/5/21 Jerry Smith <jdsmit at sandia.gov>
> 
>     Samir,
> 
>     What do you have in $PBS_HOME/{server_name,default_server}?
> 
>     It should be a name that resolves to the Ethernet address that
>     pbs_server should be listening on.
> 
>     --Jerry
> 
> 
> 
> 
>     Samir Gartner wrote:
> 
>         OK, I finally installed Torque under Yellow Dog Linux (PPC), but
>         now I have another problem. I set up my PBS server as
>         rufian.perrera.local, but when I submit a job it shows up under
>         localhost.localdomain and stays in the queued state forever. If I
>         try to qdel the job, qdel can't reach the server and the
>         connection times out. Any ideas what could be wrong?
>         I'm not trying to set up anything complicated; it's just one
>         machine acting as both server and client.
> 
>         This is the shell output:
> 
>         [root at rufian bin]# /opt/pbs/bin/qstat -a
> 
>         rufian.perrera.local:
>                                                                                  Req'd  Req'd   Elap
>         Job ID               Username Queue    Jobname          SessID NDS   TSK Memory Time  S Time
>         -------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - -----
>         7.localhost.loca     samir    batch    STDIN               --      1  --    --  01:00 Q   --
>         8.localhost.loca     samir    batch    STDIN               --      1  --    --  01:00 Q   --
>         9.localhost.loca     samir    batch    STDIN               --      1  --    --  01:00 Q   --
>         10.localhost.loc     samir    batch    STDIN               --      1  --    --  01:00 Q   --
>         [root at rufian bin]# /opt/pbs/bin/qdel 7.localhost.localdomain
>         Connection timed out
>         qdel: cannot connect to server localhost.localdomain (errno=110)
>         Connection timed out
>         You have new mail in /var/spool/mail/root
>         [root at rufian bin]# /opt/pbs/bin/qdel 7.rufian.perrera.local
>         qdel: Unknown Job Id 7.rufian.perrera.local
>         [root at rufian bin]# su - samir
>         [samir at rufian ~]$ /opt/pbs/bin/qdel 7.localhost.localdomain
>         Connection timed out
>         qdel: cannot connect to server localhost.localdomain (errno=110)
>         Connection timed out
>         [samir at rufian ~]$
> 
> 
> 
> 
> ------------------------------------------------------------------------
> 
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers
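Jerry's check, comparing the name in `$PBS_HOME/server_name` with the address it resolves to, can be scripted. The following is a minimal Python sketch under one stated assumption: a server name resolving to a loopback address (for example through a `127.0.0.1` entry) is one plausible way clients end up on `localhost.localdomain`. The thread does not confirm this as the cause. `read_server_name` and `check_server_name` are hypothetical helpers, not part of Torque.

```python
import ipaddress
import socket
from pathlib import Path
from typing import Callable


def read_server_name(pbs_home: str) -> str:
    """Return the first line of $PBS_HOME/server_name, stripped."""
    text = (Path(pbs_home) / "server_name").read_text().strip()
    if not text:
        raise ValueError("server_name file is empty")
    return text.splitlines()[0].strip()


def check_server_name(name: str,
                      resolve: Callable[[str], str] = socket.gethostbyname) -> str:
    """Describe what `name` resolves to; `resolve` is injectable for testing."""
    try:
        addr = resolve(name)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        return f"{name}: does not resolve ({exc})"
    if ipaddress.ip_address(addr).is_loopback:
        # Assumption: pbs_server should listen on the Ethernet address,
        # so a loopback answer is worth investigating.
        return f"{name}: resolves to loopback {addr}, not an Ethernet address"
    return f"{name}: resolves to {addr}"
```

For example, `check_server_name(read_server_name(pbs_home))` with your real `$PBS_HOME` reports whether the configured name maps to a network address or to loopback. The resolver is a parameter so the logic can be exercised without touching DNS or `/etc/hosts`.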



