[torqueusers] wrong pbs server name

Gus Correa gus at ldeo.columbia.edu
Thu May 21 13:07:02 MDT 2009


Samir Gartner wrote:
> Ok, scheduling wasn't enabled, now it is, 

It happens very often.
Fixing it is a good first step.

> but pbs_sched service was not 
> found. 

Starting up daemons in YDog may be different from RHEL, CentOS, Fedora,
so I am just guessing based on those. I am not familiar with YDog.
Anyway ...

I don't know if you got Torque from ClusterResources or elsewhere.
In any case, there should be a pbs_sched script in /etc/init.d.
If it is there, do "chkconfig --add pbs_sched" (or the YDog equivalent),
then do "chkconfig --list pbs_sched" to see which runlevels it will be
on, then "service pbs_sched start" to start it, or if YDog doesn't have
"service", run it with "/etc/init.d/pbs_sched start".

If you don't have the pbs_sched script in /etc/init.d, you may find one
in the contrib subdirectory of the Torque source tree.
Copy it over to /etc/init.d, and do the above.
(The location may be other than /etc/init.d in YDog.)
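
The steps in the two paragraphs above can be sketched roughly as below.
This is only a sketch under assumptions: INIT_DIR and both function names
are made up for illustration, the source-tree path is whatever you
unpacked, and the second function only prints the commands rather than
running them.

```shell
#!/bin/sh
# Sketch only. INIT_DIR and the function names are illustrative assumptions;
# YDog may keep its init scripts somewhere other than /etc/init.d.
INIT_DIR="${INIT_DIR:-/etc/init.d}"

# Copy a pbs_sched init script out of the Torque source tree if none is
# installed yet. $1 = top of the unpacked Torque source (supply your own path).
install_sched_script() {
    if [ -e "$INIT_DIR/pbs_sched" ]; then
        return 0
    fi
    # Search the whole contrib subdirectory rather than guessing a subpath.
    script=$(find "$1/contrib" -type f -name 'pbs_sched*' 2>/dev/null | head -n 1)
    if [ -z "$script" ]; then
        echo "no pbs_sched script found under $1/contrib" >&2
        return 1
    fi
    cp "$script" "$INIT_DIR/pbs_sched" && chmod 755 "$INIT_DIR/pbs_sched"
}

# Print (do not run) the commands that would register and start the scheduler.
plan_sched_start() {
    if [ ! -x "$INIT_DIR/pbs_sched" ]; then
        echo "no executable pbs_sched script in $INIT_DIR" >&2
        return 1
    fi
    # chkconfig is the RHEL/CentOS/Fedora tool; skip it if this system lacks it.
    if command -v chkconfig >/dev/null 2>&1; then
        echo "chkconfig --add pbs_sched"
        echo "chkconfig --list pbs_sched"
    fi
    # Fall back to calling the init script directly when "service" is missing.
    if command -v service >/dev/null 2>&1; then
        echo "service pbs_sched start"
    else
        echo "$INIT_DIR/pbs_sched start"
    fi
}
```

Run install_sched_script with the path to your Torque source, then
plan_sched_start, and execute the printed commands as root once they look
right. A script copied from the source tree may have install paths inside
it that do not match a /opt/pbs prefix, so it is worth reading it first.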


> I didn't install maui, it is a default installation. About hosts 
> file, it is properly configured as well as nodes and mom's config files.
> 

You only need Maui if you want a complex scheduling policy.
pbs_sched is FIFO, very simple, but works fine.
I've used it for a long time without problems.

> when I manually start pbs_sched it says
> 
> pbs_sched: addclient, host localhost not found
> 

Hmm ... I never got this one, not that I remember.
I am not sure what you mean by "manually start pbs_sched".
Anyway, it sounds like a different problem.


Is it possible that your "hostname" command
is not resolving your server name to rufian.perrera.local but to
localhost?
What is the output of "hostname"?
What do you have in /etc/hosts?
What do you have in /etc/sysconfig/network?
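
One way to answer the first two questions at once is a small helper like
the sketch below. The name hosts_addrs_for is made up here, and it only
reads the hosts file; it does not consult DNS or any other resolver.

```shell
#!/bin/sh
# Sketch only: hosts_addrs_for NAME [HOSTS_FILE]
# Prints every address in the hosts file whose line lists NAME
# as the canonical name or as an alias.
hosts_addrs_for() {
    name=$1
    file=${2:-/etc/hosts}
    awk -v n="$name" '
        { sub(/#.*/, "") }    # strip comments before looking at fields
        { for (i = 2; i <= NF; i++) if ($i == n) print $1 }
    ' "$file"
}
```

For example, hosts_addrs_for "$(hostname)" shows which address your
current hostname is listed under. If the only answer is 127.0.0.1, the
server name sits on the loopback line, which would fit the symptoms.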

Just in case you have /etc/sysconfig/pbs_server and
/etc/sysconfig/pbs_sched, what are their contents?
(I don't have them.)

(Again just guessing, YDog may use different files to start things up.)

I hope this helps,
Gus Correa
---------------------------------------------------------------------
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
---------------------------------------------------------------------

> 
> 2009/5/21 Samir Gartner <jigzat at gmail.com>
> 
>     I think I'm gonna cry.... I love you guys!! No, seriously, it worked
>     but only if executed under root user, now the question is what did I
>     do wrong? Jobs should start automatically, right?
> 
>     I was following first the Globus toolkit tutorial but it is kinda
>     outdated so I guess I issued some wrong instructions.
> 
>     One of the weird things was that the tutorial suggested using the
>     /opt/pbs prefix when executing configure and now I have under
>     /opt/pbs again a /opt/pbs folder with repeated bin and sbin folders
>     and executables. Is this wrong or is this how it is supposed to be?
> 
>     2009/5/21 Ling C. Ho <ling at fnal.gov>
> 
>         Have you configured a scheduler?
> 
>         What if you use qrun? Would any job start?
> 
>         ...
>         ling
> 
>         Samir Gartner wrote:
> 
>             Ok, I don't see any file named default_server but
>             server_name has the right server name rufian.perrera.local
>             and there is another file with the same content named
>             server_name.new.
> 
>             Right now the PBS server name appears to be correct (after
>             stopping the server and manually deleting the zombie jobs)
>             but still the jobs won't start.
> 
> 
>             [samir at rufian ~]$ echo "sleep 30;date" | /opt/pbs/bin/qsub
>             [samir at rufian ~]$ /opt/pbs/bin/qstat -a
> 
>             rufian.perrera.local:
>                                                                                       Req'd  Req'd   Elap
>             Job ID               Username Queue    Jobname          SessID NDS   TSK Memory Time  S Time
>             -------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - -----
>             13.rufian.perrer     samir    batch    STDIN               --      1  --    --  01:00 Q   --
>             [samir at rufian ~]$
> 
> 
>             by the way, is top posting allowed?
> 
>             2009/5/21 Jerry Smith <jdsmit at sandia.gov>
> 
> 
>                Samir,
> 
>                What do you have in $PBS_HOME/{server_name,default_server}?
> 
>                It should be what resolves as the ethernet address that
>                pbs should be listening on.
> 
>                --Jerry
> 
> 
> 
> 
>                Samir Gartner wrote:
> 
>                    Ok I finally installed torque under yellowdog/ppc but now I
>                    have another problem. I set up my pbs server as
>                    rufian.perrera.local but when I issue a job it shows itself
>                    in localhost.localdomain and it stays in the queued state
>                    forever. And if I try to qdel the job it can't reach the
>                    server and the connection times out. Any ideas of what could
>                    be wrong?
>                    I'm not trying to set up anything complicated, it is just
>                    one machine that works as server and client.
> 
>                    this is the shell output
> 
>                    [root at rufian bin]# /opt/pbs/bin/qstat -a
> 
>                    rufian.perrera.local:
>                                                                                              Req'd  Req'd   Elap
>                    Job ID               Username Queue    Jobname          SessID NDS   TSK Memory Time  S Time
>                    -------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - -----
>                    7.localhost.loca     samir    batch    STDIN               --      1  --    --  01:00 Q   --
>                    8.localhost.loca     samir    batch    STDIN               --      1  --    --  01:00 Q   --
>                    9.localhost.loca     samir    batch    STDIN               --      1  --    --  01:00 Q   --
>                    10.localhost.loc     samir    batch    STDIN               --      1  --    --  01:00 Q   --
>                    [root at rufian bin]# /opt/pbs/bin/qdel 7.localhost.localdomain
>                    Connection timed out
>                    qdel: cannot connect to server localhost.localdomain (errno=110)
>                    Connection timed out
>                    You have new mail in /var/spool/mail/root
>                    [root at rufian bin]# /opt/pbs/bin/qdel 7.rufian.perrera.local
>                    qdel: Unknown Job Id 7.rufian.perrera.local
>                    [root at rufian bin]# su - samir
>                    [samir at rufian ~]$ /opt/pbs/bin/qdel 7.localhost.localdomain
>                    Connection timed out
>                    qdel: cannot connect to server localhost.localdomain (errno=110)
>                    Connection timed out
>                    [samir at rufian ~]$
> 
> 
> 
> 
>             ------------------------------------------------------------------------
> 
>             _______________________________________________
>             torqueusers mailing list
>             torqueusers at supercluster.org
>             http://www.supercluster.org/mailman/listinfo/torqueusers
> 
> 
> 
> 
> 
> 
> 


