[torqueusers] wrong pbs server name

Samir Gartner jigzat at gmail.com
Thu May 21 12:42:49 MDT 2009


OK, scheduling wasn't enabled; now it is, but the pbs_sched service was not
found. I didn't install Maui; it is a default installation. As for the hosts
file, it is properly configured, as are the nodes and MOM config files.

When I manually start pbs_sched it says:

pbs_sched: addclient, host localhost not found
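
For reference, a rough way to double-check whether a name such as localhost is really listed in the hosts file. This is only a sketch: has_host_entry is a made-up helper name, not part of TORQUE, and it only looks at a hosts-format file (default /etc/hosts), not at DNS or whatever resolver pbs_sched actually uses.

```shell
# Hypothetical helper (not a TORQUE tool): succeed if NAME appears as a
# hostname or alias on a non-comment line of a hosts-format file.
has_host_entry() {
    name=$1
    file=${2:-/etc/hosts}
    awk -v n="$name" '
        { sub(/#.*/, "") }    # strip comments; awk re-splits the fields
        { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$file"
}

if has_host_entry localhost; then
    echo "localhost is listed in /etc/hosts"
else
    echo "localhost is missing from /etc/hosts"
fi
```

The first field of each line (the address) is skipped on purpose, so only hostnames and aliases are matched.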


2009/5/21 Samir Gartner <jigzat at gmail.com>

> I think I'm gonna cry.... I love you guys!! No, seriously, it worked, but
> only when executed as the root user. Now the question is: what did I do
> wrong? Jobs should start automatically, right?
>
> I was first following the Globus Toolkit tutorial, but it is kinda outdated,
> so I guess I issued some wrong instructions.
>
> One of the weird things was that the tutorial suggested using the /opt/pbs
> prefix when executing configure, and now I have under /opt/pbs another
> /opt/pbs folder with repeated bin and sbin folders and executables. Is this
> wrong, or is this how it is supposed to be?
>
> 2009/5/21 Ling C. Ho <ling at fnal.gov>
>
>> Have you configured a scheduler?
>>
>> What if you use qrun? Would any job start?
>>
>> ...
>> ling
>>
>> Samir Gartner wrote:
>>
>>> Ok, I don't see any file named default_server, but server_name has the
>>> right server name, rufian.perrera.local, and there is another file with the
>>> same content named server_name.new.
>>>
>>> Right now the PBS server name appears to be correct (after stopping the
>>> server and manually deleting the zombie jobs), but the jobs still won't
>>> start.
>>>
>>>
>>> [samir at rufian ~]$ echo "sleep 30;date" | /opt/pbs/bin/qsub
>>> [samir at rufian ~]$ /opt/pbs/bin/qstat -a
>>>
>>> rufian.perrera.local:
>>>
>>>                                                                          Req'd  Req'd   Elap
>>> Job ID               Username Queue    Jobname          SessID NDS   TSK Memory Time  S Time
>>> -------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - -----
>>> 13.rufian.perrer     samir    batch    STDIN               --      1  --     --  01:00 Q   --
>>> [samir at rufian ~]$
>>>
>>>
>>> By the way, is top posting allowed?
>>>
>>> 2009/5/21 Jerry Smith <jdsmit at sandia.gov>
>>>
>>>
>>>    Samir,
>>>
>>>    What do you have in $PBS_HOME/{server_name,default_server}?
>>>
>>>    It should be the name that resolves to the address of the Ethernet
>>>    interface that pbs should be listening on.
>>>
>>>    --Jerry
>>>
>>>
>>>
>>>
>>>    Samir Gartner wrote:
>>>
>>>        Ok, I finally installed torque under yellowdog/ppc, but now I have
>>>        another problem. I set up my pbs server as rufian.perrera.local,
>>>        but when I submit a job it shows up under localhost.localdomain
>>>        and stays in the queued state forever. And if I try to qdel the
>>>        job, it can't reach the server and the connection times out. Any
>>>        ideas about what could be wrong?
>>>        I'm not trying to set up anything complicated; it is just one
>>>        machine that works as both server and client.
>>>
>>>        This is the shell output:
>>>
>>>        [root at rufian bin]# /opt/pbs/bin/qstat -a
>>>
>>>        rufian.perrera.local:
>>>
>>>                                                                                 Req'd  Req'd   Elap
>>>        Job ID               Username Queue    Jobname          SessID NDS   TSK Memory Time  S Time
>>>        -------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - -----
>>>        7.localhost.loca     samir    batch    STDIN               --      1  --     --  01:00 Q   --
>>>        8.localhost.loca     samir    batch    STDIN               --      1  --     --  01:00 Q   --
>>>        9.localhost.loca     samir    batch    STDIN               --      1  --     --  01:00 Q   --
>>>        10.localhost.loc     samir    batch    STDIN               --      1  --     --  01:00 Q   --
>>>        [root at rufian bin]# /opt/pbs/bin/qdel 7.localhost.localdomain
>>>        Connection timed out
>>>        qdel: cannot connect to server localhost.localdomain (errno=110)
>>>        Connection timed out
>>>        You have new mail in /var/spool/mail/root
>>>        [root at rufian bin]# /opt/pbs/bin/qdel 7.rufian.perrera.local
>>>        qdel: Unknown Job Id 7.rufian.perrera.local
>>>        [root at rufian bin]# su - samir
>>>        [samir at rufian ~]$ /opt/pbs/bin/qdel 7.localhost.localdomain
>>>        Connection timed out
>>>        qdel: cannot connect to server localhost.localdomain (errno=110)
>>>        Connection timed out
>>>        [samir at rufian ~]$
>>>
>>>
>>>
>>>
>>> ------------------------------------------------------------------------
>>>
>>> _______________________________________________
>>> torqueusers mailing list
>>> torqueusers at supercluster.org
>>> http://www.supercluster.org/mailman/listinfo/torqueusers
>>>
>>
>>
>>
>
>