[torqueusers] wrong pbs server name
Samir Gartner
jigzat at gmail.com
Thu May 21 12:42:49 MDT 2009
OK, scheduling wasn't enabled; now it is, but the pbs_sched service was not
found. I didn't install Maui; it is a default installation. As for the hosts
file, it is properly configured, as are the nodes and MOM config files.
When I start pbs_sched manually it says
pbs_sched: addclient, host localhost not found
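
One thing that could be checked here is whether the hosts file really has an
entry for the bare name "localhost", since that is the name pbs_sched
complains about. What follows is only a minimal sketch in Python, assuming the
lookup goes through /etc/hosts; the thread does not confirm that this is the
cause. The helper names (parse_hosts, missing_names) and the sample addresses
are made up for illustration.

```python
def parse_hosts(text):
    """Map every hostname or alias in hosts-file text to its IP address."""
    table = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        fields = line.split()
        ip, names = fields[0], fields[1:]
        for name in names:
            # Keep the first entry seen for a name; hostnames are case-insensitive.
            table.setdefault(name.lower(), ip)
    return table


def missing_names(text, required):
    """Return the required names that have no entry in the hosts-file text."""
    table = parse_hosts(text)
    return [name for name in required if name.lower() not in table]


# Hypothetical hosts file: the addresses are invented, and the loopback line
# lists only localhost.localdomain, not the bare name localhost.
SAMPLE_HOSTS = """\
# sample only
127.0.0.1      localhost.localdomain
192.168.0.10   rufian.perrera.local rufian
"""

print(missing_names(SAMPLE_HOSTS, ["localhost", "rufian.perrera.local"]))
# prints ['localhost']
```

To run it against the real file, pass open("/etc/hosts").read() instead of
SAMPLE_HOSTS. An empty list means both names have an entry, and the problem
lies elsewhere.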
2009/5/21 Samir Gartner <jigzat at gmail.com>
> I think I'm gonna cry.... I love you guys!! No, seriously, it worked, but
> only when executed as the root user. Now the question is: what did I do
> wrong? Jobs should start automatically, right?
>
> I was first following the Globus Toolkit tutorial, but it is kind of
> outdated, so I guess I issued some wrong instructions.
>
> One of the weird things was that the tutorial suggested using the /opt/pbs
> prefix when running configure, and now under /opt/pbs I have another
> /opt/pbs folder with duplicate bin and sbin folders and executables. Is this
> wrong, or is this how it is supposed to be?
>
> 2009/5/21 Ling C. Ho <ling at fnal.gov>
>
>> Have you configured a scheduler?
>>
>> What if you use qrun? Would any jobs start?
>>
>> ...
>> ling
>>
>> Samir Gartner wrote:
>>
>>> OK, I don't see any file named default_server, but server_name has the
>>> right server name, rufian.perrera.local, and there is another file with
>>> the same content named server_name.new.
>>>
>>> Right now the PBS server name appears to be correct (after stopping the
>>> server and manually deleting the zombie jobs), but the jobs still won't
>>> start.
>>>
>>>
>>> [samir at rufian ~]$ echo "sleep 30;date" | /opt/pbs/bin/qsub
>>> [samir at rufian ~]$ /opt/pbs/bin/qstat -a
>>>
>>> rufian.perrera.local:
>>>
>>>                                                                          Req'd  Req'd   Elap
>>> Job ID               Username Queue    Jobname          SessID NDS   TSK Memory Time  S Time
>>> -------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - -----
>>> 13.rufian.perrer     samir    batch    STDIN                --     1  --     -- 01:00 Q    --
>>> [samir at rufian ~]$
>>>
>>>
>>> By the way, is top posting allowed?
>>>
>>> 2009/5/21 Jerry Smith <jdsmit at sandia.gov>
>>>
>>>
>>> Samir,
>>>
>>> What do you have in $PBS_HOME/{server_name,default_server}?
>>>
>>> It should be the hostname that resolves to the address of the Ethernet
>>> interface that PBS should be listening on.
>>>
>>> --Jerry
>>>
>>>
>>>
>>>
>>> Samir Gartner wrote:
>>>
>>> OK, I finally installed TORQUE under yellowdog/ppc, but now I have
>>> another problem. I set up my PBS server as rufian.perrera.local,
>>> but when I submit a job it shows up under localhost.localdomain
>>> and stays in the queued state forever. And if I try to qdel the
>>> job, it can't reach the server and the connection times out. Any
>>> ideas about what could be wrong?
>>> I'm not trying to set up anything complicated; it is just one
>>> machine that works as both server and client.
>>>
>>> This is the shell output:
>>>
>>> [root at rufian bin]# /opt/pbs/bin/qstat -a
>>>
>>> rufian.perrera.local:
>>>
>>>                                                                          Req'd  Req'd   Elap
>>> Job ID               Username Queue    Jobname          SessID NDS   TSK Memory Time  S Time
>>> -------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - -----
>>> 7.localhost.loca     samir    batch    STDIN                --     1  --     -- 01:00 Q    --
>>> 8.localhost.loca     samir    batch    STDIN                --     1  --     -- 01:00 Q    --
>>> 9.localhost.loca     samir    batch    STDIN                --     1  --     -- 01:00 Q    --
>>> 10.localhost.loc     samir    batch    STDIN                --     1  --     -- 01:00 Q    --
>>> [root at rufian bin]# /opt/pbs/bin/qdel 7.localhost.localdomain
>>> Connection timed out
>>> qdel: cannot connect to server localhost.localdomain (errno=110)
>>> Connection timed out
>>> You have new mail in /var/spool/mail/root
>>> [root at rufian bin]# /opt/pbs/bin/qdel 7.rufian.perrera.local
>>> qdel: Unknown Job Id 7.rufian.perrera.local
>>> [root at rufian bin]# su - samir
>>> [samir at rufian ~]$ /opt/pbs/bin/qdel 7.localhost.localdomain
>>> Connection timed out
>>> qdel: cannot connect to server localhost.localdomain (errno=110)
>>> Connection timed out
>>> [samir at rufian ~]$