[torqueusers] Fwd: wrong pbs server name
Samir Gartner
jigzat at gmail.com
Thu May 21 12:20:39 MDT 2009
I think I'm gonna cry... I love you guys! No, seriously, it worked, but
only when executed as the root user. Now the question is: what did I do wrong?
Jobs should start automatically, right?
I was first following the Globus Toolkit tutorial, but it is kind of outdated,
so I guess I ran some wrong commands.
One of the weird things was that the tutorial suggested using the /opt/pbs
prefix when running configure, and now under /opt/pbs I have another
/opt/pbs folder with duplicate bin and sbin folders and executables. Is this
wrong, or is that how it is supposed to be?
2009/5/21 Ling C. Ho <ling at fnal.gov>
> Have you configured a scheduler?
>
> What if you use qrun? Would any jobs start?
>
> ...
> ling
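Ling's two questions go together. In Torque, queued jobs normally start only when a scheduler (the bundled pbs_sched, or an external one such as Maui or Moab) tells pbs_server to run them. qrun, by contrast, starts a job by hand and requires operator or manager privileges, which would fit qrun working only as root. Below is a minimal sketch for checking whether one of those scheduler daemons is running on the host. It assumes pgrep is installed. find_scheduler is a hypothetical helper name, and the list of daemon names is an assumption:

```shell
#!/bin/sh
# Hypothetical helper (not part of Torque): print the name of the first
# scheduler daemon found running on this host and return 0, or return 1
# if none is found. The daemon names below are assumptions: pbs_sched
# ships with Torque, while maui and moab are separate packages.
find_scheduler() {
    # Without pgrep we cannot inspect the process list; report "not found".
    command -v pgrep >/dev/null 2>&1 || return 1
    for daemon in pbs_sched maui moab; do
        # -x: the process name must match exactly
        if pgrep -x "$daemon" >/dev/null 2>&1; then
            echo "$daemon"
            return 0
        fi
    done
    return 1
}

if sched=$(find_scheduler); then
    echo "scheduler running: $sched"
else
    echo "no scheduler found; queued jobs may wait until someone runs qrun" >&2
fi
```

If a scheduler is running and jobs still sit in the Q state, the server's scheduling attribute is another thing worth checking, for example with qmgr -c 'set server scheduling = true' run as a manager.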
>
> Samir Gartner wrote:
>
>> OK, I don't see any file named default_server, but server_name has the
>> right server name, rufian.perrera.local, and there is another file with the
>> same content named server_name.new.
>>
>> Right now the PBS server name appears to be correct (after stopping the
>> server and manually deleting the zombie jobs), but the jobs still won't
>> start.
>>
>>
>> [samir at rufian ~]$ echo "sleep 30;date" | /opt/pbs/bin/qsub
>> [samir at rufian ~]$ /opt/pbs/bin/qstat -a
>>
>> rufian.perrera.local:
>>
>>                                                                          Req'd  Req'd   Elap
>> Job ID               Username Queue    Jobname          SessID NDS   TSK Memory Time  S Time
>> -------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - -----
>> 13.rufian.perrer     samir    batch    STDIN                --     1  --     --  01:00 Q    --
>> [samir at rufian ~]$
>>
>>
>> By the way, is top posting allowed?
>>
>> 2009/5/21 Jerry Smith <jdsmit at sandia.gov>
>>
>>
>> Samir,
>>
>> What do you have in $PBS_HOME/{server_name,default_server}?
>>
>> It should be the hostname that resolves to the Ethernet address PBS
>> should be listening on.
>>
>> --Jerry
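Jerry's check can be scripted. The minimal sketch below assumes that PBS_HOME is set to this install's Torque spool directory and that getent is available. read_server_name, is_loopback and check_server_name are hypothetical helper names. Besides reading server_name, the sketch flags a name that resolves to a loopback address. One common way to end up with jobs registered under localhost.localdomain (as in the output quoted below) is an /etc/hosts entry that maps the machine's hostname to 127.0.0.1. That cause is an assumption here; nothing in this thread confirms it.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Torque) for checking the name stored in
# $PBS_HOME/server_name. Assumes getent(1) is available for lookups.

# Print the first non-empty line of $1/server_name, whitespace trimmed.
# Prints nothing and returns 1 if the file is missing or unreadable.
read_server_name() {
    file="$1/server_name"
    [ -r "$file" ] || return 1
    awk 'NF { print $1; exit }' "$file"
}

# Return 0 if the argument is an IPv4 or IPv6 loopback address.
is_loopback() {
    case "$1" in
        127.*|::1) return 0 ;;
        *) return 1 ;;
    esac
}

# Resolve the configured server name and complain if it is missing,
# does not resolve, or resolves to a loopback address.
check_server_name() {
    if [ -z "${PBS_HOME:-}" ]; then
        echo "PBS_HOME is not set" >&2
        return 1
    fi
    name=$(read_server_name "$PBS_HOME")
    if [ -z "$name" ]; then
        echo "no server name found in $PBS_HOME/server_name" >&2
        return 1
    fi
    # getent prints "address name [aliases...]"; keep the first address
    addr=$(getent hosts "$name" | awk '{ print $1; exit }')
    if [ -z "$addr" ]; then
        echo "$name does not resolve" >&2
        return 1
    fi
    if is_loopback "$addr"; then
        echo "$name resolves to loopback address $addr; check /etc/hosts" >&2
        return 1
    fi
    echo "$name -> $addr"
}
```

For example, after export PBS_HOME=/var/spool/torque (the usual default, though it depends on how configure was run), check_server_name should print the server name followed by a non-loopback address.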
>>
>>
>>
>>
>> Samir Gartner wrote:
>>
>> OK, I finally installed Torque under Yellow Dog/PPC, but now I have
>> another problem. I set up my PBS server as rufian.perrera.local,
>> but when I submit a job it shows up under localhost.localdomain
>> and stays in the queued state forever. And if I try to qdel the
>> job, it can't reach the server and the connection times out. Any
>> ideas what could be wrong?
>> I'm not trying to set up anything complicated; it's just one
>> machine that works as both server and client.
>>
>> This is the shell output:
>>
>> [root at rufian bin]# /opt/pbs/bin/qstat -a
>>
>> rufian.perrera.local:
>>
>>                                                                          Req'd  Req'd   Elap
>> Job ID               Username Queue    Jobname          SessID NDS   TSK Memory Time  S Time
>> -------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - -----
>> 7.localhost.loca     samir    batch    STDIN                --     1  --     --  01:00 Q    --
>> 8.localhost.loca     samir    batch    STDIN                --     1  --     --  01:00 Q    --
>> 9.localhost.loca     samir    batch    STDIN                --     1  --     --  01:00 Q    --
>> 10.localhost.loc     samir    batch    STDIN                --     1  --     --  01:00 Q    --
>> [root at rufian bin]# /opt/pbs/bin/qdel 7.localhost.localdomain
>> Connection timed out
>> qdel: cannot connect to server localhost.localdomain (errno=110)
>> Connection timed out
>> You have new mail in /var/spool/mail/root
>> [root at rufian bin]# /opt/pbs/bin/qdel 7.rufian.perrera.local
>> qdel: Unknown Job Id 7.rufian.perrera.local
>> [root at rufian bin]# su - samir
>> [samir at rufian ~]$ /opt/pbs/bin/qdel 7.localhost.localdomain
>> Connection timed out
>> qdel: cannot connect to server localhost.localdomain (errno=110)
>> Connection timed out
>> [samir at rufian ~]$
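The qdel errors above follow from how the job IDs are built. A Torque job ID is a sequence number followed by the name of the server that accepted the job, and, as the error message shows, qdel tries to contact that server. Jobs 7 to 10 apparently carry localhost.localdomain because that is what the server called itself when they were submitted. The small sketch below extracts the server part of a job ID and compares it with the expected name. job_server and job_matches_server are hypothetical helpers, and the IDs in the loop are the two that were tried with qdel above:

```shell
#!/bin/sh
# Hypothetical helper: print the server part of a Torque job ID, i.e.
# everything after the first dot ("7.localhost.localdomain" -> "localhost.localdomain").
job_server() {
    case "$1" in
        *.*) printf '%s\n' "${1#*.}" ;;
        *) return 1 ;;   # bare sequence number, no server part
    esac
}

# Return 0 if the job ID's server part equals the expected server name ($2).
job_matches_server() {
    srv=$(job_server "$1") || return 1
    [ "$srv" = "$2" ]
}

expected=rufian.perrera.local
for id in 7.localhost.localdomain 7.rufian.perrera.local; do
    if job_matches_server "$id" "$expected"; then
        echo "$id: matches $expected"
    else
        echo "$id: belongs to $(job_server "$id"), not $expected"
    fi
done
```

qstat -a truncates job IDs to fit its column (7.localhost.loca above), so these helpers are meant for full IDs such as the ones passed to qdel.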
>>
>>
>>
>>
>> ------------------------------------------------------------------------
>>
>> _______________________________________________
>> torqueusers mailing list
>> torqueusers at supercluster.org
>> http://www.supercluster.org/mailman/listinfo/torqueusers
>>
>
>
>