[torqueusers] PBS and Globus gatekeeper node

Gerson Galang gerson.sapac at gawab.com
Tue Feb 1 21:45:01 MST 2005


Hi Chris,

Chris Samuel wrote:
> On Fri, 28 Jan 2005 11:20 am, Gerson Galang wrote:
> 
> 
>>But if I set "router@localhost" to route jobs to queues on another
>>machine "batch@anothermachine.mydomain.com", PBS won't run the jobs
>>anymore. PBS tells me "Jobs rejected by all possible destinations"
>>even though I never see it try to contact anothermachine.mydomain.com.
> 
> 
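For anyone reproducing this setup: a routing queue like the one described above is normally created with qmgr on the gatekeeper. The script below is a minimal sketch, assuming the queue names from this thread ("router" locally, "batch" on anothermachine.mydomain.com); the helper name router_qmgr_script is hypothetical. It only prints the qmgr directives, so they can be reviewed before being applied.

```shell
#!/bin/sh
# Sketch: print qmgr directives for a routing queue on the gatekeeper.
# Queue names and destination are taken from this thread; adjust them.
# Review the output, then apply it as a PBS manager with:
#   sh router.sh | qmgr

# Hypothetical helper: prints one qmgr directive per line.
router_qmgr_script() {
    router="$1"   # local routing queue, e.g. "router"
    dest="$2"     # remote destination in queue@server form
    printf 'create queue %s\n' "$router"
    printf 'set queue %s queue_type = route\n' "$router"
    printf 'set queue %s route_destinations = %s\n' "$router" "$dest"
    printf 'set queue %s enabled = true\n' "$router"
    printf 'set queue %s started = true\n' "$router"
}

router_qmgr_script router batch@anothermachine.mydomain.com
```

After applying it, qmgr -c "list queue router" should show the route_destinations attribute pointing at the remote queue.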
> NB: We've never tried this, so no idea if it can work or not.
> 
> A few thoughts:
> 
> 1) Is anything logged about rejection on the destination machine ?   Could it 
> be that it's not permitted to queue jobs to that PBS server, or some DNS vs 
> local hostname issues ?

There are no logs on the remote machine. What's weird is that the local
machine doesn't even send any packets to the remote machine, yet it still
logs that the job has been rejected by all possible destinations.
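The DNS vs local hostname question in point 1 can be checked quickly on each machine. A minimal sketch, assuming a system that provides getent (e.g. glibc-based Linux); show_resolution is a hypothetical helper that prints what each name resolves to, so the gatekeeper's and the remote server's views can be compared side by side.

```shell
#!/bin/sh
# Hypothetical helper: for each name, print the resolver's answer or
# note that it does not resolve. Requires getent (assumption).
show_resolution() {
    for name in "$@"; do
        if entry=$(getent hosts "$name"); then
            printf '%s -> %s\n' "$name" "$entry"
        else
            printf '%s -> (does not resolve)\n' "$name"
        fi
    done
}

# This host's own node name, then the remote PBS server.
show_resolution "$(uname -n)" anothermachine.mydomain.com
```

Running it on both machines and comparing the output shows whether each side maps the names to the addresses the other side expects.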

> 
> 2) Does the following command work from your gatekeeper ?
> 
>  qstat -q @anothermachine.mydomain.com

Yup, this works.

> 
> 3) Can you submit a test job to that remote system without using a routing 
> queue ?  For example:
> 
>  echo hostname | qsub -l walltime=0:1:0 -q queue@anothermachine.mydomain.com
> 
> This works here between machines (which surprised me!).

And this one as well.

> 
> 4) Do you see anything on the wire if you run a packet sniffer ?
> 
We've used tcpdump but did not see any packets being sent out to the 
destination machine.
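A capture narrowed to PBS traffic makes this check easier to repeat while the routing queue tries to forward a job. A minimal sketch, assuming TORQUE's default pbs_server port 15001 has not been changed at this site; pbs_capture_cmd is a hypothetical helper that only prints the command, since the capture itself has to be run as root.

```shell
#!/bin/sh
# Hypothetical helper: print a tcpdump command that captures only TCP
# traffic between this host and the remote PBS server.
# Assumption: pbs_server listens on TORQUE's default port 15001.
pbs_capture_cmd() {
    remote="$1"
    port="${2:-15001}"
    # -n: skip reverse DNS lookups; -i any: listen on every interface
    # (Linux-specific), in case traffic leaves via an unexpected one.
    printf "tcpdump -n -i any 'host %s and tcp port %s'\n" "$remote" "$port"
}

pbs_capture_cmd anothermachine.mydomain.com
```

If the server port differs, pass it as the second argument; if nothing appears at all, dropping the port clause rules out a port mismatch.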

> cheers!
> Chris
> 


Regards,
Gerson

