[torqueusers] my new pbs server is not working

Gus Correa gus at ldeo.columbia.edu
Tue Mar 9 08:57:58 MST 2010


Hi Shibo

Somehow your "slave" computer
doesn't see /home/kuang/sharpbend/s1/r8,
although it can be seen by the "master" computer.
It could be one of several things;
it is hard to tell exactly from the information you gave,
but here are some guesses.

Do you really have a separate /home/kuang/sharpbend/s1/r8
on your "slave" computer, or is it only present on the "master"?
You can log in to the "slave" and check this directly
("ls /home/kuang/sharpbend/s1/r8").
If the directory is not there,
this is not really a Torque or MPI problem,
but a Sys Admin problem with exporting and mounting directories.
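
A minimal sketch of that check, to be run on the "slave" itself.
The function name check_dir is just something I made up for illustration;
the path is the one from your error message.

```shell
# check_dir DIR: report whether DIR exists on the host where this runs.
# (check_dir is a hypothetical helper name, not part of Torque.)
check_dir() {
    if [ -d "$1" ]; then
        echo "present: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Default to the directory from the error message if no argument is given.
check_dir "${1:-/home/kuang/sharpbend/s1/r8}" \
    || echo "fix the export/mount first, this is not a Torque problem"
```

If password-free ssh already works, something like
"ssh par1 ls -ld /home/kuang/sharpbend/s1/r8" from the "master" does the same
job (I am assuming par1 is the slave, since that is the Exec host in your log).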

If that directory exists only on the master side,
you can either create an identical copy on the "slave" side (painful),
or use NFS to export it from the "master" computer to the "slave" (easier).

For the second approach, you need to export /home or /home/kuang
from the "master" computer, and automount it on the "slave" computer.
The files you need to edit are /etc/exports (master side),
and /etc/auto.master plus perhaps /etc/auto.home (slave side).
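
As a rough sketch only, the entries might look like the fragment below.
The host names "master" and "par1" are taken from your error log, and I am
assuming par1 is the slave; the options (rw, sync, hard, intr) are common
choices, not something I know about your site, so check them against the
man pages before using them.

```
# /etc/exports on the master: export /home read-write to the slave
/home   par1(rw,sync)

# /etc/auto.master on the slave: let the map /etc/auto.home manage /home
/home   /etc/auto.home

# /etc/auto.home on the slave: mount each user's directory from the master
*       -rw,hard,intr   master:/home/&
```

After editing /etc/exports, running "exportfs -ra" as root on the master
makes it re-read the file. Note that with this auto.master entry the
automounter takes over /home on the slave, so any local home directories
there would be hidden.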

A slightly different approach (not using the automounter)
is simply to mount /home or /home/kuang permanently
on the "slave" side by adding it to the /etc/fstab list.
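
A sketch of what such an /etc/fstab line on the slave could look like,
again assuming the master's host name is "master" and that you export /home;
the mount options are an assumption on my part.

```
# /etc/fstab on the slave: mount the master's /home over NFS at boot
master:/home   /home   nfs   rw,hard,intr   0 0
```

With that line in place, "mount /home" (as root) should mount it
without waiting for a reboot.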

You also need to turn on the NFS daemon on the "master" node with
"chkconfig" (on CentOS, "chkconfig nfs on"), if it is not yet turned on.

Read the man pages!
At least read "man exportfs", "man mountd", "man fstab",
and "man chkconfig".

You may need to reboot the computers for this to take effect.
Then log in to the "slave" and try
"ls /home/kuang/sharpbend/s1/r8" again.

I hope this helps.
Gus Correa
---------------------------------------------------------------------
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
---------------------------------------------------------------------

shibo kuang wrote:
> "/home/kuang/sharpbend/s1/r8: No such file or directory."
> my node does not have the directory, but my master has it.
>  
> 
> 
> On Sun, Mar 7, 2010 at 1:09 AM, shibo kuang <s.b.kuang at gmail.com> wrote:
> 
>     Hi,
>     I just fixed the problem by setting up password-free login between
>     the computing node and the master.
>     But now I have another problem:
>     in r8.e19, it says
>     /home/kuang/sharpbend/s1/r8: No such file or directory.
>     If only one computer is used, the server works normally.
>     What did I miss when I installed Torque?
>     Your help would be greatly appreciated.
>     Cheers,
>     Shibo Kuang
> 
> 
>      
>     On Sun, Mar 7, 2010 at 12:46 AM, shibo kuang <s.b.kuang at gmail.com> wrote:
> 
>         Hi all,
>         I tried to install a pbs server for my two centos linux
>         computers (each has 8 cores), but failed.
>         Here is my problem:
>         If I treat one computer as the master running pbs_server, as well
>         as a computing node, I can submit jobs using a script without any
>         problem. All jobs give the exact results.
>         However, when one computer is treated as the master and
>         the other as a computing node, jobs are never submitted successfully.
>         I would appreciate your hints and suggestions regarding the
>         following messages I got.
>         Regards,
>         Shibo Kuang
>          
>         Return-Path: <adm at master>
>         Received: from master (localhost [127.0.0.1])
>                 by master (8.13.1/8.13.1) with ESMTP id o26DwKF9006310
>                 for <kuang at master>; Sun, 7 Mar 2010 00:28:20 +1030
>         Received: (from root at localhost)
>                 by master (8.13.1/8.13.1/Submit) id o26DwKpZ006293
>                 for kuang at master; Sun, 7 Mar 2010 00:28:20 +1030
>         Date: Sun, 7 Mar 2010 00:28:20 +1030
>         From: adm <adm at master>
>         Message-Id: <201003061358.o26DwKpZ006293 at master>
>         To: kuang at master
>         Subject: PBS JOB 18.master
>         Precedence: bulk
>         PBS Job Id: 18.master
>         Job Name:   r8
>         Exec host:  par1/0
>         An error has occurred processing your job, see below.
>         Post job file processing error; job 18.master on host par1/0
>         Unable to copy file /var/spool/torque/spool/18.master.OU to
>         kuang at master:/home/kuang/sharpbend/s1/r8/r8.o18
>         *** error from copy
>         Permission denied (publickey,gssapi-with-mic,password).
>         lost connection
>         *** end error output
>         Output retained on that host in:
>         /var/spool/torque/undelivered/18.master.OU
>         Unable to copy file /var/spool/torque/spool/18.master.ER to
>         kuang at master:/home/kuang/sharpbend/s1/r8/r8.e18
>         *** error from copy
>         Permission denied (publickey,gssapi-with-mic,password).
>         lost connection
>         *** end error output
>         Output retained on that host in:
>         /var/spool/torque/undelivered/18.master.ER
> 
> 
> 
> 
> ------------------------------------------------------------------------
> 
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers


