[torqueusers] .OU and .ER files not being sent to user
Adil Mughal
adil.m.mughal at gmail.com
Tue Feb 26 11:07:19 MST 2008
Hello to any Torque users out there,
Perhaps you can help me with a problem I have been struggling with.
Unfortunately I am not very experienced with Linux, let alone Torque,
so I am sure the answer is painfully obvious, but I can't work out
what I am doing wrong.
At the moment the .OU and .ER files are not being copied to the
directory I want them to go to.
I have an NFS-mounted system and, as Garrick kindly pointed out, the
files fail to copy because I have not configured my mom_priv/config
file properly.
If anyone can give me a detailed explanation of what I should be
putting in my mom_priv/config file I would be eternally grateful!
Here are the details of my system:
(1) I have NFS running, and if I type the following on the master
>df
then this is what I get:
Filesystem                     1K-blocks     Used Available Use% Mounted on
/dev/sda1                        9920592   670548   8737976   8% /
/dev/sda4                      249215768 13355476 222996648   6% /data
/dev/sda2                       39674224  3358824  34267516   9% /usr
tmpfs                            1032344        0   1032344   0% /dev/shm
dphpc1001.dph.aber.ac.uk:/data 249216000  1285120 235067136   1% /data01
dphpc1002.dph.aber.ac.uk:/data 249216000  1377792 234974464   1% /data02
(2) In my mom_priv/config file I have the following:
$pbsserver dphpc1011.dph.aber.ac.uk
$usecp dphpc1011.dph.aber.ac.uk:/users/guest1 /users/guest1
$logevent 255
(3) PLEASE NOTE: I have the following symbolic links set up under the
directory /users:
lrwxrwxrwx 1 root root 12 2008-01-28 14:03 guest1 -> /data/guest1
lrwxrwxrwx 1 root root 12 2008-01-28 14:03 guest2 -> /data/guest2
(4) I get error messages like the following mailed to me:
PBS Job Id: 168.dphpc1011.dph.aber.ac.uk
Job Name: STDIN
An error has occurred processing your job, see below.
Post job file processing error; job 168.dphpc1011.dph.aber.ac.uk on
host dphpc1002.dph.aber.ac.uk/1
Unable to copy file /var/spool/torque/spool/168.dphpc10.OU to
guest1@dphpc1011.dph.aber.ac.uk:/data01/guest1/STDIN.o168
>>> error from copy
Host key verification failed.
lost connection
>>> end error output
Output retained on that host in: /var/spool/torque/undelivered/168.dphpc10.OU
Unable to copy file /var/spool/torque/spool/168.dphpc10.ER to
guest1@dphpc1011.dph.aber.ac.uk:/data01/guest1/STDIN.e168
>>> error from copy
Host key verification failed.
lost connection
>>> end error output
Output retained on that host in: /var/spool/torque/undelivered/168.dphpc10.ER
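The $usecp line in (2) and the copy destination in (4) can be compared with a small model of prefix matching. The Python sketch below is a simplified approximation and not TORQUE's actual code: it assumes pbs_mom compares the destination host exactly and then checks whether the destination path starts with the rule's remote prefix (any wildcard host matching is ignored here). The function name usecp_rewrite and the tuple layout are made up for illustration.

```python
from typing import List, Optional, Tuple

# (host, remote_prefix, local_prefix): hypothetical representation of one $usecp line
Rule = Tuple[str, str, str]


def usecp_rewrite(rules: List[Rule], dest_host: str, dest_path: str) -> Optional[str]:
    """Return the local path to copy to, or None if no rule matches
    (in which case a remote copy would be attempted instead)."""
    for host, remote_prefix, local_prefix in rules:
        if host == dest_host and dest_path.startswith(remote_prefix):
            # Swap the matched remote prefix for the local one.
            return local_prefix + dest_path[len(remote_prefix):]
    return None


HOST = "dphpc1011.dph.aber.ac.uk"
DEST = "/data01/guest1/STDIN.o168"  # destination taken from the error mail in (4)

# Rule from (2): /users/guest1 is not a prefix of DEST, so nothing matches.
print(usecp_rewrite([(HOST, "/users/guest1", "/users/guest1")], HOST, DEST))

# A rule whose remote prefix does match the destination path.
print(usecp_rewrite([(HOST, "/data01/guest1/", "/data01/guest1/")], HOST, DEST))
```

Under these assumptions the first call finds no match and the second maps the destination to a local path. A match only means a local copy would be attempted; the local prefix still has to be a path that actually exists on the execution host (dphpc1002 in the error above).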
best wishes
adil
On Mon, Feb 25, 2008 at 5:30 PM, Adil Mughal <adil.m.mughal at gmail.com> wrote:
> Dear Garrick (and any other Torque users)
>
> Sorry I am still a bit in the dark: I tried changing the line in my
> mom_priv/config file to:
>
> $usecp dphpc1011.dph.aber.ac.uk:/data01/guest1/ /data01/guest1/
>
> but this still did not work. Is this what you meant by my destination
> paths not matching?
>
> thanks in advance
>
> adil
>
> On Mon, Feb 25, 2008 at 4:59 PM, Garrick Staples <garrick at usc.edu> wrote:
> > On Mon, Feb 25, 2008 at 02:42:33PM +0000, Adil Mughal alleged:
> >
> > > $usecp dphpc1011.dph.aber.ac.uk:/users/guest1 /users/guest1
> > >
> >
> > > Unable to copy file /var/spool/torque/spool/168.dphpc10.OU to
> > > guest1@dphpc1011.dph.aber.ac.uk:/data01/guest1/STDIN.o168
> >
> > It's not using your $usecp line because the destination paths don't match.
> >
> > --
> > Garrick Staples, GNU/Linux HPCC SysAdmin
> > University of Southern California