[torqueusers] Re: torqueusers Digest, Vol 36, Issue 23 (Out of Office Response)

Jonathan Ryskamp jryskamp at clusterresources.com
Sat Jul 21 11:57:46 MDT 2007


I will be out of the office on July 23rd and July 24th.

If you need immediate assistance please contact:

Technical Support: 
Nick Ihli
+1 (801) 717-3736
nick.ihli at clusterresources.com

Sales Support:
Michael Jackson
+1 (801) 717-3722
michael at clusterresources.com
And
Jess Arrington
+1 (801) 717-3716
jess at clusterresources.com

Thanks,
Jonathan

>>> torqueusers 07/21/07 12:00 >>>

Send torqueusers mailing list submissions to
	torqueusers at supercluster.org

To subscribe or unsubscribe via the World Wide Web, visit
	http://www.supercluster.org/mailman/listinfo/torqueusers
or, via email, send a message with subject or body 'help' to
	torqueusers-request at supercluster.org

You can reach the person managing the list at
	torqueusers-owner at supercluster.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of torqueusers digest..."


Today's Topics:

   1. inital torque setup - jobs are dieing right away
      (Adams, Samuel D Contr AFRL/HEDR)
   2. GSSAPI branch qstat behavior (Adam Steenwyk)
   3. Re: qsub problems (Garrick Staples)
   4. Re: inital torque setup - jobs are dieing right away
      (Garrick Staples)
   5. qsub -I : termcap or window sizes (David Corredor)
   6. Re: qsub -I : termcap or window sizes (Garrick Staples)
   7. Re: inital torque setup - jobs are dieing right away
      (James A. Peltier)


----------------------------------------------------------------------

Message: 1
Date: Fri, 20 Jul 2007 15:11:54 -0500
From: "Adams, Samuel D Contr AFRL/HEDR" <Samuel.Adams at BROOKS.AF.MIL>
Subject: [torqueusers] inital torque setup - jobs are dieing right
	away
To: <torqueusers at supercluster.org>
Message-ID:
	<8BF06A36E7AD424197195998D9A0B8E1D42035 at FBRMLBR01.Enterprise.afmc.ds.af.mil>
	
Content-Type: text/plain;	charset="us-ascii"

I am trying to get my initial torque-maui setup working, but whenever I
submit a job, it dies right away.  This is what it mails me:

----------------------------------------------------------
Message 1:
From adm at prodnode1.brooks.af.mil  Fri Jul 20 15:05:18 2007
Date: Fri, 20 Jul 2007 15:05:18 -0500
From: adm <adm at prodnode1.brooks.af.mil>
To: sam at prodnode1.brooks.af.mil
Subject: PBS JOB 15.prodnode1.brooks.af.mil
Precedence: bulk

PBS Job Id: 15.prodnode1.brooks.af.mil
Job Name:   script.sh
Aborted by PBS Server 
Job cannot be executed
See Administrator for help

& 
Message 2:
From adm at prodnode1.brooks.af.mil  Fri Jul 20 15:05:26 2007
Date: Fri, 20 Jul 2007 15:05:26 -0500
From: adm <adm at prodnode1.brooks.af.mil>
To: sam at prodnode1.brooks.af.mil
Subject: PBS JOB 15.prodnode1.brooks.af.mil
Precedence: bulk

PBS Job Id: 15.prodnode1.brooks.af.mil
Job Name:   script.sh
An error has occurred processing your job, see below.
Post job file processing error; job 15.prodnode1.brooks.af.mil on host
prodnode3.brooks.af.mil/0

Unable to copy file /var/spool/torque/spool/15.prodnode.OU to
/home/sam/code/fdtd/fdtd_0.3/all/script.sh.o15

Unable to copy file /var/spool/torque/spool/15.prodnode.ER to
/home/sam/code/fdtd/fdtd_0.3/all/script.sh.e15

&
--------------------------------------------------------

It sounds like pbs_mom is not able to write the job's output files.
Perhaps I have file copying misconfigured for pbs_mom on the nodes.  This
is all I have in the mom config file; /home is exported to all of the
nodes via NFS.

---------------------------------------------------------
$pbsserver prodnode1.brooks.af.mil
$usecp *.brooks.af.mil:/home /home
---------------------------------------------------------

Sam Adams
General Dynamics Information Technology
Phone: 210.536.5945



------------------------------

Message: 2
Date: Fri, 20 Jul 2007 14:47:10 -0400
From: "Adam Steenwyk" <ajamess at umich.edu>
Subject: [torqueusers] GSSAPI branch qstat behavior
To: torqueusers at supercluster.org
Cc: cac-systems at umich.edu
Message-ID:
	<58c6ed410707201147jff3e262m1b3d61da8b97d97c at mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

Hello,

In testing the GSSAPI branch (1473) on a system with a single head node and
a single client, I have noticed what I consider strange behavior.  What
happens is as follows:

I submit a long job (100 hours) to the queue after getting forwardable and
renewable tickets:

[14:39:39 ajamess at ragnar ajamess]$ kinit -f -r 14d ajamess at UMICH.EDU
Password for ajamess at UMICH.EDU:
[14:39:50 ajamess at ragnar ajamess]$ aklog
[14:39:56 ajamess at ragnar ajamess]$ klist -f
Ticket cache: FILE:/tmp/krb5cc_183752_SE33Tu
Default principal: ajamess at UMICH.EDU

Valid starting     Expires            Service principal
07/20/07 14:39:50  07/21/07 14:39:48  krbtgt/UMICH.EDU at UMICH.EDU
        renew until 07/27/07 14:39:50, Flags: FRIA
07/20/07 14:39:53  07/21/07 14:39:48  afs at UMICH.EDU
        renew until 07/27/07 14:39:50, Flags: FRAT

Kerberos 4 ticket cache: /tmp/tkt183752
klist: You have no tickets cached

[14:41:01 ajamess at ragnar ajamess]$ qsub torque-afs-test-weekend.sh
313.ragnar.engin.umich.edu

This is all fine and good.  The ticket gets passed in with my job and can be
renewed correctly throughout the job's lifetime; however, if I stay logged in
without renewing for longer than the non-renewable lifetime of the ticket,
qstat fails to work.

[09:29:49 ajamess at ragnar ~]$ qstat -au ajamess
pbsgss_client_establish_context.gss_init_set_context : Miscellaneous failure
pbsgss_client_establish_context.gss_init_set_context : Ticket expired
pbsgss_client_establish_context.gss_init_set_context : Miscellaneous failure
pbsgss_client_establish_context.gss_init_set_context : Ticket expired
pbsgss_client_establish_context.gss_init_set_context : Miscellaneous failure
pbsgss_client_establish_context.gss_init_set_context : Ticket expired
pbsgss_client_establish_context.gss_init_set_context : Miscellaneous failure
pbsgss_client_establish_context.gss_init_set_context : Ticket expired

Now, if I destroy my tickets and run qstat again *without AFS credentials*,
there is no problem.  If I get *new* tickets, there is also no problem
*until* the tickets expire, which is when I see this behavior.

My question is: is this behavior by qstat to be expected?  Am I missing
something?

Thank you!

Adam.

------------------------------

Message: 3
Date: Fri, 20 Jul 2007 15:32:43 -0700
From: Garrick Staples <garrick at usc.edu>
Subject: Re: [torqueusers] qsub problems
To: torqueusers at supercluster.org
Message-ID: <20070720223243.GB6871 at polop.usc.edu>
Content-Type: text/plain; charset="us-ascii"

On Thu, Jul 19, 2007 at 06:56:46PM -0600, amjad syed alleged:
> hello
> I am using torque 2.1.8.
> I have an Apple G5 (ppc64) as my head node and an IBM p5185 (ppc64) as my
> client.
> 
> I have set it up according to the docs.
> 
> I have started pbs_server and pbs_sched on the server and pbs_mom on the
> client, and they can ping each other.
> 
> pbsnodes -a returns the client.
> 
> I am having a problem submitting jobs using qsub.
> 
> This is the error I get when I submit as a user:
> 
> qsub:could not create  copy of script /tmp/qsub.Omkgty
> 
> I have nothing in the /tmp directory.

Is /tmp full, or does it have the wrong permissions?  What do 'ls -ld /tmp'
and 'df /tmp' print?
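
The two commands above can be wrapped into a small check that also tries to
create a file the way qsub would. This is only a sketch: the function name
check_scratch_dir and the file name qsubtest.XXXXXX are made up for
illustration and are not part of TORQUE. Run it as the user who submits the
job.

```shell
# Hypothetical helper (not part of TORQUE): show permissions and free space
# for a directory, then try to create a file in it as the current user.
check_scratch_dir() {
    dir=$1
    [ -d "$dir" ] || { echo "missing: $dir"; return 1; }
    ls -ld "$dir"                                   # permissions and owner
    df -P "$dir" || echo "df failed for $dir"       # free space on that filesystem
    if tmp=$(mktemp "$dir/qsubtest.XXXXXX" 2>/dev/null); then
        rm -f "$tmp"
        echo "ok: $dir is writable"
    else
        echo "fail: cannot create files in $dir"
        return 1
    fi
}
```

Calling `check_scratch_dir /tmp` on the submit host should end with an "ok"
line; a "fail" line points at permissions or a full filesystem.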

------------------------------

Message: 4
Date: Fri, 20 Jul 2007 15:45:31 -0700
From: Garrick Staples <garrick at usc.edu>
Subject: Re: [torqueusers] inital torque setup - jobs are dieing right
	away
To: torqueusers at supercluster.org
Message-ID: <20070720224531.GD6871 at polop.usc.edu>
Content-Type: text/plain; charset="us-ascii"

On Fri, Jul 20, 2007 at 03:11:54PM -0500, Adams, Samuel D Contr AFRL/HEDR alleged:
> I am trying to get my initial torque-maui setup working, but whenever I
> submit a job, it dies right away.  This is what it mails me:

Check syslog on prodnode3?

 
> Unable to copy file /var/spool/torque/spool/15.prodnode.OU to
> /home/sam/code/fdtd/fdtd_0.3/all/script.sh.o15
> 
> Unable to copy file /var/spool/torque/spool/15.prodnode.ER to
> /home/sam/code/fdtd/fdtd_0.3/all/script.sh.e15

Permissions on spool correct?

'ls -ld /var/spool/torque/spool'
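
A minimal sketch of that permission check, to be run on the compute node.
It assumes the spool directory is expected to be world-writable with the
sticky bit set (mode 1777, like /tmp); that expectation, the second path
(undelivered), and the helper name has_sticky_world_write are assumptions
for illustration, so compare against a known-good node before changing
anything.

```shell
# Hypothetical helper: succeed if DIR has mode drwxrwxrwt (1777), like /tmp.
has_sticky_world_write() {
    perms=$(ls -ld "$1" 2>/dev/null | cut -c1-10)   # first 10 chars: type + mode bits
    [ "$perms" = "drwxrwxrwt" ]
}

# Assumed TORQUE spool locations; adjust to your installation.
for d in /var/spool/torque/spool /var/spool/torque/undelivered; do
    if [ -d "$d" ] && ! has_sticky_world_write "$d"; then
        echo "suspicious permissions on $d:"
        ls -ld "$d"
    fi
done
```

The loop prints nothing when the directories are missing or already have
the assumed mode.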


------------------------------

Message: 5
Date: Fri, 20 Jul 2007 17:59:21 -0500
From: David Corredor <tecnico at nsstc.uah.edu>
Subject: [torqueusers] qsub -I : termcap or window sizes
To: torqueusers at supercluster.org
Message-ID: <46A13E49.5060605 at nsstc.uah.edu>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

torque v.2.1.8
CentOS5 32bit

   I'm trying to use "VIM", "less" and other ncurses-based tools under
an interactive job shell, but it seems that my local window's properties
are not passed along to the remote node where I'm running interactively.

More specifically, if I resize my window after launching my job,
the window size (lines & columns) is not updated remotely.

   This only happens when running through "qsub -I", and I've tried
"qsub -V -I" too. It happens whether I'm connecting remotely from Linux or
Windows (xterm, konsole, gnome console, putty). My /etc/termcap files are
the same on all nodes. The remote TERM is xterm by default;
I've tried vt100 and screen with no luck.

   Any ideas?


   David





------------------------------

Message: 6
Date: Fri, 20 Jul 2007 16:04:13 -0700
From: Garrick Staples <garrick at usc.edu>
Subject: Re: [torqueusers] qsub -I : termcap or window sizes
To: torqueusers at supercluster.org
Message-ID: <20070720230413.GF6871 at polop.usc.edu>
Content-Type: text/plain; charset="us-ascii"

On Fri, Jul 20, 2007 at 05:59:21PM -0500, David Corredor alleged:
> torque v.2.1.8
> CentOS5 32bit
> 
>   I'm trying to use "VIM", "less" and other ncurses-based tools under
> an interactive job shell, but it seems that my local window's properties
> are not passed along to the remote node where I'm running interactively.
> 
> More specifically, if I resize my window after launching my job,
> the window size (lines & columns) is not updated remotely.
> 
>   This only happens when running through "qsub -I", and I've tried
> "qsub -V -I" too. It happens whether I'm connecting remotely from Linux or
> Windows (xterm, konsole, gnome console, putty). My /etc/termcap files are
> the same on all nodes. The remote TERM is xterm by default;
> I've tried vt100 and screen with no luck.

qsub doesn't handle resizes.

It isn't clear from your email whether everything works correctly before
the resize.


------------------------------

Message: 7
Date: Fri, 20 Jul 2007 14:44:11 -0700
From: "James A. Peltier" <jpeltier at cs.sfu.ca>
Subject: Re: [torqueusers] inital torque setup - jobs are dieing right
	away
To: "Adams, Samuel D Contr AFRL/HEDR" <Samuel.Adams at BROOKS.AF.MIL>
Cc: torqueusers at supercluster.org
Message-ID: <46A12CAB.2050703 at cs.sfu.ca>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Adams, Samuel D Contr AFRL/HEDR wrote:
> I am trying to get my initial torque-maui setup working, but whenever I
> submit a job, it dies right away.  This is what it mails me:
> 
> ----------------------------------------------------------
> Message 1:
> From adm at prodnode1.brooks.af.mil  Fri Jul 20 15:05:18 2007
> Date: Fri, 20 Jul 2007 15:05:18 -0500
> From: adm <adm at prodnode1.brooks.af.mil>
> To: sam at prodnode1.brooks.af.mil
> Subject: PBS JOB 15.prodnode1.brooks.af.mil
> Precedence: bulk
> 
> PBS Job Id: 15.prodnode1.brooks.af.mil
> Job Name:   script.sh
> Aborted by PBS Server 
> Job cannot be executed
> See Administrator for help
> 
> & 
> Message 2:
> From adm at prodnode1.brooks.af.mil  Fri Jul 20 15:05:26 2007
> Date: Fri, 20 Jul 2007 15:05:26 -0500
> From: adm <adm at prodnode1.brooks.af.mil>
> To: sam at prodnode1.brooks.af.mil
> Subject: PBS JOB 15.prodnode1.brooks.af.mil
> Precedence: bulk
> 
> PBS Job Id: 15.prodnode1.brooks.af.mil
> Job Name:   script.sh
> An error has occurred processing your job, see below.
> Post job file processing error; job 15.prodnode1.brooks.af.mil on host
> prodnode3.brooks.af.mil/0
> 
> Unable to copy file /var/spool/torque/spool/15.prodnode.OU to
> /home/sam/code/fdtd/fdtd_0.3/all/script.sh.o15
> 
> Unable to copy file /var/spool/torque/spool/15.prodnode.ER to
> /home/sam/code/fdtd/fdtd_0.3/all/script.sh.e15
> 
> &
> --------------------------------------------------------
> 
> It sounds like pbs_mom is not able to write the job's output files.
> Perhaps I have file copying misconfigured for pbs_mom on the nodes.  This
> is all I have in the mom config file; /home is exported to all of the
> nodes via NFS.
> 
> ---------------------------------------------------------
> $pbsserver prodnode1.brooks.af.mil
> $usecp *.brooks.af.mil:/home /home
> ---------------------------------------------------------
> 
> Sam Adams
> General Dynamics Information Technology
> Phone: 210.536.5945
> 
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers
> 

try $usecp *:/ /
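
To see why the broader pattern might help, here is a rough shell model of
how a $usecp line could be matched against an output destination of the
form host:/path. It is a sketch under assumptions, not pbs_mom's actual
code: the function usecp_map is hypothetical, it treats the host part as a
shell glob and the path part as a plain prefix, and the destination strings
in the usage note are invented examples.

```shell
# Hypothetical model of $usecp matching (real pbs_mom behavior may differ).
# usage: usecp_map HOST_PATTERN PATH_PREFIX LOCAL_PREFIX DEST
# DEST has the form host:/path. Prints the local path and returns 0 on a match.
usecp_map() {
    pat_host=$1
    pat_prefix=$2
    local_prefix=$3
    dest=$4
    host=${dest%%:*}     # text before the first colon
    path=${dest#*:}      # text after the first colon
    case $host in
        $pat_host) ;;    # unquoted on purpose: matched as a shell glob
        *) return 1 ;;
    esac
    case $path in
        "$pat_prefix"*) printf '%s%s\n' "$local_prefix" "${path#"$pat_prefix"}" ;;
        *) return 1 ;;
    esac
}
```

In this model, `usecp_map '*.brooks.af.mil' /home /home prodnode1:/home/sam/out`
fails because the short host name does not match the pattern, while
`usecp_map '*' / / prodnode1:/home/sam/out` prints /home/sam/out. If the
destination host in the real job is recorded differently from what the
pattern expects, that would be one possible explanation for the failed copy.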

-- 
James A. Peltier
Technical Director, RHCE
SCIRF | GrUVi @ Simon Fraser University - Burnaby Campus
Phone   : 604-291-3610
Fax     : 604-291-3045
Mobile  : 778-840-6434
E-Mail  : jpeltier at cs.sfu.ca
Website : http://gruvi.cs.sfu.ca | http://scirf.cs.sfu.ca
MSN     : subatomic_spam at hotmail.com


------------------------------

_______________________________________________
torqueusers mailing list
torqueusers at supercluster.org
http://www.supercluster.org/mailman/listinfo/torqueusers


End of torqueusers Digest, Vol 36, Issue 23
*******************************************

