[torqueusers] Torque 2.5.0 beta on Cygwin

Felix Wolfheimer Felix.Wolfheimer at cst.com
Fri Jul 9 06:27:16 MDT 2010


Dear Garrick,

thanks for your reply. I hope the Cygwin expert can help me with the issue.

I've attached the updated README.Cygwin to this message. I've also corrected some typos in the document. I hope this helps.

BTW: I've checked the permissions in my Cygwin installation: An ls -l on /bin/bash gives me

-rwxr-xr-x  1 Administrator Administrators 472064 2009-07-02 03:20 /bin/bash

which means that every user should be able to execute /bin/bash. Windows Explorer also shows that "Everyone" has permission to read and execute bash.exe. So the permissions set in Cygwin and on the Windows side appear to be consistent, and both allow execution of the shell.
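
For anyone who wants to script the same check, here is a minimal sketch in Python. It only interprets the POSIX mode bits that ls -l prints; the helper names are made up for illustration, and it does not reproduce whatever test pbs_mom itself performs before reporting that the shell is not executable.

```python
import stat


def mode_bits_from_ls(perm: str) -> int:
    """Convert an ls-style string such as '-rwxr-xr-x' to permission bits."""
    flags = perm[1:10]
    if len(flags) != 9:
        raise ValueError(f"unexpected permission string: {perm!r}")
    bits = 0
    for ch in flags:
        # Any character other than '-' means the bit is set.
        bits = (bits << 1) | (0 if ch == "-" else 1)
    return bits


def everyone_can_execute(bits: int) -> bool:
    """True if owner, group and other all have the execute bit."""
    needed = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
    return bits & needed == needed
```

On a live system, os.access("/bin/bash", os.X_OK) answers the narrower question of whether the current user may execute the file.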

Best regards

Felix



-----Original Message-----
From: torqueusers-request at supercluster.org [mailto:torqueusers-request at supercluster.org]
Sent: Friday, 9 July 2010 13:36
To: torqueusers at supercluster.org
Subject: torqueusers Digest, Vol 72, Issue 10



Today's Topics:

   1. Re: availmem (Garrick Staples)
   2. Re: a question about $usecp (Andreas Davour)
   3. Re: a question about $usecp (Garrick Staples)
   4. Re: Torque 2.5.0 beta on Cygwin (Garrick Staples)
   5. Re: problem getting files copied (Christopher Samuel)
   6. Re: pbs_server gssapi (Andreas Davour)
   7. how is the torque renewal scripts supposed to work?
      (Andreas Davour)


----------------------------------------------------------------------

Message: 1
Date: Thu, 8 Jul 2010 11:16:10 -0700
From: Garrick Staples <garrick at usc.edu>
Subject: Re: [torqueusers] availmem
To: "'torqueusers at supercluster.org'" <torqueusers at supercluster.org>
Message-ID: <20100708181610.GX21193 at polop.usc.edu>
Content-Type: text/plain; charset="us-ascii"

On Wed, Jul 07, 2010 at 04:40:34PM +0100, Rudge, Chris M. (Dr.) alleged:
> Similarly, looking at a node where there's a process which reports a virtual size of 22GB and resident size of 11GB, I see:
> availmem=11773424kb
> which is the remaining physical + swap rather than just remaining physical which should be close to zero.

Yes, availmem is MemFree+SwapFree.

You can see it in src/resmom/linux/mom_mach.c:availmem():

  sprintf(ret_string, "%lukb",
     (ulong)((mm->mem_free >> 10) + (mm->swap_free >> 10))); /* KB */

  return(ret_string);
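
The same arithmetic can be reproduced outside of pbs_mom. The following Python sketch assumes the two values correspond to the MemFree and SwapFree fields of /proc/meminfo, which are already reported in kB; the function name is made up for illustration.

```python
def availmem_kb(meminfo_text: str) -> str:
    """Return MemFree + SwapFree formatted like Torque's availmem value."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            # The first token after the colon is the numeric value.
            fields[name.strip()] = int(parts[0])
    return f"{fields['MemFree'] + fields['SwapFree']}kb"
```

On a Linux node, availmem_kb(open("/proc/meminfo").read()) should be close to the availmem value the node reports, which is why free swap keeps the number high even when physical memory is nearly exhausted.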

-- Garrick Staples, GNU/Linux HPCC SysAdmin University of Southern California

Life is Good!

------------------------------

Message: 2
Date: Thu, 08 Jul 2010 20:33:57 +0200
From: Andreas Davour <davour at pdc.kth.se>
Subject: Re: [torqueusers] a question about $usecp
To: torqueusers at supercluster.org
Message-ID: <201007082033.57570.davour at pdc.kth.se>
Content-Type: Text/Plain; charset=iso-8859-1

On Thursday 08 July 2010 19:48:47 Joshua Bernstein wrote:
> Hello Andreas,
>
> $usecp needs to be enabled when your current working directory (CWD), or
> really the directory you qsub'd from, is NFS (or AFS, etc.) mounted on
> the compute nodes. Otherwise the compute nodes attempt to SCP (or, in
> builds without --with-scp, RCP) the .o and .e files.
>
> For example, on basic clusters /home is NFS mounted on the compute nodes
> as well as the submission nodes. So I need a line in mom_priv/config
> that tells pbs_mom to simply "use cp" rather than rcp/scp:
>
> $usecp	*:/home /home
>
> If I then qsub a job out of /home/jbernstein, the output files get
> copied using 'cp'.
>
> If your jobs aren't even starting, then *.o and *.e files aren't being
> created and there is another problem afoot.
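
To make the mapping concrete, here is a rough Python sketch of how a $usecp rule could be applied to an output destination. The matching rules (shell-style wildcard on the host, plain prefix match on the path) are assumptions made for illustration and are not taken from the pbs_mom source; the function names and the host name "login1" in the example are hypothetical.

```python
from fnmatch import fnmatch
from typing import Optional, Tuple

Rule = Tuple[str, str, str]


def parse_usecp(line: str) -> Rule:
    """Split a '$usecp host:prefix local_prefix' line into its three parts."""
    keyword, source, local_prefix = line.split()
    if keyword != "$usecp":
        raise ValueError(f"not a $usecp line: {line!r}")
    host_pattern, remote_prefix = source.split(":", 1)
    return host_pattern, remote_prefix, local_prefix


def local_path(rule: Rule, destination: str) -> Optional[str]:
    """Return the path to use with cp, or None if the rule does not apply."""
    host_pattern, remote_prefix, local_prefix = rule
    host, path = destination.split(":", 1)
    prefix = remote_prefix.rstrip("/")
    if not fnmatch(host, host_pattern):
        return None
    if path != remote_prefix and not path.startswith(prefix + "/"):
        return None
    # Swap the remote prefix for the local one and keep the rest of the path.
    return local_prefix.rstrip("/") + path[len(prefix):]
```

With the rule above, local_path(parse_usecp("$usecp *:/home /home"), "login1:/home/jbernstein/job.o42") yields a path under /home that can be written with cp, while a destination outside /home returns None and would fall back to scp/rcp.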

Excellent explanation, Josh! Now I think I grasp it. Thanks.

Yeah, I seem to have another problem as well, since no *.o or *.e files show up on either the server or the worker node.

/andreas
-- L. Andreas Davour
Systems Engineer
PDC/KTH
Stockholm, Sweden


------------------------------

Message: 3
Date: Thu, 8 Jul 2010 11:38:09 -0700
From: Garrick Staples <garrick at usc.edu>
Subject: Re: [torqueusers] a question about $usecp
To: torqueusers at supercluster.org
Message-ID: <20100708183809.GY21193 at polop.usc.edu>
Content-Type: text/plain; charset="us-ascii"

On Thu, Jul 08, 2010 at 04:41:04PM +0200, Andreas Davour alleged:
> Is it only for copying the final e# and o# files? My problem seem to
> end up

Stagein, stageout, stdout, and stderr files.

-- Garrick Staples, GNU/Linux HPCC SysAdmin University of Southern California

Life is Good!

------------------------------

Message: 4
Date: Thu, 8 Jul 2010 13:50:59 -0700
From: Garrick Staples <garrick at usc.edu>
Subject: Re: [torqueusers] Torque 2.5.0 beta on Cygwin
To: torqueusers at supercluster.org
Message-ID: <20100708205059.GF21193 at polop.usc.edu>
Content-Type: text/plain; charset="us-ascii"

On Wed, Jul 07, 2010 at 04:09:24PM +0200, Felix Wolfheimer alleged:
> All Torque services start without any problem. However, as soon as I submit a job using my normal user account on the machine the job ends immediately and in the stdout and stderr files is only one line saying "shell "/bin/bash" is not executable by user "FelixWolfheimer"" (stdout) and "PBS: exec of shell '/bin/bash' failed" (stderr), which is very strange as I can use the bash shell as this user. I guess this has something to do with user permissions but after hours trying to figure out what is happening I'm quite clueless now.

The author of the cygwin support, Igor Ilyenko, is probably the only one that understands this stuff. Hopefully he chimes in.


> I found out that you need to edit the /etc/passwd and /etc/group files manually to assign the Administrator account to the correct primary group ("Administrators", 544) as this is not done automatically. Maybe this could be included in the README.cygwin. It took me quite a while to find this out. Otherwise the account "Administrator" was not able to start the services (Message was: "Must be started by user with Administrator priviledges" or something similar).

Can you send an updated README.cygwin?

-- Garrick Staples, GNU/Linux HPCC SysAdmin University of Southern California

Life is Good!

------------------------------

Message: 5
Date: Fri, 09 Jul 2010 11:57:51 +1000
From: Christopher Samuel <samuel at unimelb.edu.au>
Subject: Re: [torqueusers] problem getting files copied
To: torqueusers at supercluster.org
Message-ID:
	<D45958078CD65C429557B4C5F492B6A6088CD78F at IS-EX-BEV3.unimelb.edu.au>
Content-Type: text/plain; charset="iso-8859-1"

On 08/07/10 23:05, Andreas Davour wrote:

> Looking in /var/spool/torque I find nothing looking like ER or OU
> files or any uncopied files in the undelivered directory.

Your job isn't getting far enough for that to happen would be my guess.

> On the only node inline the mom log say:
> 07/08/2010 14:52:09;0001;   pbs_mom;Job;TMomFinalizeJob3;start failed, improper sid

In your syslog you will likely see something like:

"read of pipe for sid failed for job %s (%d of %d bytes)"

as that's the error right before the "improper sid" report.

The test that is failing is:

        if (ReadSize != sizeof(sjr))

where ReadSize is passed into the function TMomFinalizeJob3 and sjr is defined as:

        struct startjob_rtn sjr;

I don't know this area of the code, so I would need to dig a bit deeper, but I presume the problem occurs before that point and causes ReadSize not to match sizeof(sjr).
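
The general pattern behind that test, reading a fixed-size record from a pipe and rejecting a short read, can be sketched in Python as follows. The record layout here is hypothetical (three C ints); the real struct startjob_rtn is defined in the Torque source and is not reproduced. The read loop is also an illustration and may differ from how pbs_mom actually reads the pipe.

```python
import os
import struct

# Hypothetical stand-in for struct startjob_rtn: three C ints.
RECORD = struct.Struct("iii")


def read_record(fd: int) -> tuple:
    """Read exactly one RECORD from fd; raise if fewer bytes arrive."""
    data = b""
    while len(data) < RECORD.size:
        chunk = os.read(fd, RECORD.size - len(data))
        if not chunk:  # EOF before the full record arrived
            break
        data += chunk
    if len(data) != RECORD.size:
        raise OSError(f"short read: {len(data)} of {RECORD.size} bytes")
    return RECORD.unpack(data)


def demo(payload: bytes) -> tuple:
    """Write payload into a pipe, close the writer, then read it back."""
    r, w = os.pipe()
    try:
        os.write(w, payload)
    finally:
        os.close(w)
    try:
        return read_record(r)
    finally:
        os.close(r)
```

If the child process that is supposed to write the record dies early, the reader sees fewer bytes than expected, which matches the "(%d of %d bytes)" wording of the syslog message quoted above.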

cheers,
Chris
--  Christopher Samuel - Senior Systems Administrator  VLSCI - Victorian Life Sciences Computational Initiative
 Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545
         http://www.vlsci.unimelb.edu.au/


------------------------------

Message: 6
Date: Fri, 9 Jul 2010 10:02:33 +0200
From: Andreas Davour <davour at pdc.kth.se>
Subject: Re: [torqueusers] pbs_server gssapi
To: torqueusers at supercluster.org
Cc: Ekaterina Popova <Ekaterina.Popova at ihep.ru>
Message-ID: <201007091002.33326.davour at pdc.kth.se>
Content-Type: Text/Plain;  charset="iso-8859-1"

On Wednesday 07 July 2010 14:16:10 Ekaterina Popova wrote:
> Andreas Davour <davour <at> pdc.kth.se> writes:
> > On Tuesday 06 July 2010 09:16:12 Ekaterina Popova wrote:
> > > Hi,
> > > Could you help me, please?
> > > I installed torque-2.5.0 with gssapi-branch. I can make qsub -I
> > > and qrun manually. But then I turn on maui, pbs_server crashes.
> > > Here is gdb output (I have not got any errors in pbs_server log):
> > > Starting program: /opt/pbs_tcl/sbin/pbs_server [Thread debugging
> > > using libthread_db enabled]
> >
> > [snip]
> >
> > Have you tried it with 2.4.8 as well? I have been working with
> > getting that one to run (with heimdal) and would be very interested
> > in how it works with maui. I had hoped to get maui working tomorrow,
> > since I still had problems compiling it when I left the office today.
> >
> > /andreas
> Hello.
> Many thanks for your answer.
> I haven't tried it with 2.4.8. I don't know how to do that. I used the
> usual way of getting the torque gssapi branch:
> mkdir /opt/torque_svn
> cd /opt/torque_svn
> git svn init svn://svn.clusterresources.com/torque/branches/gssapi
> git svn fetch
> The result is torque-2.5.0 with gssapi support. How do I get
> torque-2.4.8 with gssapi support? I would be very pleased if you could explain it to me.

git svn!? You seem to be using more version control systems on one line than I use in a week. Does that work? Amazing.

I downloaded the 2.4.8 tarball and then put last week's svn checkout into that unpacked directory. Maybe that will give me 2.5.0 now. *shrug*

I did not have maui working, btw. My jobs never start...

/andreas
-- Systems Engineer
PDC Center for High Performance Computing CSC School of Computer Science and Communication KTH Royal Institute of Technology SE-100 44 Stockholm, Sweden
Phone: 087906658
"A satellite, an earring, and a dust bunny are what made America great!"


------------------------------

Message: 7
Date: Fri, 9 Jul 2010 13:37:43 +0200
From: Andreas Davour <davour at pdc.kth.se>
Subject: [torqueusers] how is the torque renewal scripts supposed to
	work?
To: torqueusers at supercluster.org
Message-ID: <201007091337.43089.davour at pdc.kth.se>
Content-Type: Text/Plain;  charset="us-ascii"


After the problems I posted yesterday I think it's clear that I have only a very vague idea of how things are supposed to work.

So, how, when and where are the client and server renewal scripts distributed with the gssapi branch really supposed to be run?

/andreas
-- Systems Engineer
PDC Center for High Performance Computing CSC School of Computer Science and Communication KTH Royal Institute of Technology SE-100 44 Stockholm, Sweden
Phone: 087906658
"A satellite, an earring, and a dust bunny are what made America great!"


------------------------------



End of torqueusers Digest, Vol 72, Issue 10
*******************************************
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/octet-stream
Size: 8132 bytes
Desc: not available
Url : http://www.supercluster.org/pipermail/torqueusers/attachments/20100709/c224d8f0/attachment.obj 

