[torqueusers] Parallel processing for MC code

RB. Ezhilalan (Principal Physicist, CUH) RB.Ezhilalan at hse.ie
Fri Nov 18 02:48:10 MST 2011


Hi Jason,

 

PC1 (linux-01) is a single-core PC, like PC2. I defined the
server_priv/nodes file as:

Linux-01 

Linux-02 
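For comparison, here is a minimal sketch of the same server_priv/nodes file with each host's processor count stated explicitly. The np=1 values assume both PCs are single-core, as described above. The host names assume the entries match the hostnames the two machines report (linux-01 and linux-02 here).

```
linux-01 np=1
linux-02 np=1
```

Once pbs_server has re-read the file (typically on restart), `pbsnodes -a` should list np = 1 for each node.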

 

As you mentioned, maybe the resource requirements need to be set up
properly. Do you have any suggestions?

 

Many thanks,

 

Ezhilalan

 

-----Original Message-----
From: torqueusers-bounces at supercluster.org
[mailto:torqueusers-bounces at supercluster.org] On Behalf Of
torqueusers-request at supercluster.org
Sent: 17 November 2011 17:20
To: torqueusers at supercluster.org
Subject: torqueusers Digest, Vol 88, Issue 14

 

Send torqueusers mailing list submissions to

      torqueusers at supercluster.org

 

To subscribe or unsubscribe via the World Wide Web, visit

      http://www.supercluster.org/mailman/listinfo/torqueusers

or, via email, send a message with subject or body 'help' to

      torqueusers-request at supercluster.org

 

You can reach the person managing the list at

      torqueusers-owner at supercluster.org

 

When replying, please edit your Subject line so it is more specific

than "Re: Contents of torqueusers digest..."

 

 

Today's Topics:

 

   1. Re: Random SCP errors when transfering to/from  CREAM sandbox

      (Christopher Samuel)

   2. Re: Random SCP errors when transfering to/from  CREAM sandbox

      (Gila Arrondo  Miguel Angel)

   3. Parallel processing for MC code

      (RB. Ezhilalan (Principal Physicist, CUH))

   4. Re: Parallel processing for MC code (Jason Bacon)

   5. Re: File staging syntax (Steve Traylen)

 

 

----------------------------------------------------------------------

 

Message: 1

Date: Thu, 17 Nov 2011 13:29:44 +1100

From: Christopher Samuel <samuel at unimelb.edu.au>

Subject: Re: [torqueusers] Random SCP errors when transfering to/from

      CREAM sandbox

To: torqueusers at supercluster.org

Message-ID: <4EC47198.1040709 at unimelb.edu.au>

Content-Type: text/plain; charset=ISO-8859-1

 

-----BEGIN PGP SIGNED MESSAGE-----

Hash: SHA1

 

On 17/11/11 03:24, Gila Arrondo Miguel Angel wrote:

 

> Many thanks for your answer. We've made sure that the

> keys are okay, as well as disabling host key checking to

> test it. 

 

Can you try and scp as that user to see whether it

complains about anything else?

 

It may be that it is prompting the user to accept a

host key if they don't already have it.
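Here is a hedged sketch of that check, run as the affected account. The user name, worker-node host name and file are hypothetical placeholders. With `-o BatchMode=yes`, scp fails instead of waiting at an interactive prompt, so an unaccepted host key shows up as an error. `-v` prints the SSH negotiation.

```sh
# Hypothetical names: substitute the real job user, worker node and test file.
sudo -u griduser scp -v -o BatchMode=yes /tmp/testfile wn01.example.org:/tmp/
```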

 

cheers,

Chris

- -- 

    Christopher Samuel - Senior Systems Administrator

 VLSCI - Victorian Life Sciences Computation Initiative

 Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545

         http://www.vlsci.unimelb.edu.au/

 

-----BEGIN PGP SIGNATURE-----

Version: GnuPG v1.4.11 (GNU/Linux)

Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

 

iEYEARECAAYFAk7EcZgACgkQO2KABBYQAh9K+ACfeFLepTpowIXW9CiK2ECr1IdW

sgcAn0cIHr3JnJORTY4g2a/PcA/11fNS

=VPqK

-----END PGP SIGNATURE-----

 

 

------------------------------

 

Message: 2

Date: Thu, 17 Nov 2011 07:55:50 +0000

From: "Gila Arrondo  Miguel Angel" <miguel.gila at cscs.ch>

Subject: Re: [torqueusers] Random SCP errors when transfering to/from

      CREAM sandbox

To: Torque Users Mailing List <torqueusers at supercluster.org>

Message-ID: <36DEB2B3-4C2B-4B95-8CE6-DFB1363A71EE at cscs.ch>

Content-Type: text/plain; charset="us-ascii"

 

Hi Chris,

 

I've done that on many WNs and with different users, so I don't think
that is the issue. I've also checked for scheduled tasks that
interact with the ssh keys, but the errors happen at random times, not
when the scheduled tasks run... :-S

 

I'm running out of options here.

 

Cheers,

Miguel

 

On Nov 17, 2011, at 3:29 AM, Christopher Samuel wrote:

 

> -----BEGIN PGP SIGNED MESSAGE-----

> Hash: SHA1

> 

> On 17/11/11 03:24, Gila Arrondo Miguel Angel wrote:

> 

>> Many thanks for your answer. We've made sure that the

>> keys are okay, as well as disabling host key checking to

>> test it. 

> 

> Can you try and scp as that user to see whether it

> complains about anything else?

> 

> It may be that it is prompting the user to accept a

> host key if they don't already have it.

> 

> cheers,

> Chris

> - -- 

>    Christopher Samuel - Senior Systems Administrator

> VLSCI - Victorian Life Sciences Computation Initiative

> Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545

>         http://www.vlsci.unimelb.edu.au/

> 

> -----BEGIN PGP SIGNATURE-----

> Version: GnuPG v1.4.11 (GNU/Linux)

> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

> 

> iEYEARECAAYFAk7EcZgACgkQO2KABBYQAh9K+ACfeFLepTpowIXW9CiK2ECr1IdW

> sgcAn0cIHr3JnJORTY4g2a/PcA/11fNS

> =VPqK

> -----END PGP SIGNATURE-----

> _______________________________________________

> torqueusers mailing list

> torqueusers at supercluster.org

> http://www.supercluster.org/mailman/listinfo/torqueusers

 

--

Miguel Gila

CSCS Swiss National Supercomputing Centre 

HPC Solutions

Via Cantonale, Galleria 2 | CH-6928 Manno | Switzerland

miguel.gila at cscs.ch | www.cscs.ch | Phone +41 91 610 82 22

 


 

------------------------------

 

Message: 3

Date: Thu, 17 Nov 2011 10:14:32 -0000

From: "RB. Ezhilalan (Principal Physicist, CUH)" <RB.Ezhilalan at hse.ie>

Subject: [torqueusers] Parallel processing for MC code

To: torqueusers at supercluster.org

Message-ID: <DB0960F9D7310D4BA87B4985061511A703B30D96 at CKVEX004.south.health.local>

Content-Type: text/plain; charset="us-ascii"

 

 

 

Hi All,

I've been trying to set up the Torque queuing system on two SUSE 10.1
Linux PCs (PIII!).

I installed Linux on both PCs, exported the home directory containing
the BEAMnrc Monte Carlo code from PC1 to PC2 via NFS, and set up
passwordless SSH communication. All seems to be working fine.

I downloaded the latest version of Torque (version number not handy)
and installed PBS_SERVER, PBS_MOM & PBS_SCHED on PC1 and PBS_MOM on PC2.

The PBS 'nodes' file was created as per the guidelines, and the
PBS_SERVER and QUEUE attributes were left at their defaults.

The 'pbsnodes -a' command displays two nodes (PC1 & PC2), and they are
free. I am not sure whether this confirms that PBS/Torque is set up
correctly.

I was able to run an executable BEAMnrc user code in batch mode, i.e.
using the 'exb' command, which is aliased to 'qsub' and sources a
built-in job script file, with option p=1 (single job).

To split the job into two, so that it runs in parallel on the two PCs,
option p=2 should be issued. However, what I noticed was that the job
ran twice on the first PC (PC1) instead of once on each PC.

I can't figure out what went wrong; I suspect the PBS setup could have
some issues. Maybe I can try running the job specifically on PC2; if
so, what command do I need to give?

I would be grateful for any advice!

Kind Regards,

Ezhilalan

 

 

 


 

------------------------------

 

Message: 4

Date: Thu, 17 Nov 2011 10:18:18 -0600

From: Jason Bacon <jwbacon at tds.net>

Subject: Re: [torqueusers] Parallel processing for MC code

To: Torque Users Mailing List <torqueusers at supercluster.org>

Message-ID: <4EC533CA.2000902 at tds.net>

Content-Type: text/plain; charset=windows-1252; format=flowed

 

 

How many cores does PC1 have? Note that Torque schedules cores, not 

computers, unless you specifically tell it to with resource
requirements.
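Here is a minimal sketch of what such resource requirements can look like on the qsub command line. job.sh is a hypothetical job script name. linux-02 is the second host named in this thread. Both are assumptions.

```sh
# Ask for one processor (ppn) on each of two separate nodes in a single job:
qsub -l nodes=2:ppn=1 job.sh

# Pin a job to a named host, e.g. to check that PC2's pbs_mom accepts work:
qsub -l nodes=linux-02 job.sh

# Show what pbs_server records for each node (state, np, properties):
pbsnodes -a
```

A multi-node job script still starts only on the first allocated node. Spreading work to the other nodes is left to the script, which can read the allocated host list from $PBS_NODEFILE.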

 

Regards,

 

-J

 

On 11/17/11 04:14, RB. Ezhilalan (Principal Physicist, CUH) wrote:

> 

> Hi All,

> 

> I've been trying to set up Torque queuing system on two SUSE10.1 linux

> PCs (PIII!).

> 

> Installed the linux on both PCs, exported home directory containing 

> BEAMnrc montecarlo code from PC1 to PC2 via NFS and set up SSH 

> password less communication. All seems to be working fine.

> 

> Downloaded latest version of Torque (number not handy) installed 

> PBS_SERVER, PBS_MOM & PBS_SCHED on PC1 and PBS_MOM on PC2.

> 

> PBS 'nodes' file was created as per guidelines, PBS_SERVER and QUEUE 

> attributes were set as default.

> 

> Pbsnodes -a command displays- two nodes (PC1 & PC2 and they are free. 

> I am not sure whether this confirms PBS/Torque set up correctly.

> 

> I was able to run an executable BEAMnrc user code in batch mode i.e 

> using 'exb' command aliased to 'qsub' and sources a built in job 

> script file with option p=1 (single job).

> 

> To split the jobs in to two, so that it runs in parallel on the two 

> PCs, option p=2 should be issued. However, what I noticed was, the job

> ran twice on the first PC (PC1) but not on both.

> 

> I can't figure out what went wrong, I suspect PBS setup could have 

> some issues, May be I can try running the job specifically on PC2 if 

> so what command I need to give?

> 

> I would be grateful for any advice!

> 

> Kind Regards,

> 

> Ezhilalan

> 

> 


 

 

-- 

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Jason W. Bacon

jwbacon at tds.net

http://personalpages.tds.net/~jwbacon

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 

 

 

 

------------------------------

 

Message: 5

Date: Thu, 17 Nov 2011 18:19:14 +0100

From: Steve Traylen <steve.traylen at cern.ch>

Subject: Re: [torqueusers] File staging syntax

To: Torque Users Mailing List <torqueusers at supercluster.org>

Message-ID: <CAOXEVSCY2CC-=ajKvcc6PAgKd5S6fupRkgNp79KL_w3=k2Xy1A at mail.gmail.com>

Keywords: CERN SpamKiller Note: -50

Content-Type: text/plain; charset="ISO-8859-1"

 

On Thu, Sep 29, 2011 at 4:59 PM, Ken Nielson
<knielson at adaptivecomputing.com> wrote:

> André,

> 

> I have not yet had time to reproduce this. I did look through the
> change log and there are two suspects: one is in 2.5.6, a fix for
> Bugzilla 115, and the other is in 2.5.8, a fix for Bugzilla 133.

> 

> That is as far as I am right now. I will try to get to this as soon as
> I can.

 

Hi Ken,

 

 Did you manage to track this down? It's currently making upgrading a
pain.

 

Steve.

 

 

-- 

Steve Traylen

 

 

------------------------------

 


 

 

End of torqueusers Digest, Vol 88, Issue 14

*******************************************
