[torqueusers] Parallel processing for MC code
RB. Ezhilalan (Principal Physicist, CUH)
RB.Ezhilalan at hse.ie
Fri Nov 18 02:48:10 MST 2011
Hi Jason,
PC1 (linux-01) is a single-core PC, like PC2. I defined the
server_priv/nodes file as:
Linux-01
Linux-02
As you mentioned, maybe the resource requirements need to be set up
properly. Do you have any suggestions?
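For illustration, a minimal sketch of the same nodes file with the
processor count written out using Torque's np= attribute. As far as I
know, a host listed without np= is treated as having one processor, so
this only makes explicit what the two-line file above probably already
means. The script writes to a scratch directory (an assumption made for
safety), not to the real server_priv/nodes.

```shell
#!/bin/sh
# Sketch only: writes an example nodes file to a scratch directory,
# NOT to the real server_priv/nodes under the Torque home directory.
scratch=$(mktemp -d)
nodes_file="$scratch/nodes"

# One line per host; np is the number of processors Torque may
# schedule on that host (both PCs in this thread are single core).
cat > "$nodes_file" <<'EOF'
Linux-01 np=1
Linux-02 np=1
EOF

cat "$nodes_file"
```

After editing the real file, pbs_server typically has to be restarted
before the change shows up in the pbsnodes -a output.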
Many thanks,
Ezhilalan
-----Original Message-----
From: torqueusers-bounces at supercluster.org
[mailto:torqueusers-bounces at supercluster.org] On Behalf Of
torqueusers-request at supercluster.org
Sent: 17 November 2011 17:20
To: torqueusers at supercluster.org
Subject: torqueusers Digest, Vol 88, Issue 14
Today's Topics:
1. Re: Random SCP errors when transfering to/from CREAM sandbox
(Christopher Samuel)
2. Re: Random SCP errors when transfering to/from CREAM sandbox
(Gila Arrondo Miguel Angel)
3. Parallel processing for MC code
(RB. Ezhilalan (Principal Physicist, CUH))
4. Re: Parallel processing for MC code (Jason Bacon)
5. Re: File staging syntax (Steve Traylen)
----------------------------------------------------------------------
Message: 1
Date: Thu, 17 Nov 2011 13:29:44 +1100
From: Christopher Samuel <samuel at unimelb.edu.au>
Subject: Re: [torqueusers] Random SCP errors when transfering to/from
CREAM sandbox
To: torqueusers at supercluster.org
Message-ID: <4EC47198.1040709 at unimelb.edu.au>
Content-Type: text/plain; charset=ISO-8859-1
On 17/11/11 03:24, Gila Arrondo Miguel Angel wrote:
> Many thanks for your answer. We've made sure that the
> keys are okay, as well as disabling host key checking to
> test it.
Can you try to scp as that user to see whether it
complains about anything else?
It may be that it is prompting the user to accept a
host key if they don't already have it.
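A minimal sketch of such a check, assuming OpenSSH. With BatchMode=yes,
scp fails instead of prompting, so a pending host-key confirmation or
password prompt shows up as an error rather than a hang. The user name,
host name and file paths below are placeholders, not taken from this
thread, and the function only prints the command.

```shell
#!/bin/sh
# Hypothetical names, for illustration only.
remote_user="someuser"
remote_host="wn001.example.org"

# Print the test command so it can be inspected before it is run.
scp_batch_test() {
    # BatchMode=yes: never prompt (passwords, host-key confirmation).
    # ConnectTimeout=10: do not hang on an unreachable host.
    echo scp -o BatchMode=yes -o ConnectTimeout=10 \
        /etc/hosts "$1@$2:/tmp/scp_batch_test.$$"
}

scp_batch_test "$remote_user" "$remote_host"
```

To run the check for real, execute the printed command while logged in
as the affected user.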
cheers,
Chris
--
Christopher Samuel - Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.unimelb.edu.au/
------------------------------
Message: 2
Date: Thu, 17 Nov 2011 07:55:50 +0000
From: "Gila Arrondo Miguel Angel" <miguel.gila at cscs.ch>
Subject: Re: [torqueusers] Random SCP errors when transfering to/from
CREAM sandbox
To: Torque Users Mailing List <torqueusers at supercluster.org>
Message-ID: <36DEB2B3-4C2B-4B95-8CE6-DFB1363A71EE at cscs.ch>
Content-Type: text/plain; charset="us-ascii"
Hi Chris,
I've done that on many worker nodes (WNs) and with different users, so I
don't think that is the issue. I've also checked for scheduled tasks
that interact with the ssh keys, but the errors happen at random times,
not when the scheduled tasks run... :-S
I'm running out of options here.
Cheers,
Miguel
--
Miguel Gila
CSCS Swiss National Supercomputing Centre
HPC Solutions
Via Cantonale, Galleria 2 | CH-6928 Manno | Switzerland
miguel.gila at cscs.ch | www.cscs.ch | Phone +41 91 610 82 22
------------------------------
Message: 3
Date: Thu, 17 Nov 2011 10:14:32 -0000
From: "RB. Ezhilalan (Principal Physicist, CUH)" <RB.Ezhilalan at hse.ie>
Subject: [torqueusers] Parallel processing for MC code
To: torqueusers at supercluster.org
Message-ID:
<DB0960F9D7310D4BA87B4985061511A703B30D96 at CKVEX004.south.health.local>
Content-Type: text/plain; charset="us-ascii"
Hi All,
I've been trying to set up the Torque queuing system on two SUSE 10.1
Linux PCs (PIII!).
I installed Linux on both PCs, exported the home directory containing
the BEAMnrc Monte Carlo code from PC1 to PC2 via NFS, and set up
passwordless SSH communication. All seems to be working fine.
I downloaded the latest version of Torque (number not handy) and
installed pbs_server, pbs_mom and pbs_sched on PC1 and pbs_mom on PC2.
The PBS 'nodes' file was created as per the guidelines, and the server
and queue attributes were set to their defaults.
The pbsnodes -a command displays two nodes (PC1 and PC2), and both are
free. I am not sure whether this confirms that PBS/Torque is set up
correctly.
I was able to run an executable BEAMnrc user code in batch mode, i.e.
using the 'exb' command, which is aliased to 'qsub' and sources a
built-in job script file, with option p=1 (single job).
To split the job in two, so that it runs in parallel on the two PCs,
option p=2 should be given. However, I noticed that the job ran twice on
the first PC (PC1), and not on both PCs.
I can't figure out what went wrong. I suspect the PBS setup could have
some issues. Maybe I can try running the job specifically on PC2; if so,
what command do I need to give?
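For reference, a hedged sketch of one way to target a specific node
with Torque's qsub: naming the host in the -l nodes= resource request.
The host name Linux-02 is the one from the nodes file in this thread and
must match the name Torque knows the node by; the job name and script
name are made up for illustration. The script only writes the job file
and prints the submit command.

```shell
#!/bin/sh
scratch=$(mktemp -d)
job="$scratch/where_am_i.sh"

# Trivial test job: report which host actually ran it.
cat > "$job" <<'EOF'
#!/bin/sh
#PBS -N where_am_i
#PBS -l nodes=Linux-02
hostname
EOF

# Shown rather than executed, so the sketch is safe to run anywhere.
echo "would run: qsub $job"
```

If the job's output file contains the name of PC2, the server, scheduler
and the pbs_mom on PC2 are at least able to run a job there.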
I would be grateful for any advice!
Kind Regards,
Ezhilalan
------------------------------
Message: 4
Date: Thu, 17 Nov 2011 10:18:18 -0600
From: Jason Bacon <jwbacon at tds.net>
Subject: Re: [torqueusers] Parallel processing for MC code
To: Torque Users Mailing List <torqueusers at supercluster.org>
Message-ID: <4EC533CA.2000902 at tds.net>
Content-Type: text/plain; charset=windows-1252; format=flowed
How many cores does PC1 have? Note that Torque schedules cores, not
computers, unless you specifically tell it to with resource
requirements.
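A sketch of what such a resource requirement can look like in a job
script, assuming the standard Torque nodes/ppn syntax. The job name and
file name are hypothetical, and whether BEAMnrc's exb script issues a
request like this is not stated in this thread. The script only writes
the job file and prints the submit command.

```shell
#!/bin/sh
scratch=$(mktemp -d)
job="$scratch/two_nodes.sh"

cat > "$job" <<'EOF'
#!/bin/sh
#PBS -N two_nodes
# Ask for two separate nodes with one processor on each.
# (nodes=1:ppn=2 would instead ask for two processors on one node.)
#PBS -l nodes=2:ppn=1
# Torque lists the allocated hosts, one line per processor, in this file:
cat "$PBS_NODEFILE"
EOF

# Shown rather than executed, so the sketch is safe to run anywhere.
echo "would run: qsub $job"
```

Note that the job script itself starts only on the first allocated
host; starting work on the other host needs something like pbsdsh or an
MPI launcher.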
Regards,
-J
--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Jason W. Bacon
jwbacon at tds.net
http://personalpages.tds.net/~jwbacon
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
------------------------------
Message: 5
Date: Thu, 17 Nov 2011 18:19:14 +0100
From: Steve Traylen <steve.traylen at cern.ch>
Subject: Re: [torqueusers] File staging syntax
To: Torque Users Mailing List <torqueusers at supercluster.org>
Message-ID:
<CAOXEVSCY2CC-=ajKvcc6PAgKd5S6fupRkgNp79KL_w3=k2Xy1A at mail.gmail.com>
Content-Type: text/plain; charset="ISO-8859-1"
On Thu, Sep 29, 2011 at 4:59 PM, Ken Nielson
<knielson at adaptivecomputing.com> wrote:
> André,
>
> I have not yet had time to reproduce this. I did look through the
> change log and there are two suspects. One is in 2.5.6, a fix for
> Bugzilla 115 and the other is in 2.5.8, a fix for Bugzilla 133.
>
> That is as far as I am right now. I will try to get to this as soon as
> I can.
Hi Ken,
Did you manage to track this down? It's currently making upgrading a
pain.
Steve.
--
Steve Traylen
------------------------------
_______________________________________________
torqueusers mailing list
torqueusers at supercluster.org
http://www.supercluster.org/mailman/listinfo/torqueusers
End of torqueusers Digest, Vol 88, Issue 14
*******************************************