[torqueusers] Parallel processing for MC code

Jason Bacon jwbacon at tds.net
Fri Nov 18 06:57:23 MST 2011


I was only wondering if you had "np=2" in the Linux-01 entry, or if 
Torque was configured to autodetect the number of cores and there were 
two.  That would have explained the scheduling behavior.
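
To make the np question concrete, here is a rough sketch in Python (not part of Torque) of how I read a server_priv/nodes file: one host per line, optionally followed by "np=N". The helper name parse_nodes is made up for illustration, and the default of one slot when np is omitted is my understanding of Torque's behavior, so treat it as an assumption.

```python
def parse_nodes(text):
    """Return {hostname: slots} for the text of a nodes file.

    Assumption: a host listed without "np=N" offers a single slot.
    """
    slots = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        host, attrs = fields[0], fields[1:]
        np = 1  # assumed default when np= is not given
        for attr in attrs:
            if attr.startswith("np="):
                np = int(attr[len("np="):])
        slots[host] = np
    return slots


# The file as posted in this thread: two bare hostnames.
as_posted = "Linux-01\nLinux-02\n"
# The hypothetical variant I was asking about.
with_np = "Linux-01 np=2\nLinux-02\n"

print(parse_nodes(as_posted))
print(parse_nodes(with_np))
```

With the two bare entries, this reports one slot per host. With "np=2" on Linux-01, that host would offer two slots, which is the situation where two single-slot jobs could both land on it.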

Regards,

     -J

On 11/18/11 03:48, RB. Ezhilalan (Principal Physicist, CUH) wrote:
>
> Hi Jason,
>
> PC1 (Linux-01) is a single-core PC like PC2. I defined the 
> server_priv/nodes file as:
>
> Linux-01
> Linux-02
>
> As you mentioned, maybe the resource requirements need to be set up 
> properly. Do you have any suggestions?
>
> Many thanks,
>
> Ezhilalan
>
> -----Original Message-----
> From: torqueusers-bounces at supercluster.org 
> [mailto:torqueusers-bounces at supercluster.org] On Behalf Of 
> torqueusers-request at supercluster.org
> Sent: 17 November 2011 17:20
> To: torqueusers at supercluster.org
> Subject: torqueusers Digest, Vol 88, Issue 14
>
> Today's Topics:
>
>    1. Re: Random SCP errors when transfering to/from  CREAM sandbox
>
>       (Christopher Samuel)
>
>    2. Re: Random SCP errors when transfering to/from  CREAM sandbox
>
>       (Gila Arrondo  Miguel Angel)
>
>    3. Parallel processing for MC code
>
>       (RB. Ezhilalan (Principal Physicist, CUH))
>
>    4. Re: Parallel processing for MC code (Jason Bacon)
>
>    5. Re: File staging syntax (Steve Traylen)
>
> ----------------------------------------------------------------------
>
> Message: 1
>
> Date: Thu, 17 Nov 2011 13:29:44 +1100
>
> From: Christopher Samuel <samuel at unimelb.edu.au>
>
> Subject: Re: [torqueusers] Random SCP errors when transfering to/from
>
>       CREAM sandbox
>
> To: torqueusers at supercluster.org
>
> Message-ID: <4EC47198.1040709 at unimelb.edu.au>
>
> Content-Type: text/plain; charset=ISO-8859-1
>
> -----BEGIN PGP SIGNED MESSAGE-----
>
> Hash: SHA1
>
> On 17/11/11 03:24, Gila Arrondo Miguel Angel wrote:
>
> > Many thanks for your answer. We've made sure that the
> > keys are okay, as well as disabling host key checking to
> > test it.
>
> Can you try to scp as that user to see whether it
> complains about anything else?
>
> It may be that it is prompting the user to accept a
> host key if they don't already have it.
>
> cheers,
>
> Chris
>
> - -- 
>
>     Christopher Samuel - Senior Systems Administrator
>
>  VLSCI - Victorian Life Sciences Computation Initiative
>
>  Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545
>
>          http://www.vlsci.unimelb.edu.au/
>
> -----BEGIN PGP SIGNATURE-----
>
> Version: GnuPG v1.4.11 (GNU/Linux)
>
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
>
> iEYEARECAAYFAk7EcZgACgkQO2KABBYQAh9K+ACfeFLepTpowIXW9CiK2ECr1IdW
>
> sgcAn0cIHr3JnJORTY4g2a/PcA/11fNS
>
> =VPqK
>
> -----END PGP SIGNATURE-----
>
> ------------------------------
>
> Message: 2
>
> Date: Thu, 17 Nov 2011 07:55:50 +0000
>
> From: "Gila Arrondo  Miguel Angel" <miguel.gila at cscs.ch>
>
> Subject: Re: [torqueusers] Random SCP errors when transfering to/from
>
>       CREAM sandbox
>
> To: Torque Users Mailing List <torqueusers at supercluster.org>
>
> Message-ID: <36DEB2B3-4C2B-4B95-8CE6-DFB1363A71EE at cscs.ch>
>
> Content-Type: text/plain; charset="us-ascii"
>
> Hi Chris,
>
> I've done that on many WNs and with different users, so I don't think 
> that is the issue. I've also checked for scheduled tasks that 
> interact with the ssh keys, but the errors happen at random times, not 
> when the scheduled tasks run... :-S
>
> I'm running out of options here.
>
> Cheers,
>
> Miguel
>
> --
>
> Miguel Gila
>
> CSCS Swiss National Supercomputing Centre
>
> HPC Solutions
>
> Via Cantonale, Galleria 2 | CH-6928 Manno | Switzerland
>
> miguel.gila at cscs.ch | www.cscs.ch | Phone +41 91 610 82 22
>
> ------------------------------
>
> Message: 3
>
> Date: Thu, 17 Nov 2011 10:14:32 -0000
>
> From: "RB. Ezhilalan (Principal Physicist, CUH)" <RB.Ezhilalan at hse.ie>
>
> Subject: [torqueusers] Parallel processing for MC code
>
> To: torqueusers at supercluster.org
>
> Message-ID:
>
> <DB0960F9D7310D4BA87B4985061511A703B30D96 at CKVEX004.south.health.local>
>
> Content-Type: text/plain; charset="us-ascii"
>
> Hi All,
>
> I've been trying to set up the Torque queuing system on two SUSE 10.1
> Linux PCs (PIII!).
>
> I installed Linux on both PCs, exported the home directory containing
> the BEAMnrc Monte Carlo code from PC1 to PC2 via NFS, and set up
> passwordless SSH communication. All seems to be working fine.
>
> I downloaded the latest version of Torque (number not handy) and
> installed PBS_SERVER, PBS_MOM & PBS_SCHED on PC1 and PBS_MOM on PC2.
>
> The PBS 'nodes' file was created as per the guidelines, and the
> PBS_SERVER and QUEUE attributes were set to their defaults.
>
> The pbsnodes -a command displays two nodes (PC1 & PC2), and they are
> free. I am not sure whether this confirms that PBS/Torque is set up
> correctly.
>
> I was able to run an executable BEAMnrc user code in batch mode, i.e.
> using the 'exb' command, which is aliased to 'qsub' and sources a
> built-in job script file, with option p=1 (single job).
>
> To split the job into two, so that it runs in parallel on the two PCs,
> option p=2 should be issued. However, what I noticed was that the job
> ran twice on the first PC (PC1) rather than on both PCs.
>
> I can't figure out what went wrong. I suspect the PBS setup could have
> some issues. Maybe I can try running the job specifically on PC2; if
> so, what command do I need to give?
>
> I would be grateful for any advice!
>
> Kind Regards,
>
> Ezhilalan
>
> ------------------------------
>
> Message: 4
>
> Date: Thu, 17 Nov 2011 10:18:18 -0600
>
> From: Jason Bacon <jwbacon at tds.net>
>
> Subject: Re: [torqueusers] Parallel processing for MC code
>
> To: Torque Users Mailing List <torqueusers at supercluster.org>
>
> Message-ID: <4EC533CA.2000902 at tds.net>
>
> Content-Type: text/plain; charset=windows-1252; format=flowed
>
> How many cores does PC1 have? Note that Torque schedules cores, not
>
> computers, unless you specifically tell it to with resource requirements.
>
> Regards,
>
> -J
>
> -- 
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> Jason W. Bacon
>
> jwbacon at tds.net
>
> http://personalpages.tds.net/~jwbacon
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> ------------------------------
>
> Message: 5
>
> Date: Thu, 17 Nov 2011 18:19:14 +0100
>
> From: Steve Traylen <steve.traylen at cern.ch>
>
> Subject: Re: [torqueusers] File staging syntax
>
> To: Torque Users Mailing List <torqueusers at supercluster.org>
>
> Message-ID:
>
> <CAOXEVSCY2CC-=ajKvcc6PAgKd5S6fupRkgNp79KL_w3=k2Xy1A at mail.gmail.com>
>
> Keywords: CERN SpamKiller Note: -50
>
> Content-Type: text/plain; charset="ISO-8859-1"
>
> On Thu, Sep 29, 2011 at 4:59 PM, Ken Nielson
> <knielson at adaptivecomputing.com> wrote:
>
> > André,
> >
> > I have not yet had time to reproduce this. I did look through the
> > change log and there are two suspects. One is in 2.5.6, a fix for
> > Bugzilla 115, and the other is in 2.5.8, a fix for Bugzilla 133.
> >
> > That is as far as I am right now. I will try to get to this as soon
> > as I can.
>
> Hi Ken,
>
> Did you manage to track this down? It's currently making upgrading a
> pain.
>
> Steve.
>
> -- 
>
> Steve Traylen
>
> ------------------------------
>
> _______________________________________________
>
> torqueusers mailing list
>
> torqueusers at supercluster.org
>
> http://www.supercluster.org/mailman/listinfo/torqueusers
>
> End of torqueusers Digest, Vol 88, Issue 14
>
> *******************************************
>
>


-- 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Jason W. Bacon
jwbacon at tds.net
http://personalpages.tds.net/~jwbacon
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~



