[torqueusers] jobs terminated half way

Ricardo Román Brenes roman.ricardo at gmail.com
Thu Oct 31 07:16:42 MDT 2013


Is the cluster yours? Can you run the program outside Torque? That is the
easiest way to tell whether Torque or the program itself aborted the
task.

Also, can you send us your pbs_server configuration?
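
A minimal sketch of both checks, assuming the Torque client tools are installed and qmgr is on your PATH. The function names are only illustrative, not part of Torque:

```shell
#!/bin/sh
# Sketch only: assumes the Torque client tools are installed and on PATH.

# Print the pbs_server configuration with qmgr.
# Returns 1 with a note on stderr if qmgr is not available on this host.
print_server_config() {
    if command -v qmgr >/dev/null 2>&1; then
        qmgr -c 'print server'
    else
        echo "qmgr not found; run this on a host with the Torque client tools" >&2
        return 1
    fi
}

# Run a job script directly with sh, bypassing Torque, and report its
# exit status so a crash in the program itself becomes visible.
run_outside_torque() {
    status=0
    sh "$1" || status=$?
    echo "exit status: $status"
    return "$status"
}
```

For example, running `print_server_config` on the server host and then `run_outside_torque ./myjob.sh` on a compute node (`myjob.sh` being a placeholder for the script you normally pass to qsub) should show whether the job also dies without Torque involved.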
On Oct 31, 2013 4:52 AM, "RB. Ezhilalan (Principal Physicist, CUH)" <RB.Ezhilalan at hse.ie> wrote:

> Hi Ricardo,
>
> Thank you for looking at the log files. I noticed that the jobs get
> terminated half way when the calculation time for each job is increased
> (i.e. the number of histories). Could the default memory allocation be
> the problem? I have used the default settings for the pbs_server. For
> your info, I am running BEAMnrc Monte Carlo simulations. Any suggestions?
>
> Regards,
> Ezhil
> Ezhilalan Ramalingam M.Sc.,DABR.,
> Principal Physicist (Radiotherapy),
> Medical Physics Department,
> Cork University Hospital,
> Wilton, Cork
> Ireland
> Tel. 00353 21 4922533
> Fax.00353 21 4921300
> Email: rb.ezhilalan at hse.ie
> -----Original Message-----
> From: torqueusers-bounces at supercluster.org
> [mailto:torqueusers-bounces at supercluster.org] On Behalf Of
> torqueusers-request at supercluster.org
> Sent: 30 October 2013 18:59
> To: torqueusers at supercluster.org
> Subject: torqueusers Digest, Vol 111, Issue 39
>
>
> Today's Topics:
>
>    1. Re: jobs terminated half way (Ricardo Román Brenes)
>    2. Re: Problem building rpms torque-2.5.13 (Michael Jennings)
>    3. Re: Problem building rpms torque-2.5.13 (Carles Acosta (PIC))
>    4. priority queue (Luca Nannipieri)
>    5. Re: priority queue (David Beer)
>    6. customizing xbpsmon (Eva Hocks)
>    7. TCL scheduler (Kevin Van Workum)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Tue, 29 Oct 2013 09:35:18 -0600
> From: Ricardo Román Brenes <roman.ricardo at gmail.com>
> Subject: Re: [torqueusers] jobs terminated half way
> To: Torque Users Mailing List <torqueusers at supercluster.org>
> Message-ID:
>
> <CAG-vK_xU5vNFROLhgOxG8en=aK67eeEVZcJqVjkV7DyXOje9iQ at mail.gmail.com>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Hi again
>
> The only error I could find in the 6 logs you sent regarding those jobs
> was this:
>
> pbs_mom;Svr;pbs_mom;LOG_ERROR::Permission denied (13) in job_purge, Unlink of job file failed
>
> I am not sure whether this is an actual error, just an error in the
> logging, or whether this "permission denied" would abort your jobs.
> Maybe check the workdir permissions.
>
> ------------------------------
>
> Message: 2
> Date: Tue, 29 Oct 2013 09:54:17 -0700
> From: Michael Jennings <mej at lbl.gov>
> Subject: Re: [torqueusers] Problem building rpms torque-2.5.13
> To: torqueusers at supercluster.org
> Message-ID: <20131029165417.GA27774 at lbl.gov>
> Content-Type: text/plain; charset=us-ascii
>
> On Tuesday, 29 October 2013, at 15:40:04 (+0100),
> Carles Acosta wrote:
>
> > I am trying to build the rpms for the new torque 2.5.13 release.
> > After applying the patch fix_mom_priv_2.5.patch, I use the following
> > options:
> >
> > # rpmbuild -ta --with munge --with scp --define 'torque_home
> > /var/spool/pbs' --define 'torque_server XXXXXXX' --define 'acflags
> > --enable-maxdefault --with-readline --with-tcp-retry-limit=2
> > --disable-spool' torque-2.5.13.tar.gz
> >
> > The process fails with the error:
>
> This is a known issue which has already been fixed in Git.  Here's the
> mailing list thread from September:
>
> http://www.supercluster.org/pipermail/torquedev/2013-September/004587.html
>
> Here's the pull request (with patch):
>
> https://github.com/adaptivecomputing/torque/pull/183
>
> Michael
>
> --
> Michael Jennings <mej at lbl.gov>
> Senior HPC Systems Engineer
> High-Performance Computing Services
> Lawrence Berkeley National Laboratory
> Bldg 50B-3209E        W: 510-495-2687
> MS 050B-3209          F: 510-486-8615
>
>
> ------------------------------
>
> Message: 3
> Date: Tue, 29 Oct 2013 21:03:45 +0100
> From: "Carles Acosta (PIC)" <cacosta at pic.es>
> Subject: Re: [torqueusers] Problem building rpms torque-2.5.13
> To: Torque Users Mailing List <torqueusers at supercluster.org>
> Cc: "torqueusers at supercluster.org" <torqueusers at supercluster.org>
> Message-ID: <AA390888-DD39-4BC8-805E-CFF6D2A61334 at pic.es>
> Content-Type: text/plain;       charset=us-ascii
>
> Hi Michael,
>
> Thank you very much!
>
> Regards,
>
> Carles
>
>
>
> ------------------------------
>
> Message: 4
> Date: Wed, 30 Oct 2013 16:00:26 +0100
> From: Luca Nannipieri <nannipieri at pi.ingv.it>
> Subject: [torqueusers] priority queue
> To: Torque Users Mailing List <torqueusers at supercluster.org>
> Message-ID: <52711F0A.2070407 at pi.ingv.it>
> Content-Type: text/plain; charset=ISO-8859-15; format=flowed
>
> I have 2 queues:
>
> [root@ ~]# qstat -Q -f
> Queue: default
>      queue_type = Execution
>      Priority = 50
>      total_jobs = 1
>      state_count = Transit:0 Queued:1 Held:0 Waiting:0 Running:0 Exiting:0
>      mtime = 1383139362
>      resources_assigned.nodect = 0
>      enabled = True
>      started = True
>
> Queue: batch
>      queue_type = Execution
>      Priority = 20
>      total_jobs = 1
>      state_count = Transit:0 Queued:0 Held:0 Waiting:0 Running:1 Exiting:0
>      mtime = 1383139335
>      resources_assigned.nodect = 1
>      enabled = True
>      started = True
>
> The default queue has Priority = 50 and batch has Priority = 20, but if I
> submit a job to the default queue, the scheduler leaves it in the "queued"
> state while a job from the batch queue is running, instead of running the
> default-queue job and putting the batch-queue job back in "queued". Why?
>
> --
> Ing. Luca Nannipieri
> Istituto Nazionale di Geofisica e Vulcanologia
> Sezione di Pisa
> Via della Faggiola, 32 - 56126 Pisa - Italy
> Tel. +39 050 8311926
> fax: +39 050 8311942
> http://www.pi.ingv.it/chisiamo/paginepersonali/nannipieri.html
> PEC: aoo.pisa at pec.ingv.it
>
> ------------------------------
>
> Message: 5
> Date: Wed, 30 Oct 2013 09:52:46 -0600
> From: David Beer <dbeer at adaptivecomputing.com>
> Subject: Re: [torqueusers] priority queue
> To: Torque Users Mailing List <torqueusers at supercluster.org>
> Message-ID:
>
> <CAFUQeZ1z-D9o23V69_bAue0svZ2dc7H0XsZjKh8zeWf=Opk0iA at mail.gmail.com>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Luca,
>
> The priority assigned by the queue is meant to be interpreted by the
> scheduler you are using. Usually, having two jobs where job1 has a
> priority of 20 and job2 has a priority of 50 means that both jobs are
> eligible to run, but job1 should be evaluated to run before job2 (or the
> other way around, depending on whether your scheduler thinks lower or
> higher priority numbers run first).
>
> In other words, the state of queued simply means the job is eligible to
> be run. Two jobs having the same state doesn't mean that they have equal
> priority for running.
>
> HTH
>
> David
>
>
>
> --
> David Beer | Senior Software Engineer
> Adaptive Computing
>
> ------------------------------
>
> Message: 6
> Date: Wed, 30 Oct 2013 10:50:29 -0700 (PDT)
> From: Eva Hocks <hocks at sdsc.edu>
> Subject: [torqueusers] customizing xbpsmon
> To: Torque Users Mailing List <torqueusers at supercluster.org>
> Message-ID:
>         <Pine.GSO.4.30.1310301038510.7397-100000 at multivac.sdsc.edu>
> Content-Type: TEXT/PLAIN; charset=US-ASCII
>
>
> Anybody using xpbsmon? I would like to change the size of the cluster
> frame.
>
> I changed the height in the xpbsmonrc without success.
>
> *nodeBoxFullMaxHeight:  1000
> *nodeBoxMirrorMaxHeight:        1000
> *serverBoxMaxHeight:    1000
> *siteBoxMaxHeight:      1000
>
>
> I also tried to change the same variable in the xpbsmon script with the
> same result.
>
> Any help appreciated
> Thanks
> Eva
>
>
>
> ------------------------------
>
> Message: 7
> Date: Wed, 30 Oct 2013 14:48:29 -0400
> From: Kevin Van Workum <vanw at sabalcore.com>
> Subject: [torqueusers] TCL scheduler
> To: Torque Users Mailing List <torqueusers at supercluster.org>
> Message-ID:
>
> <CAHom8ysghfYWq9TB8Si-cHw2StToD=qJTdQ7PMztB_LFcr+6Mw at mail.gmail.com>
> Content-Type: text/plain; charset="us-ascii"
>
> I'm curious whether the TCL scheduler is still supported in 4.2.x.
> Trying to build it throws lots of errors.
>
> --
> Kevin Van Workum, PhD
> Sabalcore Computing Inc.
> "Where Data Becomes Discovery"
> http://www.sabalcore.com
> 877-492-8027 ext. 11
>
>
> ------------------------------
>
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers
>
>
> End of torqueusers Digest, Vol 111, Issue 39
> ********************************************