[torqueusers] jobs terminated half way

RB. Ezhilalan (Principal Physicist, CUH) RB.Ezhilalan at hse.ie
Thu Oct 31 04:50:42 MDT 2013


Hi Ricardo,

Thank you for looking at the log files. I noticed that the jobs get
terminated halfway when the calculation time for each job is increased
(i.e. the number of histories). Could the default memory allocation be the
problem? I have used the default settings for pbs_server. For your
information, I am running BEAMnrc Monte Carlo simulations. Any suggestions?

Regards,
Ezhil
Ezhilalan Ramalingam M.Sc.,DABR.,
Principal Physicist (Radiotherapy),
Medical Physics Department,
Cork University Hospital,
Wilton, Cork
Ireland
Tel. 00353 21 4922533
Fax.00353 21 4921300
Email: rb.ezhilalan at hse.ie 
-----Original Message-----
From: torqueusers-bounces at supercluster.org
[mailto:torqueusers-bounces at supercluster.org] On Behalf Of
torqueusers-request at supercluster.org
Sent: 30 October 2013 18:59
To: torqueusers at supercluster.org
Subject: torqueusers Digest, Vol 111, Issue 39


Today's Topics:

   1. Re: jobs terminated half way (Ricardo Román Brenes)
   2. Re: Problem building rpms torque-2.5.13 (Michael Jennings)
   3. Re: Problem building rpms torque-2.5.13 (Carles Acosta (PIC))
   4. priority queue (Luca Nannipieri)
   5. Re: priority queue (David Beer)
   6. customizing xbpsmon (Eva Hocks)
   7. TCL scheduler (Kevin Van Workum)


----------------------------------------------------------------------

Message: 1
Date: Tue, 29 Oct 2013 09:35:18 -0600
From: Ricardo Román Brenes <roman.ricardo at gmail.com>
Subject: Re: [torqueusers] jobs terminated half way
To: Torque Users Mailing List <torqueusers at supercluster.org>
Message-ID: <CAG-vK_xU5vNFROLhgOxG8en=aK67eeEVZcJqVjkV7DyXOje9iQ at mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

Hi again

The only error I could find in the 6 logs you sent regarding those jobs
was this:

pbs_mom;Svr;pbs_mom;LOG_ERROR::Permission denied (13) in job_purge, Unlink of job file failed

I am not sure whether this is an actual error, just an error in the logging,
or whether this "permission denied" would abort your jobs. Maybe check the
workdir permissions.
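
As a rough illustration of the permission check suggested above, here is a minimal Python sketch. It is not part of Torque; the function name `describe_access` is made up for this example, and which directory to inspect (the job's working directory, or wherever the job file lives on your node) is an assumption you would need to settle for your own installation. It relies on the general POSIX rule that unlinking a file requires write and execute permission on the directory that contains it.

```python
import os
import stat


def describe_access(path):
    """Summarize the permissions of a directory (hypothetical helper).

    Unlinking a file needs write + execute (search) permission on the
    directory that contains it, so that is what is checked here for the
    user running this script.
    """
    st = os.stat(path)
    return {
        "path": path,
        # Human-readable mode string, e.g. 'drwxr-xr-x'
        "mode": stat.filemode(st.st_mode),
        "owner_uid": st.st_uid,
        "can_unlink_entries": os.access(path, os.W_OK | os.X_OK),
    }


# Example: inspect the current directory. Replace it with the directory
# you actually want to check.
print(describe_access(os.getcwd()))
```

Note that the result reflects the user who runs the script. If the daemon that purges the job file runs under a different account, the check would have to be run as that account to be meaningful.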

------------------------------

Message: 2
Date: Tue, 29 Oct 2013 09:54:17 -0700
From: Michael Jennings <mej at lbl.gov>
Subject: Re: [torqueusers] Problem building rpms torque-2.5.13
To: torqueusers at supercluster.org
Message-ID: <20131029165417.GA27774 at lbl.gov>
Content-Type: text/plain; charset=us-ascii

On Tuesday, 29 October 2013, at 15:40:04 (+0100),
Carles Acosta wrote:

> I am trying to build the rpms for the new torque 2.5.13 release.
> After applying the patch fix_mom_priv_2.5.patch, I use the following
> options:
> 
> # rpmbuild -ta --with munge --with scp --define 'torque_home
> /var/spool/pbs' --define 'torque_server XXXXXXX' --define 'acflags
> --enable-maxdefault --with-readline --with-tcp-retry-limit=2
> --disable-spool' torque-2.5.13.tar.gz
> 
> The process fails with the error:

This is a known issue which has already been fixed in Git.  Here's the
mailing list thread from September:

http://www.supercluster.org/pipermail/torquedev/2013-September/004587.html

Here's the pull request (with patch):

https://github.com/adaptivecomputing/torque/pull/183

Michael

-- 
Michael Jennings <mej at lbl.gov>
Senior HPC Systems Engineer
High-Performance Computing Services
Lawrence Berkeley National Laboratory
Bldg 50B-3209E        W: 510-495-2687
MS 050B-3209          F: 510-486-8615


------------------------------

Message: 3
Date: Tue, 29 Oct 2013 21:03:45 +0100
From: "Carles Acosta (PIC)" <cacosta at pic.es>
Subject: Re: [torqueusers] Problem building rpms torque-2.5.13
To: Torque Users Mailing List <torqueusers at supercluster.org>
Cc: "torqueusers at supercluster.org" <torqueusers at supercluster.org>
Message-ID: <AA390888-DD39-4BC8-805E-CFF6D2A61334 at pic.es>
Content-Type: text/plain;	charset=us-ascii

Hi Michael,

Thank you very much!

Regards,

Carles 



------------------------------

Message: 4
Date: Wed, 30 Oct 2013 16:00:26 +0100
From: Luca Nannipieri <nannipieri at pi.ingv.it>
Subject: [torqueusers] priority queue
To: Torque Users Mailing List <torqueusers at supercluster.org>
Message-ID: <52711F0A.2070407 at pi.ingv.it>
Content-Type: text/plain; charset=ISO-8859-15; format=flowed

I have 2 queues:

[root@ ~]# qstat -Q -f
Queue: default
     queue_type = Execution
     Priority = 50
     total_jobs = 1
     state_count = Transit:0 Queued:1 Held:0 Waiting:0 Running:0 Exiting:0
     mtime = 1383139362
     resources_assigned.nodect = 0
     enabled = True
     started = True

Queue: batch
     queue_type = Execution
     Priority = 20
     total_jobs = 1
     state_count = Transit:0 Queued:0 Held:0 Waiting:0 Running:1 Exiting:0
     mtime = 1383139335
     resources_assigned.nodect = 1
     enabled = True
     started = True

The default queue has Priority = 50 and the batch queue has Priority = 20.
However, if I submit a job to the default queue, the scheduler puts it in
"queued" status even though a job from the batch queue is running. I
expected the job in the default queue to run and the job in the batch
queue to be put in "queued" status instead. Why?

-- 
Ing. Luca Nannipieri
Istituto Nazionale di Geofisica e Vulcanologia
Sezione di Pisa
Via della Faggiola, 32 - 56126 Pisa - Italy
Tel. +39 050 8311926
fax: +39 050 8311942
http://www.pi.ingv.it/chisiamo/paginepersonali/nannipieri.html
PEC: aoo.pisa at pec.ingv.it
----------------------------------------------------------------

This e-mail is intended only for person or entity to which is
addressed and may contain information that is privileged, confidential
or otherwise protected from disclosure. Copying, dissemination or use
of this e-mail or the information herein by anyone other than the
intended recipient is prohibited. If you have received this e-mail
by mistake, please notify us immediately by telephone, fax or e-mail.
Thank you.





------------------------------

Message: 5
Date: Wed, 30 Oct 2013 09:52:46 -0600
From: David Beer <dbeer at adaptivecomputing.com>
Subject: Re: [torqueusers] priority queue
To: Torque Users Mailing List <torqueusers at supercluster.org>
Message-ID: <CAFUQeZ1z-D9o23V69_bAue0svZ2dc7H0XsZjKh8zeWf=Opk0iA at mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

Luca,

The priority assigned by the queue is meant to be interpreted by the
scheduler you are using. Usually, having two jobs where job1 has a priority
of 20 and job2 has a priority of 50 means that both jobs are eligible to
run, but job1 should be evaluated to run before job2 (or the other way
around, depending on whether your scheduler treats lower or higher priority
numbers as running first).

In other words, the state of queued simply means the job is eligible to be
run. Two jobs having the same state doesn't mean that they have equal
priority for running.
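
To make the distinction concrete, here is a toy Python sketch. It is not how pbs_sched, Maui, or any real scheduler is implemented; the names `Job`, `QUEUE_PRIORITY`, and `evaluation_order` are invented for this illustration, and the priority values are taken from the qstat output in the question. In this toy model, priority only decides the order in which queued (eligible) jobs are considered, and a job that is already running is simply left alone.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    queue: str
    state: str  # "Q" = queued (eligible to run), "R" = running


# Queue priorities as shown in the qstat -Q -f output of the question.
QUEUE_PRIORITY = {"default": 50, "batch": 20}


def evaluation_order(jobs, higher_first=True):
    """Return the queued jobs in the order a toy scheduler would consider them.

    higher_first is an assumption of this sketch: whether larger or smaller
    priority numbers are evaluated first depends on the scheduler in use.
    """
    queued = [j for j in jobs if j.state == "Q"]
    return sorted(
        queued,
        key=lambda j: QUEUE_PRIORITY[j.queue],
        reverse=higher_first,
    )


jobs = [
    Job("running-batch-job", "batch", "R"),
    Job("new-default-job", "default", "Q"),
    Job("waiting-batch-job", "batch", "Q"),
]

# The running job does not appear in the list at all: in this model,
# priority orders the queued jobs and nothing more.
for job in evaluation_order(jobs):
    print(job.name, QUEUE_PRIORITY[job.queue])
```

Whether a higher-priority job can ever displace a running one is a separate feature (preemption) that depends on the scheduler and its configuration; the sketch deliberately leaves it out.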

HTH

David





-- 
David Beer | Senior Software Engineer
Adaptive Computing

------------------------------

Message: 6
Date: Wed, 30 Oct 2013 10:50:29 -0700 (PDT)
From: Eva Hocks <hocks at sdsc.edu>
Subject: [torqueusers] customizing xbpsmon
To: Torque Users Mailing List <torqueusers at supercluster.org>
Message-ID:
	<Pine.GSO.4.30.1310301038510.7397-100000 at multivac.sdsc.edu>
Content-Type: TEXT/PLAIN; charset=US-ASCII


Anybody using xpbsmon? I would like to change the size of the cluster
frame.

I changed the height in the xpbsmonrc without success.

*nodeBoxFullMaxHeight:  1000
*nodeBoxMirrorMaxHeight:        1000
*serverBoxMaxHeight:    1000
*siteBoxMaxHeight:      1000


I also tried to change the same variables in the xpbsmon script, with the
same result.

Any help appreciated
Thanks
Eva



------------------------------

Message: 7
Date: Wed, 30 Oct 2013 14:48:29 -0400
From: Kevin Van Workum <vanw at sabalcore.com>
Subject: [torqueusers] TCL scheduler
To: Torque Users Mailing List <torqueusers at supercluster.org>
Message-ID: <CAHom8ysghfYWq9TB8Si-cHw2StToD=qJTdQ7PMztB_LFcr+6Mw at mail.gmail.com>
Content-Type: text/plain; charset="us-ascii"

I'm curious whether the TCL scheduler is still supported in 4.2.x. Trying to
build it throws lots of errors.

-- 
Kevin Van Workum, PhD
Sabalcore Computing Inc.
"Where Data Becomes Discovery"
http://www.sabalcore.com
877-492-8027 ext. 11


------------------------------

_______________________________________________
torqueusers mailing list
torqueusers at supercluster.org
http://www.supercluster.org/mailman/listinfo/torqueusers


End of torqueusers Digest, Vol 111, Issue 39
********************************************

