[torqueusers] jobs terminated half way

Jim Prewett download at hpc.unm.edu
Thu Oct 31 09:07:48 MDT 2013


Hello,

Have you tried running the job as an interactive job?  I find that
interactive jobs are a good way to debug problems with running your code.
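A minimal way to do that (the queue name and resource limits below are only
examples; adjust them to the queues in your own configuration):

```shell
# Ask Torque for an interactive session instead of a batch run; when the
# job starts you get a shell on the allocated node.
qsub -I -q short -l nodes=1:ppn=1,walltime=01:00:00

# Inside that session, run the failing step by hand and watch its output,
# e.g. (hypothetical wrapper for your simulation):
# ./run_simulation.sh
```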

HTH,
Jim

On Thu, 31 Oct 2013, RB. Ezhilalan (Principal Physicist, CUH) wrote:

>
> Hi Ricardo,
>
> Please see below answers to your questions:
>
> "Is the cluster yours? Can you run the program outside Torque? It's the
> easiest way to know whether it's Torque or the program itself that
> aborted the task."
>
> Yes, this mini cluster is solely ours. I ran the calculations on a
> single PC without problem, i.e. interactively, without involving Torque.
>
> "Also can you print us your pbs_server configuration?"
>
> I have printed the pbs_server config below - I hope this is the right
> way to print the configuration. There are six PCs in total, one of
> which is dual core.
> Thanks again for your help.
> *****************************************************
> ezhil at linux-01:~/egsnrc_mp/dosxyznrc> qmgr -c 'p s'
> #
> # Create queues and set their attributes.
> #
> #
> # Create and define queue long
> #
> create queue long
> set queue long queue_type = Execution
> set queue long resources_default.ncpus = 7
> set queue long resources_default.nodes = 1
> set queue long resources_default.walltime = 120:00:00
> set queue long enabled = True
> set queue long started = True
> #
> # Create and define queue batch
> #
> create queue batch
> set queue batch queue_type = Execution
> set queue batch resources_min.ncpus = 7
> set queue batch resources_default.nodes = 1
> set queue batch resources_default.walltime = 100:00:00
> set queue batch enabled = True
> set queue batch started = True
> #
> # Create and define queue short
> #
> create queue short
> set queue short queue_type = Execution
> set queue short resources_default.ncpus = 7
> set queue short resources_default.nodes = 1
> set queue short resources_default.walltime = 03:00:00
> set queue short enabled = True
> set queue short started = True
> #
> # Create and define queue medium
> #
> create queue medium
> set queue medium queue_type = Execution
> set queue medium resources_default.ncpus = 7
> set queue medium resources_default.nodes = 6
> set queue medium resources_default.walltime = 15:00:00
> set queue medium enabled = False
> set queue medium started = False
> #
> # Set server attributes.
> #
> set server scheduling = True
> set server acl_hosts = linux-01
> set server managers = ezhil at linux-01.physics
> set server operators = ezhil at linux-01.physics
> set server default_queue = batch
> set server log_events = 511
> set server mail_from = adm
> set server scheduler_iteration = 600
> set server node_check_rate = 150
> set server tcp_timeout = 6
> set server mom_job_sync = True
> set server keep_completed = 300
> set server auto_node_np = True
> set server next_job_number = 2108
> ezhil at linux-01:~/egsnrc_mp/dosxyznrc>
> ************************************************************************
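None of the queues above set a memory default, so jobs run with whatever the
node allows; if memory limits turn out to be relevant, a per-queue default can
be added with qmgr. A sketch (the 2gb value is purely illustrative, not a
recommendation):

```shell
# Give the batch queue default per-job and per-process memory limits.
# 2gb is an example value only.
qmgr -c 'set queue batch resources_default.mem = 2gb'
qmgr -c 'set queue batch resources_default.pmem = 2gb'

# Confirm the change:
qmgr -c 'list queue batch resources_default'
```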
> Ezhilalan Ramalingam M.Sc.,DABR.,
> Principal Physicist (Radiotherapy),
> Medical Physics Department,
> Cork University Hospital,
> Wilton, Cork
> Ireland
> Tel. 00353 21 4922533
> Fax.00353 21 4921300
> Email: rb.ezhilalan at hse.ie
>
> -----Original Message-----
> From: torqueusers-bounces at supercluster.org
> [mailto:torqueusers-bounces at supercluster.org] On Behalf Of
> torqueusers-request at supercluster.org
> Sent: 31 October 2013 13:27
> To: torqueusers at supercluster.org
> Subject: torqueusers Digest, Vol 111, Issue 40
>
> Send torqueusers mailing list submissions to
> 	torqueusers at supercluster.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
> 	http://www.supercluster.org/mailman/listinfo/torqueusers
> or, via email, send a message with subject or body 'help' to
> 	torqueusers-request at supercluster.org
>
> You can reach the person managing the list at
> 	torqueusers-owner at supercluster.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of torqueusers digest..."
>
>
> Today's Topics:
>
>   1. Re: Require route queue (Ken Nielson)
>   2. jobs terminated half way
>      (RB. Ezhilalan (Principal Physicist, CUH))
>   3. Re: jobs terminated half way (Ricardo Román Brenes)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Wed, 30 Oct 2013 14:02:56 -0600
> From: Ken Nielson <knielson at adaptivecomputing.com>
> Subject: Re: [torqueusers] Require route queue
> To: Torque Users Mailing List <torqueusers at supercluster.org>
> Message-ID:
>
> <CADvLK3dmvaDAak4TK+3qqep0YpHzPvP7Tan625q=47L251itVA at mail.gmail.com>
> Content-Type: text/plain; charset="windows-1252"
>
> I am going to look for that one too.
>
>
> On Fri, Oct 25, 2013 at 3:02 PM, Andrus, Brian Contractor
> <bdandrus at nps.edu> wrote:
>
>>  Thanks.
>>
>> Wow. Why couldn't I find that? Must be Friday.
>>
>> Now I can deal with these users that are sneaking around specifying
>> walltimes to try and get unlimited time.
>>
>> Brian Andrus
>> ITACS/Research Computing
>> Naval Postgraduate School
>> Monterey, California
>> voice: 831-656-6238
>>
>> *From:* torqueusers-bounces at supercluster.org [mailto:
>> torqueusers-bounces at supercluster.org] *On Behalf Of *Matt Britt
>> *Sent:* Friday, October 25, 2013 1:47 PM
>> *To:* Torque Users Mailing List
>> *Subject:* Re: [torqueusers] Require route queue
>>
>> I haven't tested it, but in the queue attributes man page, there is
>> the 'from_route_only' attribute.
>>
>>  - Matt
>>
>> --------------------------------------------
>> Matthew Britt
>> CAEN HPC Group - College of Engineering
>> msbritt at umich.edu
>>
>> On Fri, Oct 25, 2013 at 4:41 PM, Andrus, Brian Contractor <
>> bdandrus at nps.edu> wrote:
>>
>> All,
>>
>> Is there a way to have a queue ONLY  allow jobs that are coming from a
>> routing queue?
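The 'from_route_only' attribute mentioned in Matt's reply above can be
combined with a routing queue roughly like this (the queue names are
hypothetical):

```shell
# 'feeder' is a routing queue whose only destination is 'work';
# 'work' then rejects jobs submitted to it directly.
qmgr -c 'create queue feeder'
qmgr -c 'set queue feeder queue_type = Route'
qmgr -c 'set queue feeder route_destinations = work'
qmgr -c 'set queue feeder enabled = True'
qmgr -c 'set queue feeder started = True'
qmgr -c 'set queue work from_route_only = True'
```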
>>
>>
>> Brian Andrus
>> ITACS/Research Computing
>> Naval Postgraduate School
>> Monterey, California
>> voice: 831-656-6238
>>
>>
>> _______________________________________________
>> torqueusers mailing list
>> torqueusers at supercluster.org
>> http://www.supercluster.org/mailman/listinfo/torqueusers
>>
>> _______________________________________________
>> torqueusers mailing list
>> torqueusers at supercluster.org
>> http://www.supercluster.org/mailman/listinfo/torqueusers
>>
>>
>
>
> -- 
> Ken Nielson
> +1 801.717.3700 office +1 801.717.3738 fax
> 1712 S. East Bay Blvd, Suite 300  Provo, UT  84606
> www.adaptivecomputing.com
> -------------- next part --------------
> An HTML attachment was scrubbed...
> URL:
> http://www.supercluster.org/pipermail/torqueusers/attachments/20131030/83ce12e6/attachment-0001.html
>
> ------------------------------
>
> Message: 2
> Date: Thu, 31 Oct 2013 10:50:42 -0000
> From: "RB. Ezhilalan (Principal Physicist, CUH)" <RB.Ezhilalan at hse.ie>
> Subject: [torqueusers] jobs terminated half way
> To: <torqueusers at supercluster.org>
> Message-ID:
>
> <4659DE6B4825AD4F908C85260F0F2195273507 at ckvex001.south.health.local>
> Content-Type: text/plain;	charset="us-ascii"
>
> Hi Ricardo,
>
> Thank you for looking at the log files. I noticed that the jobs get
> terminated halfway when the calculation time for each job is increased
> (i.e. the number of histories). Could the default memory allocation be
> the problem? I have used the default settings for the pbs_server. For
> your info, I am running BEAMnrc Monte Carlo simulations. Any suggestions?
>
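One way to check whether a memory (or other) limit is what kills the jobs is
to compare what a job uses against its limits while it runs, and to pull the
server/MOM log entries for it after it dies. A sketch; the job id is only an
example:

```shell
JOBID=2107   # example job id

# While the job runs: usage vs. defaults/limits, plus exit_status once set.
qstat -f $JOBID | grep -Ei 'resources_(used|default|max)|exit_status'

# After it dies: tracejob gathers the matching server, MOM and accounting
# log lines (-n 2 = look back two days of logs).
tracejob -n 2 $JOBID
```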
> Regards,
> Ezhil
> Ezhilalan Ramalingam M.Sc.,DABR.,
> Principal Physicist (Radiotherapy),
> Medical Physics Department,
> Cork University Hospital,
> Wilton, Cork
> Ireland
> Tel. 00353 21 4922533
> Fax.00353 21 4921300
> Email: rb.ezhilalan at hse.ie
> -----Original Message-----
> From: torqueusers-bounces at supercluster.org
> [mailto:torqueusers-bounces at supercluster.org] On Behalf Of
> torqueusers-request at supercluster.org
> Sent: 30 October 2013 18:59
> To: torqueusers at supercluster.org
> Subject: torqueusers Digest, Vol 111, Issue 39
>
> Send torqueusers mailing list submissions to
> 	torqueusers at supercluster.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
> 	http://www.supercluster.org/mailman/listinfo/torqueusers
> or, via email, send a message with subject or body 'help' to
> 	torqueusers-request at supercluster.org
>
> You can reach the person managing the list at
> ------------------------------
>
> Message: 1
> Date: Tue, 29 Oct 2013 09:35:18 -0600
> From: Ricardo Román Brenes <roman.ricardo at gmail.com>
> Subject: Re: [torqueusers] jobs terminated half way
> To: Torque Users Mailing List <torqueusers at supercluster.org>
> Message-ID:
>
> <CAG-vK_xU5vNFROLhgOxG8en=aK67eeEVZcJqVjkV7DyXOje9iQ at mail.gmail.com>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Hi again,
>
> The only error I could read in the 6 logs you sent regarding those jobs
> was this:
>
> pbs_mom;Svr;pbs_mom;LOG_ERROR::Permission denied (13) in job_purge,
> Unlink of job file failed
>
> I am not sure if this is an actual error, just an error in the logging,
> or if this "permission denied" should abort your jobs. Maybe check the
> workdir permissions.
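The "Permission denied (13) in job_purge" line means pbs_mom failed to unlink
a job file, so ownership and write permission on the directories involved are
worth checking. A small sketch; the default paths are only examples, pass
your own (e.g. your job workdir and the MOM spool, often
/var/spool/torque/spool) as arguments:

```shell
#!/bin/sh
# Show owner and permissions of directories that job_purge typically touches.
for d in "${1:-$HOME}" "${2:-/tmp}"; do
    ls -ld "$d"    # the job owner needs write and execute permission here
done
```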
> -------------- next part --------------
> An HTML attachment was scrubbed...
> URL:
> http://www.supercluster.org/pipermail/torqueusers/attachments/20131029/8dfc388f/attachment-0001.html
>
> ------------------------------
>
> Message: 2
> Date: Tue, 29 Oct 2013 09:54:17 -0700
> From: Michael Jennings <mej at lbl.gov>
> Subject: Re: [torqueusers] Problem building rpms torque-2.5.13
> To: torqueusers at supercluster.org
> Message-ID: <20131029165417.GA27774 at lbl.gov>
> Content-Type: text/plain; charset=us-ascii
>
> On Tuesday, 29 October 2013, at 15:40:04 (+0100),
> Carles Acosta wrote:
>
>> I am trying to build the rpms for the new torque 2.5.13 release.
>> After applying the patch fix_mom_priv_2.5.patch, I use the following
>> options:
>>
>> # rpmbuild -ta --with munge --with scp --define 'torque_home
>> /var/spool/pbs' --define 'torque_server XXXXXXX' --define 'acflags
>> --enable-maxdefault --with-readline --with-tcp-retry-limit=2
>> --disable-spool' torque-2.5.13.tar.gz
>>
>> The process fails with the error:
>
> This is a known issue which has already been fixed in Git.  Here's the
> mailing list thread from September:
>
> http://www.supercluster.org/pipermail/torquedev/2013-September/004587.html
>
> Here's the pull request (with patch):
>
> https://github.com/adaptivecomputing/torque/pull/183
>
> Michael
>
> -- 
> Michael Jennings <mej at lbl.gov>
> Senior HPC Systems Engineer
> High-Performance Computing Services
> Lawrence Berkeley National Laboratory
> Bldg 50B-3209E        W: 510-495-2687
> MS 050B-3209          F: 510-486-8615
>
>
> ------------------------------
>
> Message: 3
> Date: Tue, 29 Oct 2013 21:03:45 +0100
> From: "Carles Acosta (PIC)" <cacosta at pic.es>
> Subject: Re: [torqueusers] Problem building rpms torque-2.5.13
> To: Torque Users Mailing List <torqueusers at supercluster.org>
> Cc: "torqueusers at supercluster.org" <torqueusers at supercluster.org>
> Message-ID: <AA390888-DD39-4BC8-805E-CFF6D2A61334 at pic.es>
> Content-Type: text/plain;	charset=us-ascii
>
> Hi Michael,
>
> Thank you very much!
>
> Regards,
>
> Carles
>
> El Oct 29, 2013, a les 5:54 PM, Michael Jennings <mej at lbl.gov> va
> escriure:
>> On Tuesday, 29 October 2013, at 15:40:04 (+0100),
>> Carles Acosta wrote:
>>
>>> I am trying to build the rpms for the new torque 2.5.13 release.
>>> After applying the patch fix_mom_priv_2.5.patch, I use the following
>>> options:
>>>
>>> # rpmbuild -ta --with munge --with scp --define 'torque_home
>>> /var/spool/pbs' --define 'torque_server XXXXXXX' --define 'acflags
>>> --enable-maxdefault --with-readline --with-tcp-retry-limit=2
>>> --disable-spool' torque-2.5.13.tar.gz
>>>
>>> The process fails with the error:
>>
>> This is a known issue which has already been fixed in Git.  Here's the
>> mailing list thread from September:
>>
>>
>> http://www.supercluster.org/pipermail/torquedev/2013-September/004587.html
>>
>> Here's the pull request (with patch):
>>
>> https://github.com/adaptivecomputing/torque/pull/183
>>
>> Michael
>>
>> --
>> Michael Jennings <mej at lbl.gov>
>> Senior HPC Systems Engineer
>> High-Performance Computing Services
>> Lawrence Berkeley National Laboratory
>> Bldg 50B-3209E        W: 510-495-2687
>> MS 050B-3209          F: 510-486-8615
>> _______________________________________________
>> torqueusers mailing list
>> torqueusers at supercluster.org
>> http://www.supercluster.org/mailman/listinfo/torqueusers
>
>
> ------------------------------
>
> Message: 4
> Date: Wed, 30 Oct 2013 16:00:26 +0100
> From: Luca Nannipieri <nannipieri at pi.ingv.it>
> Subject: [torqueusers] priority queue
> To: Torque Users Mailing List <torqueusers at supercluster.org>
> Message-ID: <52711F0A.2070407 at pi.ingv.it>
> Content-Type: text/plain; charset=ISO-8859-15; format=flowed
>
> I have 2 queues:
>
> [root@ ~]# qstat -Q -f
> Queue: default
>     queue_type = Execution
>     Priority = 50
>     total_jobs = 1
>     state_count = Transit:0 Queued:1 Held:0 Waiting:0 Running:0 Exiting:0
>     mtime = 1383139362
>     resources_assigned.nodect = 0
>     enabled = True
>     started = True
>
> Queue: batch
>     queue_type = Execution
>     Priority = 20
>     total_jobs = 1
>     state_count = Transit:0 Queued:0 Held:0 Waiting:0 Running:1 Exiting:0
>     mtime = 1383139335
>     resources_assigned.nodect = 1
>     enabled = True
>     started = True
>
> default has Priority = 50 and batch has Priority = 20, but if I submit
> a job to the default queue, the scheduler puts it in "queued" status
> even though there is a running job in the batch queue and no running
> job in the default queue. Why?
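Whether a queue's Priority value is honored at all depends on the scheduler
in use; as a first diagnostic, the scheduler usually records why it skipped a
job in the job's comment field. A sketch, with an example job id:

```shell
# Show the scheduler's comment on a queued job (2107 is an example id).
qstat -f 2107 | grep -i comment
```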
>
> -- 
> Ing. Luca Nannipieri
> Istituto Nazionale di Geofisica e Vulcanologia
> Sezione di Pisa
> Via della Faggiola, 32 - 56126 Pisa - Italy
> Tel. +39 050 8311926
> fax: +39 050 8311942
> http://www.pi.ingv.it/chisiamo/paginepersonali/nannipieri.html
> PEC: aoo.pisa at pec.ingv.it
> ----------------------------------------------------------------
>
> This e-mail is intended only for person or entity to which is
> addressed and may contain information that is privileged, confidential
> or otherwise protected from disclosure. Copying, dissemination or use
> of this e-mail or the information herein by anyone other than the
> intended recipient is prohibited. If you have received this e-mail
> by mistake, please notify us immediately by telephone, fax or e-mail.
> Thank you.
>
>
>
>
>
> ------------------------------
>
> Message: 5
> Date: Wed, 30 Oct 2013 09:52:46 -0600
> From: David Beer <dbeer at adaptivecomputing.com>
> Subject: Re: [torqueusers] priority queue
> To: Torque Users Mailing List <torqueusers at supercluster.org>
> Message-ID:
>
> <CAFUQeZ1z-D9o23V69_bAue0svZ2dc7H0XsZjKh8zeWf=Opk0iA at mail.gmail.com>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Luca,
>
> The priority assigned by the queue is meant to be interpreted by the
> scheduler you are using. Usually, having two jobs where job1 has a
> priority of 20 and job2 has a priority of 50 means that both jobs are
> eligible to run, but one should be evaluated before the other (which
> way around depends on whether your scheduler treats lower or higher
> priority numbers as running first).
>
> In other words, the queued state simply means the job is eligible to
> be run. Two jobs having the same state doesn't mean that they are
> equal priority for running.
>
> HTH
>
> David
>
>
> On Wed, Oct 30, 2013 at 9:00 AM, Luca Nannipieri
> <nannipieri at pi.ingv.it> wrote:
>
>> I have 2 queues:
>>
>> [root@ ~]# qstat -Q -f
>> Queue: default
>>      queue_type = Execution
>>      Priority = 50
>>      total_jobs = 1
>>      state_count = Transit:0 Queued:1 Held:0 Waiting:0 Running:0 Exiting:0
>>      mtime = 1383139362
>>      resources_assigned.nodect = 0
>>      enabled = True
>>      started = True
>>
>> Queue: batch
>>      queue_type = Execution
>>      Priority = 20
>>      total_jobs = 1
>>      state_count = Transit:0 Queued:0 Held:0 Waiting:0 Running:1 Exiting:0
>>      mtime = 1383139335
>>      resources_assigned.nodect = 1
>>      enabled = True
>>      started = True
>>
>> default has Priority = 50 and batch has Priority = 20, but if I submit
>> a job to the default queue, the scheduler puts it in "queued" status
>> even though there is a running job in the batch queue and no running
>> job in the default queue. Why?
>>
>> --
>> Ing. Luca Nannipieri
>> Istituto Nazionale di Geofisica e Vulcanologia
>> Sezione di Pisa
>> Via della Faggiola, 32 - 56126 Pisa - Italy
>> Tel. +39 050 8311926
>> fax: +39 050 8311942
>> http://www.pi.ingv.it/chisiamo/paginepersonali/nannipieri.html
>> PEC: aoo.pisa at pec.ingv.it
>> ----------------------------------------------------------------
>>
>>
>>
>> _______________________________________________
>> torqueusers mailing list
>> torqueusers at supercluster.org
>> http://www.supercluster.org/mailman/listinfo/torqueusers
>>
>
>
>
> -- 
> David Beer | Senior Software Engineer
> Adaptive Computing
> -------------- next part --------------
> An HTML attachment was scrubbed...
> URL:
> http://www.supercluster.org/pipermail/torqueusers/attachments/20131030/bbf881b6/attachment-0001.html
>
> ------------------------------
>
> Message: 6
> Date: Wed, 30 Oct 2013 10:50:29 -0700 (PDT)
> From: Eva Hocks <hocks at sdsc.edu>
> Subject: [torqueusers] customizing xbpsmon
> To: Torque Users Mailing List <torqueusers at supercluster.org>
> Message-ID:
> 	<Pine.GSO.4.30.1310301038510.7397-100000 at multivac.sdsc.edu>
> Content-Type: TEXT/PLAIN; charset=US-ASCII
>
>
> Anybody using xpbsmon? I would like to change the size of the cluster
> frame.
>
> I changed the height in the xpbsmonrc without success.
>
> *nodeBoxFullMaxHeight:  1000
> *nodeBoxMirrorMaxHeight:        1000
> *serverBoxMaxHeight:    1000
> *siteBoxMaxHeight:      1000
>
>
> I also tried to change the same variable in the xpbsmon script, with
> the same result.
>
> Any help appreciated
> Thanks
> Eva
>
>
>
> ------------------------------
>
> Message: 7
> Date: Wed, 30 Oct 2013 14:48:29 -0400
> From: Kevin Van Workum <vanw at sabalcore.com>
> Subject: [torqueusers] TCL scheduler
> To: Torque Users Mailing List <torqueusers at supercluster.org>
> Message-ID:
>
> <CAHom8ysghfYWq9TB8Si-cHw2StToD=qJTdQ7PMztB_LFcr+6Mw at mail.gmail.com>
> Content-Type: text/plain; charset="us-ascii"
>
> I'm curious: is the TCL scheduler still supported in 4.2.x? Trying to
> build it throws lots of errors.
>
> -- 
> Kevin Van Workum, PhD
> Sabalcore Computing Inc.
> "Where Data Becomes Discovery"
> http://www.sabalcore.com
> 877-492-8027 ext. 11
>
> -------------- next part --------------
> An HTML attachment was scrubbed...
> URL:
> http://www.supercluster.org/pipermail/torqueusers/attachments/20131030/2fe5a4db/attachment.html
>
> ------------------------------
>
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers
>
>
> End of torqueusers Digest, Vol 111, Issue 39
> ********************************************
>
>
> ------------------------------
>
> Message: 3
> Date: Thu, 31 Oct 2013 07:16:42 -0600
> From: Ricardo Román Brenes <roman.ricardo at gmail.com>
> Subject: Re: [torqueusers] jobs terminated half way
> To: Torque Users Mailing List <torqueusers at supercluster.org>
> Message-ID:
>
> <CAG-vK_xOMvJGVFt5GASQuqddTe5kG8X0n4umciH1J_y_QDMDCw at mail.gmail.com>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Is the cluster yours? Can you run the program outside Torque? It's the
> easiest way to know whether it's Torque or the program itself that
> aborted the task.
>
> Also can you print us your pbs_server configuration?
> On Oct 31, 2013 4:52 AM, "RB. Ezhilalan (Principal Physicist, CUH)" <
> RB.Ezhilalan at hse.ie> wrote:
>
>> Hi Ricardo,
>>
>> Thank you for looking at the log files. I noticed that the jobs get
>> terminated halfway when the calculation time for each job is increased
>> (i.e. the number of histories). Could the default memory allocation be
>> the problem? I have used the default settings for the pbs_server. For
>> your info, I am running BEAMnrc Monte Carlo simulations. Any suggestions?
>>
>> Regards,
>> Ezhil
>> Ezhilalan Ramalingam M.Sc.,DABR.,
>> Principal Physicist (Radiotherapy),
>> Medical Physics Department,
>> Cork University Hospital,
>> Wilton, Cork
>> Ireland
>> Tel. 00353 21 4922533
>> Fax.00353 21 4921300
>> Email: rb.ezhilalan at hse.ie
>> -----Original Message-----
>> From: torqueusers-bounces at supercluster.org
>> [mailto:torqueusers-bounces at supercluster.org] On Behalf Of
>> torqueusers-request at supercluster.org
>> Sent: 30 October 2013 18:59
>> To: torqueusers at supercluster.org
>> Subject: torqueusers Digest, Vol 111, Issue 39
>>
>> Send torqueusers mailing list submissions to
>>         torqueusers at supercluster.org
>>
>> To subscribe or unsubscribe via the World Wide Web, visit
>>         http://www.supercluster.org/mailman/listinfo/torqueusers
>> or, via email, send a message with subject or body 'help' to
>>         torqueusers-request at supercluster.org
>>
>> You can reach the person managing the list at
>>         torqueusers-owner at supercluster.org
>>
>> When replying, please edit your Subject line so it is more specific
>> than "Re: Contents of torqueusers digest..."
>>
>>
>> Today's Topics:
>>
>>    1. Re: jobs terminated half way (Ricardo Rom?n Brenes)
>>    2. Re: Problem building rpms torque-2.5.13 (Michael Jennings)
>>    3. Re: Problem building rpms torque-2.5.13 (Carles Acosta (PIC))
>>    4. priority queue (Luca Nannipieri)
>>    5. Re: priority queue (David Beer)
>>    6. customizing xbpsmon (Eva Hocks)
>>    7. TCL scheduler (Kevin Van Workum)
>>
>>
>> ----------------------------------------------------------------------
>>
>> Message: 1
>> Date: Tue, 29 Oct 2013 09:35:18 -0600
>> From: Ricardo Rom?n Brenes <roman.ricardo at gmail.com>
>> Subject: Re: [torqueusers] jobs terminated half way
>> To: Torque Users Mailing List <torqueusers at supercluster.org>
>> Message-ID:
>>
>> <CAG-vK_xU5vNFROLhgOxG8en=aK67eeEVZcJqVjkV7DyXOje9iQ at mail.gmail.com>
>> Content-Type: text/plain; charset="iso-8859-1"
>>
>> Hi again
>>
>> The only error i could read in the 6 logs you sent regarding those
> jobs
>> was
>> this:
>>
>> pbs_mom;Svr;pbs_mom;LOG_ERROR::Permission denied (13) in job_purge,
>> Unlink
>> of job file failed
>>
>> I am not sure if this is an actual error, just a error in the logging
> or
>> if
>> this "permission denied" should abort your jobs. Maybe check the
> workdir
>> permissions.
>> -------------- next part --------------
>> An HTML attachment was scrubbed...
>> URL:
>>
> http://www.supercluster.org/pipermail/torqueusers/attachments/20131029/8
>> dfc388f/attachment-0001.html
>>
>> ------------------------------
>>
>> Message: 2
>> Date: Tue, 29 Oct 2013 09:54:17 -0700
>> From: Michael Jennings <mej at lbl.gov>
>> Subject: Re: [torqueusers] Problem building rpms torque-2.5.13
>> To: torqueusers at supercluster.org
>> Message-ID: <20131029165417.GA27774 at lbl.gov>
>> Content-Type: text/plain; charset=us-ascii
>>
>> On Tuesday, 29 October 2013, at 15:40:04 (+0100),
>> Carles Acosta wrote:
>>
>>> I am trying to build the rpms for the new torque 2.5.13 release.
>>> After applying the patch fix_mom_priv_2.5.patch, I use the following
>>> options:
>>>
>>> # rpmbuild -ta --with munge --with scp --define 'torque_home
>>> /var/spool/pbs' --define 'torque_server XXXXXXX' --define 'acflags
>>> --enable-maxdefault --with-readline --with-tcp-retry-limit=2
>>> --disable-spool' torque-2.5.13.tar.gz
>>>
>>> The process fails with the error:
>>
>> This is a known issue which has already been fixed in Git.  Here's the
>> mailing list thread from September:
>>
>>
> http://www.supercluster.org/pipermail/torquedev/2013-September/004587.ht
>> ml
>>
>> Here's the pull request (with patch):
>>
>> https://github.com/adaptivecomputing/torque/pull/183
>>
>> Michael
>>
>> --
>> Michael Jennings <mej at lbl.gov>
>> Senior HPC Systems Engineer
>> High-Performance Computing Services
>> Lawrence Berkeley National Laboratory
>> Bldg 50B-3209E        W: 510-495-2687
>> MS 050B-3209          F: 510-486-8615
>>
>>
>> ------------------------------
>>
>> Message: 3
>> Date: Tue, 29 Oct 2013 21:03:45 +0100
>> From: "Carles Acosta (PIC)" <cacosta at pic.es>
>> Subject: Re: [torqueusers] Problem building rpms torque-2.5.13
>> To: Torque Users Mailing List <torqueusers at supercluster.org>
>> Cc: "torqueusers at supercluster.org" <torqueusers at supercluster.org>
>> Message-ID: <AA390888-DD39-4BC8-805E-CFF6D2A61334 at pic.es>
>> Content-Type: text/plain;       charset=us-ascii
>>
>> Hi Michael,
>>
>> Thank you very much!
>>
>> Regards,
>>
>> Carles
>>
>> El Oct 29, 2013, a les 5:54 PM, Michael Jennings <mej at lbl.gov> va
>> escriure:
>>> On Tuesday, 29 October 2013, at 15:40:04 (+0100),
>>> Carles Acosta wrote:
>>>
>>>> I am trying to build the rpms for the new torque 2.5.13 release.
>>>> After applying the patch fix_mom_priv_2.5.patch, I use the
> following
>>>> options:
>>>>
>>>> # rpmbuild -ta --with munge --with scp --define 'torque_home
>>>> /var/spool/pbs' --define 'torque_server XXXXXXX' --define 'acflags
>>>> --enable-maxdefault --with-readline --with-tcp-retry-limit=2
>>>> --disable-spool' torque-2.5.13.tar.gz
>>>>
>>>> The process fails with the error:
>>>
>>> This is a known issue which has already been fixed in Git.  Here's
> the
>>> mailing list thread from September:
>>>
>>>
>>
> http://www.supercluster.org/pipermail/torquedev/2013-September/004587.ht
>> ml
>>>
>>> Here's the pull request (with patch):
>>>
>>> https://github.com/adaptivecomputing/torque/pull/183
>>>
>>> Michael
>>>
>>> --
>>> Michael Jennings <mej at lbl.gov>
>>> Senior HPC Systems Engineer
>>> High-Performance Computing Services
>>> Lawrence Berkeley National Laboratory
>>> Bldg 50B-3209E        W: 510-495-2687
>>> MS 050B-3209          F: 510-486-8615
>>> _______________________________________________
>>> torqueusers mailing list
>>> torqueusers at supercluster.org
>>> http://www.supercluster.org/mailman/listinfo/torqueusers
>>
>>
>> ------------------------------
>>
>> Message: 4
>> Date: Wed, 30 Oct 2013 16:00:26 +0100
>> From: Luca Nannipieri <nannipieri at pi.ingv.it>
>> Subject: [torqueusers] priority queue
>> To: Torque Users Mailing List <torqueusers at supercluster.org>
>> Message-ID: <52711F0A.2070407 at pi.ingv.it>
>> Content-Type: text/plain; charset=ISO-8859-15; format=flowed
>>
>> I have 2 queues:
>>
>> [root@ ~]# qstat -Q -f
>> Queue: default
>>      queue_type = Execution
>>      Priority = 50
>>      total_jobs = 1
>>      state_count = Transit:0 Queued:1 Held:0 Waiting:0 Running:0
>> Exiting:0
>>      mtime = 1383139362
>>      resources_assigned.nodect = 0
>>      enabled = True
>>      started = True
>>
>> Queue: batch
>>      queue_type = Execution
>>      Priority = 20
>>      total_jobs = 1
>>      state_count = Transit:0 Queued:0 Held:0 Waiting:0 Running:1
>> Exiting:0
>>      mtime = 1383139335
>>      resources_assigned.nodect = 1
>>      enabled = True
>>      started = True
>>
>> default has priority= 50 and batch prioriry=20, but if i submit a job
>> with default queue the scheduler put in "queued" status even if there
> is
>>
>> a running job with batch queue and not running job with default queue
>> and put in "queued" the job with batch queue.  Why?
>>
>> --
>> Ing. Luca Nannipieri
>> Istituto Nazionale di Geofisica e Vulcanologia
>> Sezione di Pisa
>> Via della Faggiola, 32 - 56126 Pisa - Italy
>> Tel. +39 050 8311926
>> fax: +39 050 8311942
>> http://www.pi.ingv.it/chisiamo/paginepersonali/nannipieri.html
>> PEC: aoo.pisa at pec.ingv.it
>> ----------------------------------------------------------------
>>
>> Il contenuto di questa e-mail e' rivolto unicamente alle persone
>> cui e' indirizzato, e puo'contenere informazioni la cui riservatezza
>> e' tutelata.E' proibita la copia, la divulgazione o l'uso di questo
>> messaggio o dell'informazione ivi contenuta da chiunque altro che
>> non sia il destinatario indicato. Se avete ricevuto questa e-mail
>> per errore, vogliate cortesemente comunicarlo immediatamente per
>> telefono, fax o e-mail.
>> Grazie.
>>
>> This e-mail is intended only for person or entity to which is
>> addressed and may contain information that is privileged, confidential
>> or otherwise protected from disclosure. Copying, dissemination or use
>> of this e-mail or the information herein by anyone other than the
>> intended recipient is prohibited. If you have received this e-mail
>> by mistake, please notify us immediately by telephone, fax or e-mail.
>> Thank you.
>>
>>
>>
>>
>>
>> ------------------------------
>>
>> Message: 5
>> Date: Wed, 30 Oct 2013 09:52:46 -0600
>> From: David Beer <dbeer at adaptivecomputing.com>
>> Subject: Re: [torqueusers] priority queue
>> To: Torque Users Mailing List <torqueusers at supercluster.org>
>> Message-ID:
>>
>> <CAFUQeZ1z-D9o23V69_bAue0svZ2dc7H0XsZjKh8zeWf=Opk0iA at mail.gmail.com>
>> Content-Type: text/plain; charset="iso-8859-1"
>>
>> Luca,
>>
>> The priority assigned by the queue is meant to be interpreted by the
>> scheduler you are using. Usually, having two jobs where job1 has a
>> priority
>> of 20 and job2 has a priority of 50 means that both jobs are eligible
> to
>> run, but job1 should be evaluated to run before job2 (or the other way
>> if
>> your scheduler things lower priority numbers run first or higher).
>>
>> In other words, the state of queued simply means the job is eligible
> to
>> be
>> run. Two jobs having the same state doesn't mean that they are equal
>> priority for running.
>>
>> HTH
>>
>> David
>>
>>
>> On Wed, Oct 30, 2013 at 9:00 AM, Luca Nannipieri
>> <nannipieri at pi.ingv.it>wrote:
>>
>>> I have 2 queues:
>>>
>>> [root@ ~]# qstat -Q -f
>>> Queue: default
>>>      queue_type = Execution
>>>      Priority = 50
>>>      total_jobs = 1
>>>      state_count = Transit:0 Queued:1 Held:0 Waiting:0 Running:0
>> Exiting:0
>>>      mtime = 1383139362
>>>      resources_assigned.nodect = 0
>>>      enabled = True
>>>      started = True
>>>
>>> Queue: batch
>>>      queue_type = Execution
>>>      Priority = 20
>>>      total_jobs = 1
>>>      state_count = Transit:0 Queued:0 Held:0 Waiting:0 Running:1
>> Exiting:0
>>>      mtime = 1383139335
>>>      resources_assigned.nodect = 1
>>>      enabled = True
>>>      started = True
>>>
>>> default has Priority = 50 and batch has Priority = 20, but if I
>>> submit a job to the default queue, the scheduler leaves it in
>>> "queued" status even though there is a running job from the batch
>>> queue and no running job from the default queue; it also puts jobs
>>> submitted to the batch queue in "queued". Why?
>>>
>>> --
>>> Ing. Luca Nannipieri
>>> Istituto Nazionale di Geofisica e Vulcanologia
>>> Sezione di Pisa
>>> Via della Faggiola, 32 - 56126 Pisa - Italy
>>> Tel. +39 050 8311926
>>> fax: +39 050 8311942
>>> http://www.pi.ingv.it/chisiamo/paginepersonali/nannipieri.html
>>> PEC: aoo.pisa at pec.ingv.it
>>>
>>>
>>>
>>> _______________________________________________
>>> torqueusers mailing list
>>> torqueusers at supercluster.org
>>> http://www.supercluster.org/mailman/listinfo/torqueusers
>>>
>>
>>
>>
>> --
>> David Beer | Senior Software Engineer
>> Adaptive Computing
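[Editorial note] David's distinction above can be sketched in a few lines. This is an illustration only, not Torque or scheduler source code; the dictionary fields (`state`, `priority`) and the higher-wins default ordering are assumptions for the sketch:

```python
# Sketch: "Q" (queued) only marks eligibility; the scheduler then orders
# the eligible jobs by priority when it picks the next job to run.
def pick_next(jobs, higher_wins=True):
    # Keep only jobs that are eligible to run (queued, not already running).
    eligible = [j for j in jobs if j["state"] == "Q"]
    if not eligible:
        return None
    key = lambda j: j["priority"] if higher_wins else -j["priority"]
    return max(eligible, key=key)

jobs = [
    {"name": "job1", "state": "Q", "priority": 20},
    {"name": "job2", "state": "Q", "priority": 50},
    {"name": "job3", "state": "R", "priority": 99},  # already running
]
print(pick_next(jobs)["name"])                       # job2 if higher wins
print(pick_next(jobs, higher_wins=False)["name"])    # job1 if lower wins
```

Both queued jobs share the same state; only the scheduler's ordering rule decides which one runs first, which is why two "queued" jobs are not equals.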
>>
>> ------------------------------
>>
>> Message: 6
>> Date: Wed, 30 Oct 2013 10:50:29 -0700 (PDT)
>> From: Eva Hocks <hocks at sdsc.edu>
>> Subject: [torqueusers] customizing xbpsmon
>> To: Torque Users Mailing List <torqueusers at supercluster.org>
>> Message-ID:
>>         <Pine.GSO.4.30.1310301038510.7397-100000 at multivac.sdsc.edu>
>> Content-Type: TEXT/PLAIN; charset=US-ASCII
>>
>>
>> Anybody using xpbsmon? I would like to change the size of the cluster
>> frame.
>>
>> I changed the height in the xpbsmonrc without success.
>>
>> *nodeBoxFullMaxHeight:  1000
>> *nodeBoxMirrorMaxHeight:        1000
>> *serverBoxMaxHeight:    1000
>> *siteBoxMaxHeight:      1000
>>
>>
>> I also tried to change the same variables in the xpbsmon script, with
>> the same result.
>>
>> Any help appreciated
>> Thanks
>> Eva
>>
>>
>>
>> ------------------------------
>>
>> Message: 7
>> Date: Wed, 30 Oct 2013 14:48:29 -0400
>> From: Kevin Van Workum <vanw at sabalcore.com>
>> Subject: [torqueusers] TCL scheduler
>> To: Torque Users Mailing List <torqueusers at supercluster.org>
>> Message-ID:
>>
>> <CAHom8ysghfYWq9TB8Si-cHw2StToD=qJTdQ7PMztB_LFcr+6Mw at mail.gmail.com>
>> Content-Type: text/plain; charset="us-ascii"
>>
>> I'm curious whether the TCL scheduler is still supported in 4.2.x?
>> Trying to build it throws lots of errors.
>>
>> --
>> Kevin Van Workum, PhD
>> Sabalcore Computing Inc.
>> "Where Data Becomes Discovery"
>> http://www.sabalcore.com
>> 877-492-8027 ext. 11
>>
>> --
>>
>>
>> ------------------------------
>>
>>
>>
>> End of torqueusers Digest, Vol 111, Issue 39
>> ********************************************
>>
>
> ------------------------------
>
>
>
> End of torqueusers Digest, Vol 111, Issue 40
> ********************************************
>

James E. Prewett                    Jim at Prewett.org download at hpc.unm.edu
Systems Team Leader           LoGS: http://www.hpc.unm.edu/~download/LoGS/
Designated Security Officer         OpenPGP key: pub 1024D/31816D93
HPC Systems Engineer III   UNM HPC  505.277.8210

