From jayavant.patil82 at gmail.com Thu Feb 2 23:49:03 2012
From: jayavant.patil82 at gmail.com (Jayavant Patil)
Date: Fri, 3 Feb 2012 12:19:03 +0530
Subject: [Mauiusers] Queue to Node Mapping
Message-ID:

Hi,

I am using Torque 3.0.0 and Maui 3.3. I want jobs submitted to a specific queue to run only on the nodes allocated to that queue (i.e. queue to node mapping).

Does anybody know how to do this?

--
Thanks & Regards,
Jayavant Ningoji Patil
Engineer: System Software
Computational Research Laboratories Ltd.
Pune-411 004.
Maharashtra, India.
+91 9923536030.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.supercluster.org/pipermail/mauiusers/attachments/20120203/f0a4085d/attachment.html

From jayavant.patil82 at gmail.com Fri Feb 3 01:18:24 2012
From: jayavant.patil82 at gmail.com (Jayavant Patil)
Date: Fri, 3 Feb 2012 13:48:24 +0530
Subject: [Mauiusers] Queue to Node Mapping
In-Reply-To: <2d7330a.22258.135420519a4.Coremail.zgp121@126.com>
References: <2d7330a.22258.135420519a4.Coremail.zgp121@126.com>
Message-ID:

On Fri, Feb 3, 2012 at 12:31 PM, Guangping Zhang wrote:

> I will give you one example as follows. As far as I know, this works in
> Torque 2.4.6 + Maui 3.3.1.
>
> 1. Edit the file /var/spool/torque/server_priv/nodes
>
> node01 np=12 sugon siesta dalton gaussian
> node02 np=12 sugon siesta dalton gaussian
> node03 np=12 sugon siesta dalton gaussian
> node04 np=12 sugon siesta dalton gaussian
> node05 np=12 sugon siesta dalton gaussian
> node06 np=12 sugon siesta dalton gaussian
> node07 np=12 sugon siesta dalton gaussian
> node08 np=12 sugon siesta dalton gaussian
> node09 np=12 sugon siesta dalton gaussian
> node10 np=12 sugon siesta dalton gaussian
> node11 np=12 sugon siesta dalton gaussian
> node12 np=12 sugon siesta dalton gaussian
> node31 np=8 powerlead siesta dalton gaussian others
> node32 np=8 powerlead siesta dalton gaussian others
> node33 np=8 powerlead siesta dalton gaussian others
> node34 np=8 powerlead siesta dalton gaussian others
> node35 np=8 powerlead siesta dalton gaussian others
> node36 np=8 powerlead siesta dalton gaussian others
> node38 np=8 powerlead siesta dalton gaussian others
> node39 np=8 powerlead siesta dalton gaussian others
> node40 np=8 powerlead siesta dalton gaussian others
> node41 np=8 dell siesta dalton gaussian others
> node42 np=8 dell siesta dalton gaussian others
> node43 np=8 dell siesta dalton gaussian others
> node44 np=8 dell molpro
> node45 np=8 dell molpro
> node46 np=8 dell molpro
>
> 2. Create a queue named SIESTA
>
> qmgr -c "create queue SIESTA queue_type=execution"
> qmgr -c "set queue SIESTA started=true"
> qmgr -c "set queue SIESTA enabled=true"
> qmgr -c "set queue SIESTA acl_group_enable=true"
> qmgr -c "set queue SIESTA acl_groups=siesta"
> qmgr -c "set queue SIESTA acl_group_sloppy=true"
> qmgr -c "set queue SIESTA resources_default.neednodes=siesta"
>
> 3. Restart the services
>
> qterm -t quick
> pbs_server
> ps -A |grep maui
> 18066 ? 00:00:00 maui
> kill 18066
> /usr/local/software/maui-3.3.1/sbin/maui
>
> 4. That is OK
>
> Only users who belong to group siesta can submit jobs to queue SIESTA,
> and those jobs can only use the nodes that have the property "siesta".
>
> Best
>
> 2012-02-03
> ------------------------------
> Guangping Zhang
> ------------------------------
> From: Jayavant Patil
> Sent: 2012-02-03 14:49
> Subject: [Mauiusers] Queue to Node Mapping
> To: torquedev,mauiusers
> Cc:
>
> Hi,
>
> I am using Torque 3.0.0 and Maui 3.3. I want jobs submitted to a
> specific queue to run only on the nodes allocated to that queue (i.e.
> queue to node mapping).
>
> Does anybody know how to do this?
>
> --
> Thanks & Regards,
> Jayavant Ningoji Patil
> Engineer: System Software
> Computational Research Laboratories Ltd.
> Pune-411 004.
> Maharashtra, India.
> +91 9923536030.

Hi Guangping Zhang,

Thanks a lot. It works.

Just out of curiosity, is this the only way to achieve this?

--
Thanks & Regards,
Jayavant Ningoji Patil
Engineer: System Software
Computational Research Laboratories Ltd.
Pune-411 004.
Maharashtra, India.
+91 9923536030.

From Gareth.Williams at csiro.au Fri Feb 3 01:44:05 2012
From: Gareth.Williams at csiro.au (Gareth.Williams at csiro.au)
Date: Fri, 3 Feb 2012 19:44:05 +1100
Subject: [Mauiusers] [torquedev] Queue to Node Mapping
In-Reply-To:
References: <2d7330a.22258.135420519a4.Coremail.zgp121@126.com>
Message-ID: <007DECE986B47F4EABF823C1FBB19C620102CDD74DD0@exvic-mbx04.nexus.csiro.au>

> Just out of curiosity, is this the only way to achieve this?

We use a different setup where all the configuration is in moab (it will probably work with maui too - it would be interesting to know). Features are used to decide which nodes map to which queues - but the features are from moab's perspective rather than torque's.
A similar acl setup is needed to restrict access to the queues. The moab part looks like:

## for io jobs in io queue
CLASSCFG[io] DEFAULT.FEATURES=io
SRCFG[io] NODEFEATURES=io
SRCFG[io] PERIOD=WEEK DEPTH=1
SRCFG[io] CLASSLIST=io
SRCFG[io] HOSTLIST=.
SRCFG[io] FLAGS=ACLOVERLAP,IGNSTATE
NODECFG[ionode01] FEATURES+=io

Gareth

From david at unistra.fr Fri Feb 3 02:20:24 2012
From: david at unistra.fr (R. David)
Date: Fri, 3 Feb 2012 10:20:24 +0100
Subject: [Mauiusers] [torqueusers] [Patch] GPUs by the way of GRES
In-Reply-To: <20120203095810.6ba1833b@RunningPenguin.chalmion.homelinux.net>
References: <20120203095810.6ba1833b@RunningPenguin.chalmion.homelinux.net>
Message-ID: <9DB98485-EECB-48D1-8AEC-5F0877E6704D@unistra.fr>

Hello,

Here at the Computing center of the University of Strasbourg, we have been using this patch with great success.

It makes GPU access much easier for our users, and our batch configuration is now fully operational for GPUs.

Regards,

On 3 Feb 2012, at 09:58, Jonathan Michalon wrote:

> Hi Maui folks,
>
> GPUs in Maui are a long-standing problem. Last year a patch was sent by Mariusz
> Mamoński [1], which works based on GRES parameters.
> I've just made GPUs kind of working, by enhancing that patch. Please find
> attached the resulting patch, which works well for Maui 3.3.1.
> It defines a special GRES named "gpu" which works as expected on my test cases.
>
> Note that GRES behaviour seems quite confused as sometimes they are mentioned
> as consumable. This patch annihilates this behaviour, for the needs of GPUs.
>
> To use the patch:
> get the sources of maui-3.3.1 and patch them:
> patch -p1 < ../Patch-for-gpu-GRES.patch
> then compile as usual.
>
> You have to configure the GPUs in maui.cfg:
> NODECFG[nodename] GRES=gpu:2
>
> Then when queuing jobs you can request GPUs with (Torque syntax):
> qsub -W x=GRES:gpu@1
>
> I hope this helps, please test this and enhance to your needs!
>
> [1]
> http://www.supercluster.org/pipermail/mauiusers/2011-April/004622.html
>
> Regards,
>
> PS. This is the second attempt to send the mail?
>
> --
> Jonathan Michalon
> IT student in Strasbourg
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers

---------------------------------------------------------
R. David - david at unistra.fr
Responsable du meso-centre
UdS / Direction Informatique
Tel. : 03 68 85 45 48
---------------------------------------------------------

From paul.szczypka at gmail.com Fri Feb 3 03:46:15 2012
From: paul.szczypka at gmail.com (paul.szczypka at gmail.com)
Date: Fri, 03 Feb 2012 10:46:15 +0000
Subject: [Mauiusers] Queue to Node Mapping
In-Reply-To:
Message-ID:

No it's not. I find the simplest method is to have the queues themselves know which nodes they can run on. In my opinion this is much better than the previous solution, since you edit the queue rather than the nodes.

For each queue (using qmgr if you want), set the parameter acl_host_enable to False, then simply add the nodes you want to the queue like so:

#!/bin/bash
qmgr -c "set queue batch acl_host_enable=false"
# clear current host list
qmgr -c "set queue batch acl_hosts=-"
# add hosts
qmgr -c "set queue batch acl_hosts+=nodeA"
qmgr -c "set queue batch acl_hosts+=nodeB"
qmgr -c "set queue batch acl_hosts+=nodeC"
...
qmgr -c "set queue batch acl_hosts+=nodeZ"
# shutdown maui after last scheduling iteration
schedctl -k
# restart maui
maui

See information here:
http://www.adaptivecomputing.com/resources/docs/mwm/6-0/12.1nodelocation.php#open
http://www.clusterresources.com/torquedocs/4.1queueconfig.shtml#acl_host_enable

Note that the information in the Moab guide works in this case for maui/torque.
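The per-node qmgr lines above can also be generated rather than typed by hand. The following is only a minimal sketch and not part of the original post: the helper name queue_to_nodes is hypothetical, and it prints the commands (a dry run) instead of running them, so it can be tried without a Torque server. It assumes you want to replace the queue's host list, so the first host is set with a plain "=" and the rest are appended with "+=", instead of clearing the list first.

```shell
#!/bin/bash
# Hypothetical helper (not from the original post): print the qmgr
# commands that map one queue to a list of nodes. Nothing is executed.
queue_to_nodes() {
    local queue=$1
    shift
    local op="=" node
    echo "qmgr -c \"set queue ${queue} acl_host_enable=false\""
    for node in "$@"; do
        # "=" replaces any existing host list, "+=" appends to it
        echo "qmgr -c \"set queue ${queue} acl_hosts${op}${node}\""
        op="+="
    done
}

# Example: review the commands for queue "batch" and three nodes
queue_to_nodes batch nodeA nodeB nodeC
```

Once the printed commands look right, they could be piped to a shell on the Torque server, followed by the maui restart from the script above.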
P

From jasonw at Jhu.edu Fri Feb 3 07:24:17 2012
From: jasonw at Jhu.edu (Jason Williams)
Date: Fri, 03 Feb 2012 09:24:17 -0500
Subject: [Mauiusers] [torqueusers] [Patch] GPUs by the way of GRES
In-Reply-To: <9DB98485-EECB-48D1-8AEC-5F0877E6704D@unistra.fr>
References: <20120203095810.6ba1833b@RunningPenguin.chalmion.homelinux.net> <9DB98485-EECB-48D1-8AEC-5F0877E6704D@unistra.fr>
Message-ID: <4F2BEE11.3090709@Jhu.edu>

Thanks. I hadn't seen this, I think because the original poster seems to have sent his modified patch to the torqueuesers list, which I don't subscribe to due to volume.
I might actually have to give this a shot, as my users have been complaining about the very kludgey, hacked-together way I am using torque/maui to schedule things. If it works as well as you say, and isn't too invasive, I might commit it to the Maui SVN too. Just for kicks...

--
Jason

From jayavant.patil82 at gmail.com Wed Feb 8 02:15:39 2012
From: jayavant.patil82 at gmail.com (Jayavant Patil)
Date: Wed, 8 Feb 2012 14:45:39 +0530
Subject: [Mauiusers] NODEDOWNSTATEDELAYTIME Usage and Testing
Message-ID:

Hi,

I am using Torque 3.0.0 and Maui 3.3. I would like to understand how the NODEDOWNSTATEDELAYTIME parameter is used (the admin guide information is not clear to me) and how to test that it is working. Is this parameter related to reserved jobs only?

--
Thanks & Regards,
Jayavant Ningoji Patil

From lance at quantumbioinc.com Tue Feb 14 11:13:51 2012
From: lance at quantumbioinc.com (Lance Westerhoff)
Date: Tue, 14 Feb 2012 13:13:51 -0500
Subject: [Mauiusers] procs= not working as documented
In-Reply-To: <263D778C-C50E-4681-AE3B-112FE998735F@quantumbioinc.com>
References: <5BE7BF30-0EAF-4DF0-B4E2-DECA32FEB99C@quantumbioinc.com> <263D778C-C50E-4681-AE3B-112FE998735F@quantumbioinc.com>
Message-ID: <93B136F6-91A2-45A7-ABF7-4B4E09B15E09@quantumbioinc.com>

Hello All-

(I apologize if you receive this email twice. I'm unsure whether it is a problem in torque, maui, or both and therefore I also posted it to the torque list).
We're still having trouble with this feature, and we are starting to shop around for a torque/maui replacement in order to be able to use it. Before we do that however, I wanted to see if anyone has any thoughts on how to address the problem within torque/maui. Perhaps I simply don't understand the feature. The versions of torque and maui we are using are:

torque-3.0.2
maui-3.2.6p21

Yes, we have tried newer versions of maui, but then the option doesn't work at all.

Here is the scenario (I also included the conversation from November below for more information). Conceptually, our software is almost infinitely scalable in the sense that there is very little overhead associated with interprocess communication. Therefore, we do not require that all of the processes reside on a small number of nodes. In fact, we can stretch the processors to any and all nodes in the cluster with ~zero loss in performance. So we can literally have one node that has a single process running and another node that has 8 processes running. Since we have that level of scalability, we don't want to have to lock ourselves into having to request resources using the "nodes=X:ppn=Y" style, since this style requires that nodes open up or drain in order to use them. Since our users have a big mixture of single and multi-processor jobs, waiting for node drain can really waste a lot of resources.

I saw the "procs=#" option in the Requesting Resources table (see http://www.clusterresources.com/torquedocs/2.1jobsubmission.shtml#resources for more). It *appears* that this option should allow the user to simply request X*Y processors, and the scheduler should be able to schedule them any way it can fit. So using the following #PBS note, we should be able to request 40 processors:

#PBS -l procs=40

Instead, we see that the scheduler seems to take this information, read it, and basically disregard it. The reason I know it reads it is because if I ask for, say, 40 processors and 40 processors are available in the cluster, it works as expected and all is right with the world. Where it gets a bit more choppy is when I ask for 40 processors and only 1 processor is available. The job doesn't wait in the queue for the remaining 39 processors to open up, and instead PBS simply starts the job on that one processor. I can't see how that is anything but a bug. If the user is asking for 40 processors, why isn't the scheduler waiting for all 40 processors to open up?

If answering this question will require additional information, please ask. We are at our wits' end here.

Thanks!

-Lance

On Nov 18, 2011, at 9:39 AM, Lance Westerhoff wrote:

>
> Hello All-
>
> I submitted the following to the torque list, but the more I look at it, the more I think it might be a scheduler problem. It appears that when running with the following specs, the procs= option does not actually work as expected.
>
> ==========================================
>
> #PBS -S /bin/bash
> #PBS -l procs=60
> #PBS -l pmem=700mb
> #PBS -l walltime=744:00:00
> #PBS -j oe
> #PBS -q batch
>
> torque version: tried 3.0.2. In v2.5.4, I think the procs option worked as documented
> maui version: 3.2.6p21 (also tried maui 3.3.1 but it is a complete fail in terms of the procs option and it only asks for a single CPU)
>
> ==========================================
>
> If there are fewer than 60 processors available in the cluster (in this case there were 53 available), the job will go in and take whatever processors are remaining instead of waiting for all 60 processors to free up. Any thoughts as to why this might be happening? Sometimes it doesn't really matter and 53 would be almost as good as 60, however if only 2 processors are available and the user asks for 60, I would hate for him to go in.
>
> Thank you for your time!
>
> -Lance
>

From alexfs04 at gmail.com Wed Feb 1 05:15:52 2012
From: alexfs04 at gmail.com (AlexF)
Date: Wed, 1 Feb 2012 21:15:52 +0900
Subject: [Mauiusers] sample resource and workload trace files?
Message-ID:

Hello,

I've been looking for sample resource or workload trace files, but have been unsuccessful. The manual points to $(MAUIHOMEDIR)/traces for samples, but this directory is empty.

I've downloaded 3.3, 3.2, as well as a checkout of the latest from the svn repo, and have not been able to locate any sample resource and workload traces in recent distributions.

Older online resources point to www.clusterresources.com, as well as http://www.supercluster.org/research/traces/ but these links appear to be outdated (404).

Could somebody please point me to any publicly available sample resource and workload trace files? Are such traces still available anywhere?

thanks,
Alex F.

From zgp121 at 126.com Fri Feb 3 00:01:38 2012
From: zgp121 at 126.com (Guangping Zhang)
Date: Fri, 3 Feb 2012 15:01:38 +0800
Subject: [Mauiusers] Queue to Node Mapping
In-Reply-To:
References:
Message-ID: <2d7330a.22258.135420519a4.Coremail.zgp121@126.com>

I will give you one example as follows. As far as I know, this works in Torque 2.4.6 + Maui 3.3.1.

1. Edit the file /var/spool/torque/server_priv/nodes

node01 np=12 sugon siesta dalton gaussian
node02 np=12 sugon siesta dalton gaussian
node03 np=12 sugon siesta dalton gaussian
node04 np=12 sugon siesta dalton gaussian
node05 np=12 sugon siesta dalton gaussian
node06 np=12 sugon siesta dalton gaussian
node07 np=12 sugon siesta dalton gaussian
node08 np=12 sugon siesta dalton gaussian
node09 np=12 sugon siesta dalton gaussian
node10 np=12 sugon siesta dalton gaussian
node11 np=12 sugon siesta dalton gaussian
node12 np=12 sugon siesta dalton gaussian
node31 np=8 powerlead siesta dalton gaussian others
node32 np=8 powerlead siesta dalton gaussian others
node33 np=8 powerlead siesta dalton gaussian others
node34 np=8 powerlead siesta dalton gaussian others
node35 np=8 powerlead siesta dalton gaussian others
node36 np=8 powerlead siesta dalton gaussian others
node38 np=8 powerlead siesta dalton gaussian others
node39 np=8 powerlead siesta dalton gaussian others
node40 np=8 powerlead siesta dalton gaussian others
node41 np=8 dell siesta dalton gaussian others
node42 np=8 dell siesta dalton gaussian others
node43 np=8 dell siesta dalton gaussian others
node44 np=8 dell molpro
node45 np=8 dell molpro
node46 np=8 dell molpro

2. Create a queue named SIESTA

qmgr -c "create queue SIESTA queue_type=execution"
qmgr -c "set queue SIESTA started=true"
qmgr -c "set queue SIESTA enabled=true"
qmgr -c "set queue SIESTA acl_group_enable=true"
qmgr -c "set queue SIESTA acl_groups=siesta"
qmgr -c "set queue SIESTA acl_group_sloppy=true"
qmgr -c "set queue SIESTA resources_default.neednodes=siesta"

3. Restart the services

qterm -t quick
pbs_server
ps -A |grep maui
18066 ? 00:00:00 maui
kill 18066
/usr/local/software/maui-3.3.1/sbin/maui

4. That is OK

Only users who belong to group siesta can submit jobs to queue SIESTA, and those jobs can only use the nodes that have the property "siesta".

Best

2012-02-03
Guangping Zhang

From: Jayavant Patil
Sent: 2012-02-03 14:49
Subject: [Mauiusers] Queue to Node Mapping
To: torquedev,mauiusers
Cc:

Hi,

I am using Torque 3.0.0 and Maui 3.3. I want jobs submitted to a specific queue to run only on the nodes allocated to that queue (i.e. queue to node mapping).

Does anybody know how to do this?

--
Thanks & Regards,
Jayavant Ningoji Patil
Engineer: System Software
Computational Research Laboratories Ltd.
Pune-411 004.
Maharashtra, India.
+91 9923536030.

From jascha.wang at gmail.com Tue Feb 7 02:37:34 2012
From: jascha.wang at gmail.com (Xiangqian Wang)
Date: Tue, 7 Feb 2012 17:37:34 +0800
Subject: [Mauiusers] queue to node mapping is wrong when use '-l procs' option
In-Reply-To:
References:
Message-ID:

My test of the queue to node mapping feature of the torque/maui system failed. I use torque 2.5.8 and maui 3.2.6p21. The simple job script contains a procs option:

#!/bin/sh
#PBS -N simple-job
#PBS -l procs=3
#PBS -q fluent
#PBS -d /opt/share/job
cd $PBS_O_WORKDIR
date
sleep 30
date

The 'fluent' queue is mapped to a node 'cnode01' with 4 processors; the settings are shown below:

# Create and define queue batch
#
create queue batch
set queue batch queue_type = Execution
set queue batch resources_default.nodes = 1
set queue batch resources_default.walltime = 01:00:00
set queue batch enabled = True
set queue batch started = True
#
# Create and define queue fluent
#
create queue fluent
set queue fluent queue_type = Execution
set queue fluent acl_host_enable = False
set queue fluent acl_hosts = cnode01
set queue fluent enabled = True
set queue fluent started = True
#
# Set server attributes.
#
set server scheduling = True
set server acl_hosts = snode01
set server acl_roots = root@*
set server managers = root@snode01
set server operators = root@snode01
set server default_queue = batch
set server log_events = 511
set server mail_from = adm
set server scheduler_iteration = 600
set server node_check_rate = 150
set server tcp_timeout = 6
set server mom_job_sync = True
set server keep_completed = 300
set server auto_node_np = True
set server next_job_number = 94
set server display_job_server_suffix = False

The job should use only the single node 'cnode01', but the allocated nodes include another node. See part of the 'qstat -f' output:

exec_host = snode01/1+snode01/0+cnode01/0
...
Resource_List.neednodes = cnode01
Resource_List.procs = 3

Can anyone give me some suggestions? It would be greatly appreciated.

Xiangqian

From jfarran at uci.edu Mon Feb 13 13:19:09 2012
From: jfarran at uci.edu (Joseph Farran)
Date: Mon, 13 Feb 2012 12:19:09 -0800
Subject: [Mauiusers] Backfill Jobs prevent PREEMPTOR jobs from running?
Message-ID: <4F39703D.50104@uci.edu>

An HTML attachment was scrubbed...
URL: http://www.supercluster.org/pipermail/mauiusers/attachments/20120213/0f851699/attachment.html

From jonathan.michalon at etu.unistra.fr Fri Feb 3 01:58:10 2012
From: jonathan.michalon at etu.unistra.fr (Jonathan Michalon)
Date: Fri, 3 Feb 2012 09:58:10 +0100
Subject: [Mauiusers] [Patch] GPUs by the way of GRES
Message-ID: <20120203095810.6ba1833b@RunningPenguin.chalmion.homelinux.net>

Hi Maui folks,

GPUs in Maui are a long-standing problem. Last year a patch was sent by Mariusz
Mamoński [1], which works based on GRES parameters.
I've just made GPUs kind of working, by enhancing that patch. Please find
attached the resulting patch, which works well for Maui 3.3.1.
It defines a special GRES named "gpu" which works as expected on my test cases.

Note that GRES behaviour seems quite confused as sometimes they are mentioned
as consumable. This patch annihilates this behaviour, for the needs of GPUs.

To use the patch:
get the sources of maui-3.3.1 and patch them:
patch -p1 < ../Patch-for-gpu-GRES.patch
then compile as usual.

You have to configure the GPUs in maui.cfg:
NODECFG[nodename] GRES=gpu:2

Then when queuing jobs you can request GPUs with (Torque syntax):
qsub -W x=GRES:gpu@1

I hope this helps, please test this and enhance to your needs!

[1]
http://www.supercluster.org/pipermail/mauiusers/2011-April/004622.html

Regards,

PS. This is the second attempt to send the mail?

--
Jonathan Michalon
IT student in Strasbourg
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Patch-for-gpu-GRES.patch
Type: text/x-patch
Size: 4803 bytes
Desc: not available
Url : http://www.supercluster.org/pipermail/mauiusers/attachments/20120203/e6847559/attachment.bin

From david at unistra.fr Thu Feb 16 00:58:43 2012
From: david at unistra.fr (R. David)
Date: Thu, 16 Feb 2012 08:58:43 +0100
Subject: [Mauiusers] Preemption bug in maui ?
Message-ID: <2B71C1E0-4120-4816-BEFE-98F5B325A561@unistra.fr>

Hello,

We think we found a tough bug in maui.

We use class-to-node mapping in our Torque/Maui configuration. It seems that maui does not preempt one job with another if both jobs are in class-to-node-mapped queues, even if the QOS of the preemptor job and of the preemptee job are correctly set.

Did you observe that, too?

Regards,

---------------------------------------------------------
R. David - david at unistra.fr
Responsable du meso-centre
UdS / Direction Informatique
Tel. : 03 68 85 45 48
---------------------------------------------------------

From WJEdsall at dow.com Thu Feb 16 07:27:55 2012
From: WJEdsall at dow.com (Edsall, William (WJ))
Date: Thu, 16 Feb 2012 09:27:55 -0500
Subject: [Mauiusers] Backfill Jobs prevent PREEMPTOR jobs from running?
In-Reply-To: <4F39703D.50104@uci.edu>
References: <4F39703D.50104@uci.edu>
Message-ID: <52CD990A674498429E6A7B4FCAE3F7D308332EC3@USMDLMDOWX025.dow.com>

Hi,

What does diagnose -p say about the priority of the jobs you expect to be preempted? Priority may take precedence over preemptability.

From: mauiusers-bounces at supercluster.org [mailto:mauiusers-bounces at supercluster.org] On Behalf Of Joseph Farran
Sent: Monday, February 13, 2012 3:19 PM
To: mauiusers at supercluster.org
Subject: [Mauiusers] Backfill Jobs prevent PREEMPTOR jobs from running?

Hi All.

We have a test bed set up running Maui 3.3.1 with Torque 2.5.9.

We are testing a PREEMPTOR and PREEMPTEE setup with Maui, and all works well until Maui starts new backfill jobs. The new backfill jobs started by Maui will then not let PREEMPTOR jobs run. Is this a bug or are we missing a priority?

Our Torque nodes file has:

compute-1-1 np=64 chem free
compute-1-2 np=64 chem free
compute-1-3 np=64 chem free
compute-1-4 np=64 chem free
compute-1-5 np=64 chem free
compute-1-6 np=64 chem free
compute-1-7 np=64 tw free
compute-1-8 np=64 tw free
compute-1-9 np=64 tw free
compute-1-10 np=64 tw free

Torque "default" queue is "free":

set server default_queue = free
set server resources_default.nodes = 1

We fill up the queue with 10 63-core jobs and 15 1-core jobs as "PREEMPTEE" jobs in the "free" queue. Maui starts 10 63-core jobs and 10 1-core jobs (leaving 5 1-core jobs in the queue).

As user "tw" I run:

qsub -I -q tw -l nodes=4:ppn=64 ( PREEMPTOR job )

It PREEMPTS correctly. Everything works as expected.
However, if I run: qsub -I -q tw -l nodes=4:ppn=62 ( PREEMPTOR job ) It PREEMPTS correctly and Maui then starts 4 new backfill 1-core jobs (since there are now 4 new 1-core CPU idle). But here is the problem. If I then re-run the command: qsub -I -q tw -l nodes=4:ppn=64 ( PREEMPTOR job) The job will NOT run. The 4 new 1-core jobs Maui started are not SUSPENDED by Maui. Is this a bug? ------------------- maui.cfg file ------------------------------------ NODEALLOCATIONPOLICY MINRESOURCE PREEMPTPOLICY SUSPEND ENABLEMULTIREQJOBS TRUE # QOS QOSWEIGHT 10000 QOSCFG[high] PRIORITY=1000 QFLAGS=PREEMPTOR QTWEIGHT=100 QOSCFG[low] PRIORITY=0 QFLAGS=PREEMPTEE QTWEIGHT=0 #The default queue is called "free" CLASSCFG[free] QDEF=low CLASSCFG[free] DEFAULT.FEATURES=free # TW Queue SRCFG[tw] PERIOD=INFINITY CLASSCFG[tw] QDEF=high CLASSCFG[tw] DEFAULT.FEATURES=tw -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.supercluster.org/pipermail/mauiusers/attachments/20120216/6fe3fb9a/attachment-0001.html From basv at sara.nl Thu Feb 16 08:52:09 2012 From: basv at sara.nl (Bas van der Vlies) Date: Thu, 16 Feb 2012 16:52:09 +0100 Subject: [Mauiusers] [Patch] GPUs by the way of GRES In-Reply-To: <20120203095810.6ba1833b@RunningPenguin.chalmion.homelinux.net> References: <20120203095810.6ba1833b@RunningPenguin.chalmion.homelinux.net> Message-ID: <4F3D2629.7050506@sara.nl> Jonathan, Thanks for the patch. I will test it and i have also write access to the subversion tree. So if more sites have tested and it works than we can commit it to maui trunk source. regards On 02/03/2012 09:58 AM, Jonathan Michalon wrote: > Hi Maui folks, > > GPUs in Maui are a long standing problem. Last year a patch was sent by Mariusz > Mamo?ski [1], which works based on GRES parameters. > I've just made GPUs kind of working, by enhancing that patch. Please find > attached the resulting patch, which works well for Maui 3.3.1. 
> It defines a special GRES named "gpu" which works as expected on my test cases.
>
> Note that GRES behaviour seems quite confused as sometimes they are mentioned
> as consumable. This patch annihilates this behaviour, for the needs of GPUs.
>
> To use the patch:
> get the sources of maui-3.3.1 and patch them:
> patch -p1< ../Patch-for-gpu-GRES.patch
> then compile as usual.
>
> You have to configure the GPUs in maui.cfg:
> NODECFG[nodename] GRES=gpu:2
>
> Then when queuing jobs you can request GPUs with (Torque syntax):
> qsub -W x=GRES:gpu@1
>
> I hope this helps, please test this and enhance to your needs!
>
> [1]
> http://www.supercluster.org/pipermail/mauiusers/2011-April/004622.html
>
> Regards,
>
> PS. This is the second attempt to send the mail?
>
> --
> Jonathan Michalon
> IT student in Strasbourg

--
********************************************************************
* Bas van der Vlies e-mail: basv at sara.nl *
* SARA - Academic Computing Services Amsterdam, The Netherlands *
********************************************************************
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 4553 bytes
Desc: S/MIME Cryptographic Signature
Url : http://www.supercluster.org/pipermail/mauiusers/attachments/20120216/ebbc179e/attachment.bin

From jfarran at uci.edu Thu Feb 16 09:48:00 2012
From: jfarran at uci.edu (Joseph Farran)
Date: Thu, 16 Feb 2012 08:48:00 -0800
Subject: [Mauiusers] Backfill Jobs prevent PREEMPTOR jobs from running?
In-Reply-To: <52CD990A674498429E6A7B4FCAE3F7D308332EC3@USMDLMDOWX025.dow.com>
References: <4F39703D.50104@uci.edu> <52CD990A674498429E6A7B4FCAE3F7D308332EC3@USMDLMDOWX025.dow.com>
Message-ID: <4F3D3340.90805@uci.edu>

An HTML attachment was scrubbed...
URL: http://www.supercluster.org/pipermail/mauiusers/attachments/20120216/718a52b0/attachment-0001.html

From WJEdsall at dow.com Thu Feb 16 10:19:49 2012
From: WJEdsall at dow.com (Edsall, William (WJ))
Date: Thu, 16 Feb 2012 12:19:49 -0500
Subject: [Mauiusers] Backfill Jobs prevent PREEMPTOR jobs from running?
In-Reply-To: <4F3D3340.90805@uci.edu>
References: <4F39703D.50104@uci.edu> <52CD990A674498429E6A7B4FCAE3F7D308332EC3@USMDLMDOWX025.dow.com> <4F3D3340.90805@uci.edu>
Message-ID: <52CD990A674498429E6A7B4FCAE3F7D308333062@USMDLMDOWX025.dow.com>

Hello,

Diagnose -p was truncated. I was hoping to see that 33-35 (Queued) did not have a large QTime, which may be raising their priority above that of your job 38. That could cause them to make job 38 wait even though they are not running. Sounds doubtful in your scenario but I've seen it cause issues before. If you delete the Q state jobs 33-35, does your 38 start?

We use the same preemption concept you're trying to achieve but I'm having a hard time narrowing down the cause for your error. A few small differences in our configuration are the backfill policy and reservation policy. You might try these settings and then restart maui:

BACKFILLPOLICY BESTFIT
RESERVATIONPOLICY CURRENTHIGHEST

From: Joseph Farran [mailto:jfarran at uci.edu]
Sent: Thursday, February 16, 2012 11:48 AM
To: Edsall, William (WJ); mauiusers at supercluster.org
Subject: Re: [Mauiusers] Backfill Jobs prevent PREEMPTOR jobs from running?

Hi Edsall.

Thank you for responding. I have a few more nodes now, but the same configuration. I am including the diagnose -p with other details:

We have 13 64-core nodes. All nodes have the 'free' feature and a queue named 'free' as PREEMPTEE so that we can harvest idle cycles when the nodes are not in use by their owners.

As user "juser", I load up the 'free' queue (PREEMPTEE) as follows:

1.hpc.cluster.  juser  free  test   24904   1   63  --  72:00  R  00:01
2.hpc.cluster.  juser  free  test   29346   1   63  --  72:00  R  00:01
3.hpc.cluster.  juser  free  test   42900   1   63  --  72:00  R  00:01
4.hpc.cluster.  juser  free  test   30291   1   63  --  72:00  R  00:01
5.hpc.cluster.  juser  free  test   26417   1   63  --  72:00  R  00:01
6.hpc.cluster.  juser  free  test   40206   1   63  --  72:00  R  00:01
7.hpc.cluster.  juser  free  test    1786   1   63  --  72:00  R  00:01
8.hpc.cluster.  juser  free  test   62436   1   63  --  72:00  R  00:01
9.hpc.cluster.  juser  free  test   49087   1   63  --  72:00  R  00:01
10.hpc.cluster  juser  free  test   45691   1   63  --  72:00  R  00:01
11.hpc.cluster  juser  free  test   41386   1   63  --  72:00  R  00:01
12.hpc.cluster  juser  free  test   35204   1   63  --  72:00  R  00:01
13.hpc.cluster  juser  free  test   51043   1   63  --  72:00  R  00:01
14.hpc.cluster  juser  free  test   24948   1    1  --  72:00  R  00:01
15.hpc.cluster  juser  free  test   29390   1    1  --  72:00  R  00:01
16.hpc.cluster  juser  free  test   42944   1    1  --  72:00  R  00:01
17.hpc.cluster  juser  free  test   30335   1    1  --  72:00  R  00:01
18.hpc.cluster  juser  free  test   26461   1    1  --  72:00  R  00:01
19.hpc.cluster  juser  free  test   40250   1    1  --  72:00  R  00:01
20.hpc.cluster  juser  free  test    1830   1    1  --  72:00  R  00:01
21.hpc.cluster  juser  free  test   62480   1    1  --  72:00  R  00:01
22.hpc.cluster  juser  free  test   49131   1    1  --  72:00  R  00:01
23.hpc.cluster  juser  free  test   45735   1    1  --  72:00  R  00:01
24.hpc.cluster  juser  free  test   41430   1    1  --  72:00  R  00:01
25.hpc.cluster  juser  free  test   35248   1    1  --  72:00  R  00:01
26.hpc.cluster  juser  free  test   51087   1    1  --  72:00  R  00:01
27.hpc.cluster  juser  free  test      --   1    1  --  72:00  Q     --
28.hpc.cluster  juser  free  test      --   1    1  --  72:00  Q     --
29.hpc.cluster  juser  free  test      --   1    1  --  72:00  Q     --
30.hpc.cluster  juser  free  test      --   1    1  --  72:00  Q     --
31.hpc.cluster  juser  free  test      --   1    1  --  72:00  Q     --
32.hpc.cluster  juser  free  test      --   1    1  --  72:00  Q     --
33.hpc.cluster  juser  free  test      --   1    1  --  72:00  Q     --
34.hpc.cluster  juser  free  test      --   1    1  --  72:00  Q     --
35.hpc.cluster  juser  free  test      --   1    1  --  72:00  Q     --

As user "tw", who owns the 'tw' nodes, I run:

qsub -I -q tw -l nodes=6:ppn=64

And preemption works as expected:

1.hpc.cluster.  juser  free  test   24904   1   63  --  72:00  R  00:02
2.hpc.cluster.  juser  free  test   29346   1   63  --  72:00  S  00:01
3.hpc.cluster.  juser  free  test   42900   1   63  --  72:00  S  00:01
4.hpc.cluster.  juser  free  test   30291   1   63  --  72:00  S  00:01
5.hpc.cluster.  juser  free  test   26417   1   63  --  72:00  S  00:01
6.hpc.cluster.  juser  free  test   40206   1   63  --  72:00  S  00:01
7.hpc.cluster.  juser  free  test    1786   1   63  --  72:00  S  00:01
8.hpc.cluster.  juser  free  test   62436   1   63  --  72:00  R  00:01
9.hpc.cluster.  juser  free  test   49087   1   63  --  72:00  R  00:02
10.hpc.cluster  juser  free  test   45691   1   63  --  72:00  R  00:02
11.hpc.cluster  juser  free  test   41386   1   63  --  72:00  R  00:02
12.hpc.cluster  juser  free  test   35204   1   63  --  72:00  R  00:02
13.hpc.cluster  juser  free  test   51043   1   63  --  72:00  R  00:02
14.hpc.cluster  juser  free  test   24948   1    1  --  72:00  R  00:02
15.hpc.cluster  juser  free  test   29390   1    1  --  72:00  S  00:02
16.hpc.cluster  juser  free  test   42944   1    1  --  72:00  S  00:01
17.hpc.cluster  juser  free  test   30335   1    1  --  72:00  S  00:01
18.hpc.cluster  juser  free  test   26461   1    1  --  72:00  S  00:01
19.hpc.cluster  juser  free  test   40250   1    1  --  72:00  S  00:01
20.hpc.cluster  juser  free  test    1830   1    1  --  72:00  S  00:01
21.hpc.cluster  juser  free  test   62480   1    1  --  72:00  R  00:02
22.hpc.cluster  juser  free  test   49131   1    1  --  72:00  R  00:02
23.hpc.cluster  juser  free  test   45735   1    1  --  72:00  R  00:02
24.hpc.cluster  juser  free  test   41430   1    1  --  72:00  R  00:02
25.hpc.cluster  juser  free  test   35248   1    1  --  72:00  R  00:02
26.hpc.cluster  juser  free  test   51087   1    1  --  72:00  R  00:02
27.hpc.cluster  juser  free  test      --   1    1  --  72:00  Q     --
28.hpc.cluster  juser  free  test      --   1    1  --  72:00  Q     --
29.hpc.cluster  juser  free  test      --   1    1  --  72:00  Q     --
30.hpc.cluster  juser  free  test      --   1    1  --  72:00  Q     --
31.hpc.cluster  juser  free  test      --   1    1  --  72:00  Q     --
32.hpc.cluster  juser  free  test      --   1    1  --  72:00  Q     --
33.hpc.cluster  juser  free  test      --   1    1  --  72:00  Q     --
34.hpc.cluster  juser  free  test      --   1    1  --  72:00  Q     --
35.hpc.cluster  juser  free  test      --   1    1  --  72:00  Q     --
36.hpc.cluster  tw     tw    STDIN  30505   6  384  --  99:00  R     --

As user 'tw', I exit and run the command:

qsub -I -q tw -l nodes=6:ppn=62

Everything works again as expected and Maui also starts 6 new 1-core jobs ( jobs 27 through 32 ):

1.hpc.cluster.  juser  free  test   24904   1   63  --  72:00  R  00:03
2.hpc.cluster.  juser  free  test   29346   1   63  --  72:00  S  00:01
3.hpc.cluster.  juser  free  test   42900   1   63  --  72:00  S  00:01
4.hpc.cluster.  juser  free  test   30291   1   63  --  72:00  S  00:01
5.hpc.cluster.  juser  free  test   26417   1   63  --  72:00  S  00:01
6.hpc.cluster.  juser  free  test   40206   1   63  --  72:00  S  00:02
7.hpc.cluster.  juser  free  test    1786   1   63  --  72:00  S  00:02
8.hpc.cluster.  juser  free  test   62436   1   63  --  72:00  R  00:03
9.hpc.cluster.  juser  free  test   49087   1   63  --  72:00  R  00:03
10.hpc.cluster  juser  free  test   45691   1   63  --  72:00  R  00:03
11.hpc.cluster  juser  free  test   41386   1   63  --  72:00  R  00:03
12.hpc.cluster  juser  free  test   35204   1   63  --  72:00  R  00:03
13.hpc.cluster  juser  free  test   51043   1   63  --  72:00  R  00:03
14.hpc.cluster  juser  free  test   24948   1    1  --  72:00  R  00:03
15.hpc.cluster  juser  free  test   29390   1    1  --  72:00  R  00:02
16.hpc.cluster  juser  free  test   42944   1    1  --  72:00  R  00:02
17.hpc.cluster  juser  free  test   30335   1    1  --  72:00  R  00:02
18.hpc.cluster  juser  free  test   26461   1    1  --  72:00  R  00:02
19.hpc.cluster  juser  free  test   40250   1    1  --  72:00  R  00:02
20.hpc.cluster  juser  free  test    1830   1    1  --  72:00  R  00:02
21.hpc.cluster  juser  free  test   62480   1    1  --  72:00  R  00:03
22.hpc.cluster  juser  free  test   49131   1    1  --  72:00  R  00:03
23.hpc.cluster  juser  free  test   45735   1    1  --  72:00  R  00:03
24.hpc.cluster  juser  free  test   41430   1    1  --  72:00  R  00:03
25.hpc.cluster  juser  free  test   35248   1    1  --  72:00  R  00:03
26.hpc.cluster  juser  free  test   51087   1    1  --  72:00  R  00:03
27.hpc.cluster  juser  free  test   30749   1    1  --  72:00  R     --
28.hpc.cluster  juser  free  test   44220   1    1  --  72:00  R     --
29.hpc.cluster  juser  free  test   31513   1    1  --  72:00  R     --
30.hpc.cluster  juser  free  test   27736   1    1  --  72:00  R     --
31.hpc.cluster  juser  free  test   41429   1    1  --  72:00  R     --
32.hpc.cluster  juser  free  test    3130   1    1  --  72:00  R     --
33.hpc.cluster  juser  free  test      --   1    1  --  72:00  Q     --
34.hpc.cluster  juser  free  test      --   1    1  --  72:00  Q     --
35.hpc.cluster  juser  free  test      --   1    1  --  72:00  Q     --
37.hpc.cluster  tw     tw    STDIN  30708   6  372  --  99:00  R     --

However, if I now exit and go back and try to get 6 of the 64-core nodes (which worked before) I cannot. Maui will not preempt the new jobs it started. My new job 38 below just sits in the queue:

$ qsub -I -q tw -l nodes=6:ppn=64
qsub: waiting for job 38.hpc.cluster to start

# diagnose -p
diagnosing job priority information (partition: ALL)

Job                    PRIORITY*   Cred(  QOS)   Serv(QTime)
             Weights   --------     100( 1000)      1(    1)

38                    100000109   100.0(1000.)    0.0(109.4)
2                             5     0.0(  0.0)  100.0(  5.3)
3                             5     0.0(  0.0)  100.0(  5.3)
4                             5     0.0(  0.0)  100.0(  5.3)
5                             5     0.0(  0.0)  100.0(  5.3)
6                             5     0.0(  0.0)  100.0(  5.3)
7                             5     0.0(  0.0)  100.0(  5.3)

Percent Contribution   --------   100.0(100.0)    0.0(  0.0)

[root@mpc-x maui]# checkjob -v 38

checking job 38 (RM job '38.hpc.cluster')

State: Idle
Creds: user:tw group:tw class:tw qos:high
WallTime: 00:00:00 of 4:03:00:00
SubmitTime: Thu Feb 16 08:26:31
(Time Queued Total: 00:01:37 Eligible: 00:01:37)

Total Tasks: 384

Req[0] TaskCount: 384 Partition: ALL
Network: [NONE] Memory >= 0 Disk >= 0 Swap >= 0
Opsys: [NONE] Arch: [NONE] Features: [tw]
Exec: '' ExecSize: 0 ImageSize: 0
Dedicated Resources Per Task: PROCS: 1
NodeAccess: SHARED
TasksPerNode: 64 NodeCount: 6

IWD: [NONE] Executable: [NONE]
Bypass: 0 StartCount: 0
PartitionMask: [ALL]
Flags: PREEMPTOR

Reservation '38' (2:23:58:22 -> 7:02:58:22 Duration: 4:03:00:00)
PE: 384.00 StartPriority: 100000163
job cannot run in partition DEFAULT (idle procs do not meet requirements : 0 of 384 procs found)
idle procs: 384 feasible procs: 0

Rejection Reasons: [Features : 7][CPU : 6]

Detailed Node Availability Information:

compute-1-1   rejected : Features
compute-1-2   rejected : Features
compute-1-3   rejected : Features
compute-1-4   rejected : Features
compute-1-5   rejected : Features
compute-1-6   rejected : Features
compute-1-7   rejected : CPU
compute-1-8   rejected : CPU
compute-1-9   rejected : CPU
compute-1-10  rejected : CPU
compute-1-11  rejected : CPU
compute-1-12  rejected : CPU
compute-1-13  rejected : Features

-------------------------------------------------------
Here is my PBS nodes file:

# cat /opt/torque/server_priv/nodes
compute-1-1 np=64 sf free
compute-1-2 np=64 sf free
compute-1-3 np=64 sf free
compute-1-4 np=64 chem free
compute-1-5 np=64 chem free
compute-1-6 np=64 chem free
compute-1-7 np=64 tw free
compute-1-8 np=64 tw free
compute-1-9 np=64 tw free
compute-1-10 np=64 tw free
compute-1-11 np=64 tw free
compute-1-12 np=64 tw free
compute-1-13 np=64 bio free
------------------------------------

Edsall, William (WJ) wrote:

Hi,

What does diagnose -p say about the priority of the jobs you expect to be preempted? Priority may take precedence over preemptability.

From: mauiusers-bounces at supercluster.org [mailto:mauiusers-bounces at supercluster.org] On Behalf Of Joseph Farran
Sent: Monday, February 13, 2012 3:19 PM
To: mauiusers at supercluster.org
Subject: [Mauiusers] Backfill Jobs prevent PREEMPTOR jobs from running?
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.supercluster.org/pipermail/mauiusers/attachments/20120216/334e16c9/attachment-0001.html

From jfarran at uci.edu Thu Feb 16 11:07:13 2012
From: jfarran at uci.edu (Joseph Farran)
Date: Thu, 16 Feb 2012 10:07:13 -0800
Subject: [Mauiusers] Backfill Jobs prevent PREEMPTOR jobs from running?
In-Reply-To: <52CD990A674498429E6A7B4FCAE3F7D308333062@USMDLMDOWX025.dow.com>
References: <4F39703D.50104@uci.edu> <52CD990A674498429E6A7B4FCAE3F7D308332EC3@USMDLMDOWX025.dow.com> <4F3D3340.90805@uci.edu> <52CD990A674498429E6A7B4FCAE3F7D308333062@USMDLMDOWX025.dow.com>
Message-ID: <4F3D45D1.5080808@uci.edu>

An HTML attachment was scrubbed...
URL: http://www.supercluster.org/pipermail/mauiusers/attachments/20120216/74f21f09/attachment.html

From pankaj.dorlikar at gmail.com Sat Feb 25 04:12:05 2012
From: pankaj.dorlikar at gmail.com (pankaj dorlikar)
Date: Sat, 25 Feb 2012 16:42:05 +0530
Subject: [Mauiusers] job does not start
Message-ID:

Hi,

We have Torque Server version 2.5.8 and Maui version 3.2.6p1 installed on a RHEL 5.2 server.

"showstart" for one of the jobs says that the job should start now, i.e. earliest start in 00:00:00 from the current time.

########################
checkjob -vv says that

checkjob -vv 62235

checking job 62235 (RM job '62235.yc9.cn.yuva.param')

State: Idle
Creds: user:abcd group:pqr account:PQR-PR class:q1 qos:q1-qos
WallTime: 00:00:00 of 2:05:00:00
SubmitTime: Thu Feb 23 18:56:26
(Time Queued Total: 1:21:27:05 Eligible: 1:21:27:05)

Total Tasks: 2

Req[0] TaskCount: 2 Partition: ALL
Network: [NONE] Memory >= 0 Disk >= 0 Swap >= 0
Opsys: [NONE] Arch: [NONE] Features: [NONE]
Exec: '' ExecSize: 0 ImageSize: 0
Dedicated Resources Per Task: PROCS: 1
NodeAccess: SHARED
NodeCount: 0

IWD: [NONE] Executable: [NONE]
Bypass: 51 StartCount: 0
PartitionMask: [ALL]
Reservation '62235' (00:00:00 -> 2:05:00:00 Duration: 2:05:00:00)
PE: 2.00 StartPriority: 2727
job cannot run in partition DEFAULT (insufficient idle procs available: 0 < 2)
job can run in partition P1 (32 procs available. 2 procs required)
job can run in partition P2 (48 procs available. 2 procs required)

########################
showres -n 62235 says that

reservations on Sat Feb 25 16:28:10

NodeName            Type  ReservationID  JobState  Task  Start     Duration    StartTime

node16.clusternode  Job   62235          Idle      2     00:00:00  2:05:00:00  Sat Feb 25 16:28:10

1 nodes reserved

############################
checknode node16.clusternode says that the node is available to run the job,

but somehow the job does not start, and no error appears in the maui, pbs_server, or pbs_mom logs either.

What can be the issue? What can be done to make the job run and avoid the same in future?
thank you
-pankakjd

--
Pankaj V. Dorlikar
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.supercluster.org/pipermail/mauiusers/attachments/20120225/d3346fa1/attachment-0001.html

From pankaj.dorlikar at gmail.com Mon Feb 27 11:05:57 2012
From: pankaj.dorlikar at gmail.com (pankaj dorlikar)
Date: Mon, 27 Feb 2012 23:35:57 +0530
Subject: [Mauiusers] job in queue and does not run
Message-ID:

---------- Forwarded message ----------
From: pankaj dorlikar
Date: Sat, Feb 25, 2012 at 4:42 PM
Subject: job does not start
To: mauiusers at supercluster.org, torqueusers at supercluster.org

Hi,

We have Torque Server version 2.5.8 and Maui version 3.2.6p1 installed on a RHEL 5.2 server.

"showstart" for one of the jobs says that the job should start now, i.e. earliest start in 00:00:00 from the current time.

########################
checkjob -vv says that

checkjob -vv 62235

checking job 62235 (RM job '62235.server_pbs.clusternode')

State: Idle
Creds: user:abcd group:pqr account:PQR-PR class:q1 qos:q1-qos
WallTime: 00:00:00 of 2:05:00:00
SubmitTime: Thu Feb 23 18:56:26
(Time Queued Total: 1:21:27:05 Eligible: 1:21:27:05)

Total Tasks: 2

Req[0] TaskCount: 2 Partition: ALL
Network: [NONE] Memory >= 0 Disk >= 0 Swap >= 0
Opsys: [NONE] Arch: [NONE] Features: [NONE]
Exec: '' ExecSize: 0 ImageSize: 0
Dedicated Resources Per Task: PROCS: 1
NodeAccess: SHARED
NodeCount: 0

IWD: [NONE] Executable: [NONE]
Bypass: 51 StartCount: 0
PartitionMask: [ALL]
Reservation '62235' (00:00:00 -> 2:05:00:00 Duration: 2:05:00:00)
PE: 2.00 StartPriority: 2727
job cannot run in partition DEFAULT (insufficient idle procs available: 0 < 2)
job can run in partition P1 (32 procs available. 2 procs required)
job can run in partition P2 (48 procs available. 2 procs required)

########################
showres -n 62235 says that

reservations on Sat Feb 25 16:28:10

NodeName            Type  ReservationID  JobState  Task  Start     Duration    StartTime

node16.clusternode  Job   62235          Idle      2     00:00:00  2:05:00:00  Sat Feb 25 16:28:10

1 nodes reserved

############################
checknode node16.clusternode says that the node is available to run the job,

but somehow the job does not start, and no error appears in the maui, pbs_server, or pbs_mom logs either.

What can be the issue? What can be done to make the job run and avoid the same in future?

thank you
-pankakjd

--
Pankaj V. Dorlikar

--
Pankaj V. Dorlikar
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.supercluster.org/pipermail/mauiusers/attachments/20120227/7b5cd872/attachment.html
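
When a job stays Idle even though nodes look free, the partition lines at the end of the checkjob -vv output are a quick first thing to compare. Below is a minimal Python sketch, assuming the output format quoted in the message above, that extracts which partitions Maui reports the job can and cannot run in. The function name partition_feasibility and the regular expressions are illustrative only and are not part of Maui or Torque; the sample text is copied from the post.

```python
import re

# Sample lines copied from the checkjob -vv output quoted in the post.
CHECKJOB_OUTPUT = """\
job cannot run in partition DEFAULT (insufficient idle procs available: 0 < 2)
job can run in partition P1 (32 procs available. 2 procs required)
job can run in partition P2 (48 procs available. 2 procs required)
"""

# Illustrative patterns (an assumption based only on the lines above).
CAN_RUN = re.compile(
    r"job can run in partition (\S+) "
    r"\((\d+) procs available\.\s+(\d+) procs required\)"
)
CANNOT_RUN = re.compile(r"job cannot run in partition (\S+) \((.*)\)")


def partition_feasibility(text):
    """Map each partition name to (can_run, detail) from checkjob text."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        match = CAN_RUN.search(line)
        if match:
            name, available, required = match.groups()
            result[name] = (True, f"{available} available, {required} required")
            continue
        match = CANNOT_RUN.search(line)
        if match:
            result[match.group(1)] = (False, match.group(2))
    return result


if __name__ == "__main__":
    for name, (ok, detail) in partition_feasibility(CHECKJOB_OUTPUT).items():
        print(f"{name}: {'can run' if ok else 'cannot run'} ({detail})")
```

For the job in the post this reports that DEFAULT is rejected while P1 and P2 are feasible, which narrows the question to why the job is not being started in P1 or P2. The sketch only summarizes what checkjob already printed; it does not diagnose the cause.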