From jkusznir at gmail.com  Wed Sep  7 18:32:13 2011
From: jkusznir at gmail.com (Jim Kusznir)
Date: Wed, 7 Sep 2011 17:32:13 -0700
Subject: [Mauiusers] Maui assigns too many resources
Message-ID:

Hi all:

I've got a user who's creating a bunch of single-threaded jobs via
script (about 250 at a shot).  All are specified (in torque) as -l
nodes=1:ppn=1.  However, half of his jobs end up queued rather than
running (he sizes his job to take the entire cluster).  When I look
into why, checkjob shows that the resources allocated (2) exceed
requested (1), and showq shows that it assigned 2 cores per job, yet
torque can't show that anywhere.  To fix, I restart maui, and it
correctly sees that each job should only be 1 core and starts the rest
of the jobs that were queued.  When jobs are in queue, showq shows
them as requiring only one processor.

How can I fix this permanently?

maui 3.2.6p19 (as installed on a rocks cluster from the torque+maui
roll, rocks 5.1)
torque-2.3.0

Thanks!
--Jim

From gus at ldeo.columbia.edu  Thu Sep  8 08:32:56 2011
From: gus at ldeo.columbia.edu (Gus Correa)
Date: Thu, 08 Sep 2011 10:32:56 -0400
Subject: [Mauiusers] Maui assigns too many resources
In-Reply-To:
References:
Message-ID: <4E68D218.2070709@ldeo.columbia.edu>

Hi Jim

Some guesses:

Look at your JOBNODEMATCHPOLICY in ${MAUI}/maui.cfg.
To pack multiple jobs on a node you could choose it to be EXACTPROC.
http://www.adaptivecomputing.com/resources/docs/maui/a.fparameters.php

Another thing to look at is DEFERTIME.
The default is 1 hour.
You could set it to less.
For instance, if you want it to be one minute, add this line:
DEFERTIME 00:01:00
to your ${MAUI}/maui.cfg file and restart maui.
http://www.adaptivecomputing.com/resources/docs/maui/a.fparameters.php

I hope this helps,
Gus Correa

From jkusznir at gmail.com  Thu Sep  8 09:59:52 2011
From: jkusznir at gmail.com (Jim Kusznir)
Date: Thu, 8 Sep 2011 08:59:52 -0700
Subject: [Mauiusers] Maui assigns too many resources
In-Reply-To: <4E68D218.2070709@ldeo.columbia.edu>
References: <4E68D218.2070709@ldeo.columbia.edu>
Message-ID:

This isn't quite the problem.  The problem is that even though a user
requests 1 node, 1 PPN, and torque shows it as such, maui (through
showq) shows this as needing 2 processors per node, and thereby has
allocated 100% of the cluster's resources.  Even torque output shows
that more resources have been assigned than the job requested (e.g.,
"the scheduler messed up").

This only happens on this one user's jobs.  Restarting maui causes it
to realize these jobs only needed one processor, and it then
appropriately schedules the remaining jobs.

--Jim

From roy.dragseth at cc.uit.no  Thu Sep  8 10:06:11 2011
From: roy.dragseth at cc.uit.no (Roy Dragseth)
Date: Thu, 8 Sep 2011 18:06:11 +0200
Subject: [Mauiusers] Maui assigns too many resources
Message-ID: <201109081806.11380.roy.dragseth@cc.uit.no>

Strange, I haven't seen this before, even on a release as old as Rocks 5.1.

Could you post the output of

qstat -f JOBID

and

checkjob JOBID?

r.

--
  The Computer Center, University of Tromsø, N-9037 TROMSØ Norway.
  phone:+47 77 64 41 07, fax:+47 77 64 41 00
  Roy Dragseth, Team Leader, High Performance Computing
  Direct call: +47 77 64 62 56. email: roy.dragseth at uit.no

From nt_mahmood at yahoo.com  Mon Sep 12 03:01:30 2011
From: nt_mahmood at yahoo.com (Mahmood Naderan)
Date: Mon, 12 Sep 2011 02:01:30 -0700 (PDT)
Subject: [Mauiusers] Job is in 'Q' but checkjob shows it is running (!)
In-Reply-To: <1315719989.32315.YahooMailNeo@web111717.mail.gq1.yahoo.com>
References: <1315719989.32315.YahooMailNeo@web111717.mail.gq1.yahoo.com>
Message-ID: <1315818090.92454.YahooMailNeo@web111721.mail.gq1.yahoo.com>

Hi,
I sent this email to the torque mailing list, but it seems that it is related to maui. So I restate the problem here.

Can someone explain why the qstat shows a job in "Q" but checkjob says everything is normal?

mahmood at srv1:416.gamess$ qstat 49003
Job id                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
49003.srv1                gamess           mahmood                0 Q Long


mahmood at srv1:416.gamess$ checkjob 49003
checking job 49003

State: Idle
Creds:  user:mahmood  group:mahmood  class:Long  qos:DEFAULT
WallTime: 00:00:00 of 40:00:00:00
SubmitTime: Sun Sep 11 09:51:26
  (Time Queued  Total: 00:02:36  Eligible: 00:02:36)

Total Tasks: 1

Req[0]  TaskCount: 1  Partition: ALL
Network: [NONE]  Memory >= 0  Disk >= 0  Swap >= 0
Opsys: [NONE]  Arch: [NONE]  Features: [NONE]


IWD: [NONE]  Executable:  [NONE]
Bypass: 0  StartCount: 0
PartitionMask: [ALL]
Flags:       HOSTLIST RESTARTABLE
HostList:
  [hawk:1]
PE:  1.00  StartPriority:  129
job can run in partition DEFAULT (3 procs available.  1 procs required)

Thanks
// Naderan *Mahmood;

From scrusan at ur.rochester.edu  Mon Sep 12 07:47:45 2011
From: scrusan at ur.rochester.edu (Steve Crusan)
Date: Mon, 12 Sep 2011 09:47:45 -0400
Subject: [Mauiusers] Job is in 'Q' but checkjob shows it is running (!)
In-Reply-To: <1315818090.92454.YahooMailNeo@web111721.mail.gq1.yahoo.com>
References: <1315719989.32315.YahooMailNeo@web111717.mail.gq1.yahoo.com>
  <1315818090.92454.YahooMailNeo@web111721.mail.gq1.yahoo.com>
Message-ID:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Sep 12, 2011, at 5:01 AM, Mahmood Naderan wrote:

> Hi,
> I sent this email to the torque mailing list, but it seems that it is related to maui. So I restate the problem here.
>
> Can someone explain why the qstat shows a job in "Q" but checkjob says everything is normal?

Looking below, the job is queued in TORQUE, and idle in Maui (not running), so everything is normal.

Do you mean why isn't the job running, even though it seems that it *should* be running?

If so, I would say post the output of qstat -f for the job, and checkjob -v. This seems to be more or less a scheduler configuration issue, or possibly an issue with the node (which you seem to have manually selected in your qsub statement).

> mahmood at srv1:416.gamess$ qstat 49003
> Job id                    Name             User            Time Use S Queue
> ------------------------- ---------------- --------------- -------- - -----
> 49003.srv1                gamess           mahmood                0 Q Long
>
>
> mahmood at srv1:416.gamess$ checkjob 49003
> checking job 49003
>
> State: Idle
> Creds:  user:mahmood  group:mahmood  class:Long  qos:DEFAULT
> WallTime: 00:00:00 of 40:00:00:00
> SubmitTime: Sun Sep 11 09:51:26
>   (Time Queued  Total: 00:02:36  Eligible: 00:02:36)
>
> Total Tasks: 1
>
> Req[0]  TaskCount: 1  Partition: ALL
> Network: [NONE]  Memory >= 0  Disk >= 0  Swap >= 0
> Opsys: [NONE]  Arch: [NONE]  Features: [NONE]
>
>
> IWD: [NONE]  Executable:  [NONE]
> Bypass: 0  StartCount: 0
> PartitionMask: [ALL]
> Flags:       HOSTLIST RESTARTABLE
> HostList:
>   [hawk:1]
> PE:  1.00  StartPriority:  129
> job can run in partition DEFAULT (3 procs available.  1 procs required)
>
> Thanks
> // Naderan *Mahmood;

----------------------
Steve Crusan
System Administrator
Center for Research Computing
University of Rochester
https://www.crc.rochester.edu/


-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.0.17 (Darwin)
Comment: GPGTools - http://gpgtools.org

iQEcBAEBAgAGBQJObg2IAAoJENS19LGOpgqKAnIIAKHvbLmV9Hs31IZ4AGHIOFG9
Wxp+qiXOnIMoKQQjhkkou1zVC4OKHnymcE/LxtiQcAuX+Lu8gd/GAR1tF5FeCF4g
m7go12yb5Dx97sHgl2SjmRY3duDkx6YMfOGgxCuiN+O5SdkUazuW8GPkW+HPPS7/
T3gDbG0jizZ6A5LzhJqgPyVC4LKkwYt5v9NQBs/f82ZOGqPusEWdJ4N5oaUYhyG/
OXSj/xmzMTCYCqfdOUZynq4ACQotRbNmY7wrV+Uc0qWUFtZv/RIwQ/O4P261E/1/
dfrVX3OEdz9FBy4uoNrgMyNxL2eOanNiKSlhHJnoM04zx0SkAYGDOeGPqYv/vi0=
=QcC7
-----END PGP SIGNATURE-----

From nt_mahmood at yahoo.com  Mon Sep 12 10:27:58 2011
From: nt_mahmood at yahoo.com (Mahmood Naderan)
Date: Mon, 12 Sep 2011 09:27:58 -0700 (PDT)
Subject: [Mauiusers] Job is in 'Q' but checkjob shows it is running (!)
In-Reply-To:
References: <1315719989.32315.YahooMailNeo@web111717.mail.gq1.yahoo.com>
  <1315818090.92454.YahooMailNeo@web111721.mail.gq1.yahoo.com>
Message-ID: <1315844878.5007.YahooMailNeo@web111725.mail.gq1.yahoo.com>

>Do you mean why isn't the job running, even though it seems that it *should* be running?

Exactly...

>If so, I would say post the output of qstat -f for the job, and checkjob -v

mahmood at srv1:~$ qstat -f 49153
Job Id: 49153.srv1
    Job_Name = bwaves
    Job_Owner = mahmood at srv1
    job_state = Q
    queue = Long
    server = srv1
    Checkpoint = u
    ctime = Mon Sep 12 19:55:29 2011
    Error_Path = srv1:/home/mahmood/multi2sim-3.0.3/410.bwave
        s/bwaves.e49153
    Hold_Types = n
    Join_Path = oe
    Keep_Files = n
    Mail_Points = a
    mtime = Mon Sep 12 19:55:29 2011
    Output_Path = srv1:/home/mahmood/multi2sim-3.0.3/410.bwav
        es/bwaves_128.out
    Priority = 0
    qtime = Mon Sep 12 19:55:29 2011
    Rerunable = True
    Resource_List.nodect = 1
    Resource_List.nodes = node2
    Resource_List.walltime = 960:00:00
    Variable_List = PBS_O_QUEUE=Long,PBS_O_HOME=/home/mahmood,
        ...
    etime = Mon Sep 12 19:55:29 2011
    submit_args = tor
    fault_tolerant = False

mahmood at srv1:~$ checkjob -v 49153
checking job 49153 (RM job '49153.srv1')

State: Idle
Creds:  user:mahmood  group:mahmood  class:Long  qos:DEFAULT
WallTime: 00:00:00 of 40:00:00:00
SubmitTime: Mon Sep 12 19:55:29
  (Time Queued  Total: 00:39:24  Eligible: 00:39:24)

Total Tasks: 1

Req[0]  TaskCount: 1  Partition: ALL
Network: [NONE]  Memory >= 0  Disk >= 0  Swap >= 0
Opsys: [NONE]  Arch: [NONE]  Features: [NONE]
Exec:  ''  ExecSize: 0  ImageSize: 0
Dedicated Resources Per Task: PROCS: 1
NodeAccess: SHARED
NodeCount: 0


IWD: [NONE]  Executable:  [NONE]
Bypass: 3  StartCount: 0
PartitionMask: [ALL]
Flags:       HOSTLIST RESTARTABLE
HostList:
  [node2:1]
PE:  1.00  StartPriority:  147
job can run in partition DEFAULT (8 procs available.  1 procs required)

>which you seem to have manually selected in your qsub statement

Yes, as you can see, I requested node2:
Resource_List.nodes = node2

and the output of "pbsnodes -l all" shows that this node is free:

mahmood at srv1:~$ pbsnodes -l all
srv1                 job-exclusive
node2                free
node3                job-exclusive
node4                free

Any idea about that?

// Naderan *Mahmood;

From scrusan at ur.rochester.edu  Mon Sep 12 12:01:03 2011
From: scrusan at ur.rochester.edu (Steve Crusan)
Date: Mon, 12 Sep 2011 14:01:03 -0400
Subject: [Mauiusers] Job is in 'Q' but checkjob shows it is running (!)
In-Reply-To: <1315844878.5007.YahooMailNeo@web111725.mail.gq1.yahoo.com>
References: <1315719989.32315.YahooMailNeo@web111717.mail.gq1.yahoo.com>
  <1315818090.92454.YahooMailNeo@web111721.mail.gq1.yahoo.com>
  <1315844878.5007.YahooMailNeo@web111725.mail.gq1.yahoo.com>
Message-ID:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Sep 12, 2011, at 12:27 PM, Mahmood Naderan wrote:

>> Do you mean why isn't the job running, even though it seems that it *should* be running?
>
> Exactly...
>
>> If so, I would say post the output of qstat -f for the job, and checkjob -v
>
> mahmood at srv1:~$ qstat -f 49153
> Job Id: 49153.srv1
>     Job_Name = bwaves
>     Job_Owner = mahmood at srv1
>     job_state = Q
>     queue = Long
>     server = srv1
>     Checkpoint = u
>     ctime = Mon Sep 12 19:55:29 2011
>     Error_Path = srv1:/home/mahmood/multi2sim-3.0.3/410.bwave
>         s/bwaves.e49153
>     Hold_Types = n
>     Join_Path = oe
>     Keep_Files = n
>     Mail_Points = a
>     mtime = Mon Sep 12 19:55:29 2011
>     Output_Path = srv1:/home/mahmood/multi2sim-3.0.3/410.bwav
>         es/bwaves_128.out
>     Priority = 0
>     qtime = Mon Sep 12 19:55:29 2011
>     Rerunable = True
>     Resource_List.nodect = 1
>     Resource_List.nodes = node2
>     Resource_List.walltime = 960:00:00
>     Variable_List = PBS_O_QUEUE=Long,PBS_O_HOME=/home/mahmood,
>         ...
>     etime = Mon Sep 12 19:55:29 2011
>     submit_args = tor
>     fault_tolerant = False
>
> mahmood at srv1:~$ checkjob -v 49153
> checking job 49153 (RM job '49153.srv1')
>
> State: Idle
> Creds:  user:mahmood  group:mahmood  class:Long  qos:DEFAULT
> WallTime: 00:00:00 of 40:00:00:00
> SubmitTime: Mon Sep 12 19:55:29
>   (Time Queued  Total: 00:39:24  Eligible: 00:39:24)
>
> Total Tasks: 1
>
> Req[0]  TaskCount: 1  Partition: ALL
> Network: [NONE]  Memory >= 0  Disk >= 0  Swap >= 0
> Opsys: [NONE]  Arch: [NONE]  Features: [NONE]
> Exec:  ''  ExecSize: 0  ImageSize: 0
> Dedicated Resources Per Task: PROCS: 1
> NodeAccess: SHARED
> NodeCount: 0
>
>
> IWD: [NONE]  Executable:  [NONE]
> Bypass: 3  StartCount: 0
> PartitionMask: [ALL]
> Flags:       HOSTLIST RESTARTABLE
> HostList:
>   [node2:1]
> PE:  1.00  StartPriority:  147
> job can run in partition DEFAULT (8 procs available.  1 procs required)

There has got to be a reason why the job won't start even though resources are available. I was hoping that checkjob -v would show the node information, but maybe it's different for maui.

Can you run a checkjob -v -n

The specific node itself seems to be having problems, or maui is not starting it. Do you see anything relevant in your /var/spool/maui/logs/maui.log file? If not, I would increase the verbosity of the logging, and restart the maui service.

----------------------
Steve Crusan
System Administrator
Center for Research Computing
University of Rochester
https://www.crc.rochester.edu/


-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.0.17 (Darwin) Comment: GPGTools - http://gpgtools.org iQEcBAEBAgAGBQJObkjnAAoJENS19LGOpgqKwwQH/26RwQZX1BG/M3V/PztkOpPs CwshkSuBkQGrNqshY6/BenrZpXHGgEYGbqYyFm29NWMyNQ1Vm33mfb0rq84DBkXk gbME5qwg3uKeATUGuBQoMxdy/JEu1TdqDx4FNwLh8/wLxzhmJcQqatEX4qvEgJWP oT3m0j29rgENLfVKpZ40P7vHAPafJrnTAQjPsqmoZLnkK0dGOD/zD5T/RiMBKLar harduBX6s9FpKeHJTwEYGqBdMgxu1nBQ3wna+Tmmjq5HXxdlzlT7HfQSYzWQxtI2 kXU/1S6kaz1AXVUCsJt42MGbmWhAwCBbVP5RCfHvXB6pulMXyOinRDeoYNzc7HU= =eijX -----END PGP SIGNATURE----- From nt_mahmood at yahoo.com Mon Sep 12 13:22:06 2011 From: nt_mahmood at yahoo.com (Mahmood Naderan) Date: Mon, 12 Sep 2011 12:22:06 -0700 (PDT) Subject: [Mauiusers] Job is in 'Q' but checkjob shows it is running (!) In-Reply-To: References: <1315719989.32315.YahooMailNeo@web111717.mail.gq1.yahoo.com> <1315818090.92454.YahooMailNeo@web111721.mail.gq1.yahoo.com> <1315844878.5007.YahooMailNeo@web111725.mail.gq1.yahoo.com> Message-ID: <1315855326.70395.YahooMailNeo@web111715.mail.gq1.yahoo.com> Since the node is up for about 110? days, I think there may be a problem with maui service. With a restart it is now fine. Thanks for your help ? // Naderan *Mahmood; ----- Original Message ----- From: Steve Crusan To: Mahmood Naderan Cc: maui Sent: Monday, September 12, 2011 10:31 PM Subject: Re: [Mauiusers] Job is in 'Q' but checkjob shows it is running (!) -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Sep 12, 2011, at 12:27 PM, Mahmood Naderan wrote: >> Do you mean why isn't the job running, even though it seems that it *should* be running? > > Exactly... > >> If so, I would say post the output of qstat -f for the job, and checkjob -v > > mahmood at srv1:~$ qstat -f 49153 > Job Id: 49153.srv1 >? ? Job_Name = bwaves >? ? Job_Owner = mahmood at srv1 >? ? job_state = Q >? ? queue = Long >? ? server = srv1 >? ? Checkpoint = u >? ? ctime = Mon Sep 12 19:55:29 2011 >? ? Error_Path = srv1:/home/mahmood/multi2sim-3.0.3/410.bwave >? ? ? ? s/bwaves.e49153 >? ? Hold_Types = n >? ? 
Join_Path = oe >? ? Keep_Files = n >? ? Mail_Points = a >? ? mtime = Mon Sep 12 19:55:29 2011 >? ? Output_Path = srv1:/home/mahmood/multi2sim-3.0.3/410.bwav >? ? ? ? es/bwaves_128.out >? ? Priority = 0 >? ? qtime = Mon Sep 12 19:55:29 2011 >? ? Rerunable = True >? ? Resource_List.nodect = 1 >? ? Resource_List.nodes = node2 >? ? Resource_List.walltime = 960:00:00 >? ? Variable_List = PBS_O_QUEUE=Long,PBS_O_HOME=/home/mahmood, >? ? ? ? ... >? ? etime = Mon Sep 12 19:55:29 2011 >? ? submit_args = tor >? ? fault_tolerant = False > > mahmood at srv1:~$ checkjob -v 49153 > checking job 49153 (RM job '49153.srv1') > > State: Idle > Creds:? user:mahmood? group:mahmood? class:Long? qos:DEFAULT > WallTime: 00:00:00 of 40:00:00:00 > SubmitTime: Mon Sep 12 19:55:29 >? (Time Queued? Total: 00:39:24? Eligible: 00:39:24) > > Total Tasks: 1 > > Req[0]? TaskCount: 1? Partition: ALL > Network: [NONE]? Memory >= 0? Disk >= 0? Swap >= 0 > Opsys: [NONE]? Arch: [NONE]? Features: [NONE] > Exec:? ''? ExecSize: 0? ImageSize: 0 > Dedicated Resources Per Task: PROCS: 1 > NodeAccess: SHARED > NodeCount: 0 > > > IWD: [NONE]? Executable:? [NONE] > Bypass: 3? StartCount: 0 > PartitionMask: [ALL] > Flags:? ? ? HOSTLIST RESTARTABLE > HostList: >? [node2:1] > PE:? 1.00? StartPriority:? 147 > job can run in partition DEFAULT (8 procs available.? 1 procs required) There has got to be a reason why the job won't start even resources are available. I was hoping that checkjob -v would show the node information, but maybe it's different for maui. Can you run a checkjob -v -n The specific node itself seems to be having problems, or maui is not starting it. Do you see anything relevant in your /var/spool/maui/logs/maui.log file? If not, I would increase the verbosity of the logging, and restart the maui service. 
> > >> which you seem to have manually selected in your qsub statement > > Yes, As you can see I requested node2 > Resource_List.nodes = node2 > > and the output of "pbsnodes -l all" shows that this node is free > > mahmood at srv1:~$ pbsnodes -l all > srv1? ? ? ? ? ? ? ? ? job-exclusive > node2? ? ? ? ? ? ? ? free > node3? ? ? ? ? ? ? ? job-exclusive > node4? ? ? ? ? ? ? ? free > > > Any idea about that? > > // Naderan *Mahmood; > > > ----- Original Message ----- > From: Steve Crusan > To: Mahmood Naderan > Cc: maui > Sent: Monday, September 12, 2011 6:17 PM > Subject: Re: [Mauiusers] Job is in 'Q' but checkjob shows it is running (!) > > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > > On Sep 12, 2011, at 5:01 AM, Mahmood Naderan wrote: > >> >> >> Hi, >> I sent this email to torque mailing list but seems that it is related to maui. So I restate the problem here. >> >> Can someone explain why the qstat shows a job in "Q" but checkjob says everything is normal? > > > Looking below, the job is queued in TORQUE, and idle in Maui (not running), so everything is normal. > > Do you mean why isn't the job running, even though it seems that it *should* be running? > > If so, I would say post the output of qstat -f for the job, and checkjob -v. This seems to be more or less a scheduler configuration, or possibly an issue with the node (which you seem to have manually selected in your qsub statement). > > > >> >> mahmood at srv1:416.gamess$ qstat 49003 >> Job id? ? ? ? ? ? ? ? ? ? Name? ? ? ? ? ? User? ? ? ? ? ? Time Use S Queue >> ------------------------- ---------------- --------------- -------- - ----- >> 49003.srv1? ? ? ? ? ? ? ? gamess? ? ? ? ? mahmood? ? ? ? ? ? ? ? 0 Q Long >> >> >> mahmood at srv1:416.gamess$ checkjob 49003 >> checking job 49003 >> >> State: Idle >> Creds:? user:mahmood? group:mahmood? class:Long? ? qos:DEFAULT >> WallTime: 00:00:00 of 40:00:00:00 >> SubmitTime: Sun Sep 11 09:51:26 >>? ? (Time Queued? Total: 00:02:36? 
Eligible: 00:02:36)
>>
>> Total Tasks: 1
>>
>> Req[0]  TaskCount: 1  Partition: ALL
>> Network: [NONE]  Memory >= 0  Disk >= 0  Swap >= 0
>> Opsys: [NONE]  Arch: [NONE]  Features: [NONE]
>>
>>
>> IWD: [NONE]  Executable:  [NONE]
>> Bypass: 0  StartCount: 0
>> PartitionMask: [ALL]
>> Flags:      HOSTLIST RESTARTABLE
>> HostList:
>>    [hawk:1]
>> PE:  1.00  StartPriority:  129
>> job can run in partition DEFAULT (3 procs available.  1 procs required)
>>
>> Thanks
>> // Naderan *Mahmood;
>
> ----------------------
> Steve Crusan
> System Administrator
> Center for Research Computing
> University of Rochester
> https://www.crc.rochester.edu/

----------------------
Steve Crusan
System Administrator
Center for Research Computing
University of Rochester
https://www.crc.rochester.edu/

From arnaubria at pic.es Wed Sep 14 08:32:34 2011
From: arnaubria at pic.es (Arnau Bria)
Date: Wed, 14 Sep 2011 16:32:34 +0200
Subject: [Mauiusers] maui stops scheduling when finds a non real busy node
Message-ID: <20110914163234.714e966d@amarrosa.pic.es>

Hi all,

We recently updated our torque version to 2.5.8 but, as this looks like a maui issue, I am asking here first.

Our combo is:

# rpm -qa|egrep 'maui-server|torque-server'
maui-server-3.3-1.x86_64
torque-server-2.5.8-1.cri.x86_64

Maui works fine, but if it finds a node in busy status during a scheduling cycle, it does not schedule any other job in that cycle:

09/14 16:20:48 INFO: job '20420500' successfully started
09/14 16:20:48 MRMJobStart(20420265,Msg,SC)
09/14 16:20:48 MPBSJobStart(20420265,base,Msg,SC)
09/14 16:20:48 ERROR: job '20420265' cannot be started: (rc: 15046 errmsg: 'Resource temporarily unavailable REJHOST=td578.pic.es MSG=cannot allocate node 'td578.pic.es' to job - node not currently available (state: busy)' hostlist: 'td578.pic.es')
09/14 16:20:48 ERROR: cannot start job '20420265' in partition DEFAULT
09/14 16:20:48 MJobPReserve(20420265,DEFAULT,ResCount,ResCountRej)
09/14 16:20:48 INFO: no priority reservations created (bf/rsv policy)
09/14 16:20:48 MRMJobStart(20420306,Msg,SC)
09/14 16:20:48 MPBSJobStart(20420306,base,Msg,SC)
09/14 16:20:48 ERROR: job '20420306' cannot be started: (rc: 15046 errmsg: 'Resource temporarily unavailable REJHOST=td578.pic.es MSG=cannot allocate node 'td578.pic.es' to job - node not currently available (state: busy)' hostlist: 'td578.pic.es')
09/14 16:20:48 ERROR: cannot start job '20420306' in partition DEFAULT
09/14 16:20:48 MJobPReserve(20420306,DEFAULT,ResCount,ResCountRej)
09/14 16:20:48 INFO: no priority reservations created (bf/rsv policy)
09/14 16:20:48 MRMJobStart(20420268,Msg,SC)

torque says that the node is busy:

09/14/2011
03:00:13;0008;PBS_Server;Job;20401629.pbs03.pic.es;could not locate requested resources 'td578.pic.es' (node_spec failed) cannot allocate node 'td578.pic.es' to job - node not currently available (state: busy)
09/14/2011 03:00:13;0080;PBS_Server;Req;req_reject;Reject reply code=15046(Resource temporarily unavailable REJHOST=td578.pic.es MSG=cannot allocate node 'td578.pic.es' to job - node not currently available (state: busy)), aux=0, type=RunJob, from root at pbs03.pic.es
09/14/2011 03:00:13;0008;PBS_Server;Job;20401630.pbs03.pic.es;could not locate requested resources 'td578.pic.es' (node_spec failed) cannot allocate node 'td578.pic.es' to job - node not currently available (state: busy)

but the node is not really busy. It is only busy for a few seconds: when I run pbsnodes $nodename after seeing the error (a delay of 3-4 seconds), the node shows as free.
On the next scheduling cycle, if maui does not find any "busy" node, all jobs are scheduled.

I'm wondering if I could configure maui to bypass those failing nodes and keep scheduling other jobs while I figure out why torque marks those nodes as busy when they are not.

TIA,
Arnau

From laotsao at gmail.com Wed Sep 14 08:39:57 2011
From: laotsao at gmail.com ("Hung-Sheng Tsao (Lao Tsao 老曹) Ph.D.")
Date: Wed, 14 Sep 2011 10:39:57 -0400
Subject: [Mauiusers] maui stops scheduling when finds a non real busy node
In-Reply-To: <20110914163234.714e966d@amarrosa.pic.es>
References: <20110914163234.714e966d@amarrosa.pic.es>
Message-ID: <4E70BCBD.6040107@gmail.com>

Please post your maui configuration file and your torque setup.

On 9/14/2011 10:32 AM, Arnau Bria wrote:
> Hi all,
>
> We have updated our torque version to 2.5.8 recently, but, as I see
> this is a maui issue, I first ask here.
> > our combo is : > # rpm -qa|egrep 'maui-server|torque-server' > maui-server-3.3-1.x86_64 > torque-server-2.5.8-1.cri.x86_64 > > Maui works fine, but in a schedule cycle, if it finds a node in busy > status, it does not schedule any other job in that cyle: > > 09/14 16:20:48 INFO: job '20420500' successfully started > 09/14 16:20:48 MRMJobStart(20420265,Msg,SC) > 09/14 16:20:48 MPBSJobStart(20420265,base,Msg,SC) > 09/14 16:20:48 ERROR: job '20420265' cannot be started: (rc: 15046 errmsg: 'Resource temporarily unavailable REJHOST=td578.pic.es MSG=cannot allocate node 'td578.pic.es' to job - node not currently available (state: busy)' hostlist: 'td578.pic.es') > 09/14 16:20:48 ERROR: cannot start job '20420265' in partition DEFAULT > 09/14 16:20:48 MJobPReserve(20420265,DEFAULT,ResCount,ResCountRej) > 09/14 16:20:48 INFO: no priority reservations created (bf/rsv policy) > 09/14 16:20:48 MRMJobStart(20420306,Msg,SC) > 09/14 16:20:48 MPBSJobStart(20420306,base,Msg,SC) > 09/14 16:20:48 ERROR: job '20420306' cannot be started: (rc: 15046 errmsg: 'Resource temporarily unavailable REJHOST=td578.pic.es MSG=cannot allocate node 'td578.pic.es' to job - node not currently available (state: busy)' hostlist: 'td578.pic.es') > 09/14 16:20:48 ERROR: cannot start job '20420306' in partition DEFAULT > 09/14 16:20:48 MJobPReserve(20420306,DEFAULT,ResCount,ResCountRej) > 09/14 16:20:48 INFO: no priority reservations created (bf/rsv policy) > 09/14 16:20:48 MRMJobStart(20420268,Msg,SC) > > > > torque says that the node is busy: > > 09/14/2011 03:00:13;0008;PBS_Server;Job;20401629.pbs03.pic.es;could not locate requested resources 'td578.pic.es' (node_spec failed) cannot allocate node 'td578.pic.es' to job - node not currently available (state: busy) > 09/14/2011 03:00:13;0080;PBS_Server;Req;req_reject;Reject reply code=15046(Resource temporarily unavailable REJHOST=td578.pic.es MSG=cannot allocate node 'td578.pic.es' to job - node not currently available (state: busy)), aux=0, 
type=RunJob, from root at pbs03.pic.es > 09/14/2011 03:00:13;0008;PBS_Server;Job;20401630.pbs03.pic.es;could not locate requested resources 'td578.pic.es' (node_spec failed) cannot allocate node 'td578.pic.es' to job - node not currently available (state: busy) > > > but that the node is not real "busy". It's only busy for > few seconds becasue, after I see the error (delay of 3-4 seconds), I do a > pbsnodes $nodename and I see it free. > On the next scheduling cycle, if it does not find any "busy" node, all jobs are scheduled. > > > I'm wondering if I could configure maui to bypass those failing nodes > and keep scheduling other jobs while I guess why torque mark those > nodes as busy if they are not. > > > TIA, > Arnau > _______________________________________________ > mauiusers mailing list > mauiusers at supercluster.org > http://www.supercluster.org/mailman/listinfo/mauiusers -------------- next part -------------- A non-text attachment was scrubbed... Name: laotsao.vcf Type: text/x-vcard Size: 642 bytes Desc: not available Url : http://www.supercluster.org/pipermail/mauiusers/attachments/20110914/9194e18a/attachment-0001.vcf From arnaubria at pic.es Wed Sep 14 09:13:47 2011 From: arnaubria at pic.es (Arnau Bria) Date: Wed, 14 Sep 2011 17:13:47 +0200 Subject: [Mauiusers] maui stops scheduling when finds a non real busy node In-Reply-To: <4E70BCBD.6040107@gmail.com> References: <20110914163234.714e966d@amarrosa.pic.es> <4E70BCBD.6040107@gmail.com> Message-ID: <20110914171347.73472e97@amarrosa.pic.es> On Wed, 14 Sep 2011 10:39:57 -0400 Hung-Sheng Tsao (Lao Tsao ??) Ph.D. wrote: > please post your configuration file of maui and your torque setup find them attached. thanks for your reply, Arnau -------------- next part -------------- A non-text attachment was scrubbed... 
Name: maui.cfg
Type: application/octet-stream
Size: 9887 bytes
Desc: not available
Url : http://www.supercluster.org/pipermail/mauiusers/attachments/20110914/67fe48f1/attachment.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: torque.conf
Type: application/octet-stream
Size: 3314 bytes
Desc: not available
Url : http://www.supercluster.org/pipermail/mauiusers/attachments/20110914/67fe48f1/attachment-0001.obj

From akshar.bhosale at gmail.com Sat Sep 3 12:56:52 2011
From: akshar.bhosale at gmail.com (akshar bhosale)
Date: Sun, 4 Sep 2011 00:26:52 +0530
Subject: [Mauiusers] Maui Reservation
Message-ID:

Hi,

We are using maui 2.3.6. We want to have a standing reservation of depth 210, so that the reservation covers the next 3 months. We have made the following settings in maui.cfg:

##################
RESDEPTH 350
SRCFG[res] STARTTIME=12:00:00 ENDTIME=16:00:00
SRCFG[res] USERLIST=kunal
SRCFG[res] DEPTH=220
SRCFG[res] HOSTLIST=node4.compute.clust,node5.compute.clust,node6.compute.clust,node7.compute.clust
##################

but we get

res1.1.0    User  -     10:49:41     16:49:41  6:00:00  3/48  Sat sep 5 12:00:00
.
.
.
res1.63.0   User  -  62:10:49:41  62:16:49:41  6:00:00  3/48  Sun 6 12:00:00

i.e. it only creates 64 instances. How can we make it 90 or more?

From mauiuser2011 at gmail.com Wed Sep 7 09:46:15 2011
From: mauiuser2011 at gmail.com (Maui User)
Date: Wed, 7 Sep 2011 11:46:15 -0400
Subject: [Mauiusers] Help setting up a fairshare policy
Message-ID:

Hi. I would like to set up a queuing policy, and would appreciate advice on how close I can get to it with maui fairshare. Pointers to the configuration options I should use would be very helpful, too.

Here is the policy I would ideally like to implement:

1) There are currently several groups of users who have dedicated nodes.
On these dedicated nodes, idle jobs from the users in the relevant groups should be prioritized to move the cluster towards an equal number of nodes per current user. For instance, suppose there are only 10 dedicated nodes available, and user1 has jobs on all of them, and 100 jobs idle in the queue. If user2 in the same group submits 20 jobs, the next five job assignments for those nodes should go to user2, so that the load is balanced evenly between users.

2) There is also a set of "free" nodes. A user's jobs are first submitted to the dedicated nodes for their group, then to any available free nodes. I would like job assignments to the free nodes to be prioritized so that the number of nodes assigned to each active group moves toward a certain ratio (say an even distribution between groups for now, for simplicity), and the number of nodes assigned to each user within a group moves towards an equal distribution.

Note that I'm happy for the queue to "overcommit" based on the current set of active users. That is, if there's only one user, they can take all the nodes at the moment. But as more active users show up, I would like users with more nodes than they should have according to the above principles to be deprioritized in subsequent job assignments.

Is this too complicated?

Sincerely,
A Maui User.

From wl_phy at 163.com Fri Sep 16 08:00:26 2011
From: wl_phy at 163.com (L.Wu(Lei.Wu))
Date: Fri, 16 Sep 2011 22:00:26 +0800 (CST)
Subject: [Mauiusers] Torque+Maui unable to distribute parallel job to different nodes
Message-ID: <475799.e122.1327289cdc1.Coremail.wl_phy@163.com>

I've just installed torque and maui on an HP blade system. We have 16 nodes, each with 2 Xeon E5620 processors. Both serial and parallel jobs within a single node can be submitted successfully and run perfectly.
However, if I set

#PBS -l nodes=X(X larger than 1):ppn

I can see the job in R status with the qstat command, but it is not actually running. After canceling the job, I get the following error message:

[mpiexec at node1] HYD_pmcd_pmiserv_send_signal (./pm/mpiserv/mpiserv_cb.c:184): assert (!closed) failed
[mpiexec at node1] ui_cmd_cb (./pm/pmiserv/pmiserv_pmci.c:74): unable to send SIGUSR1 downstream
[mpiexec at node1] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[mpiexec at node1] HYD_pmci_wait_for_completion (./pm/pmserv/pmiserv_pmci.c:179): error waiting for event
[mpiexec at node1] main (./ui/mpich/mpiexec.c:397): process manager error waiting for completion

I've also found that the $PBS_NODEFILE (e.g. the JOBID.node1 file in /var/spool/torque/aux) exists only on the first of the nodes assigned to the job.

Furthermore, if I replace $PBS_NODEFILE in the PBS script with a local file listing the compute nodes, it works well and the job runs on all the assigned nodes:

#!/bin/sh
#PBS -N name
#PBS -e errorfile
#PBS -o outfile
#PBS -q test
#PBS -l nodes=2
cd $work_dir
#mpiexec -f $PBS_NODEFILE ./executables....
mpiexec -f hosts ./executables....

hosts file:
node1
node1
node1
node1
node2
node2
node2
node2

Interestingly, if I add :ppn=4 after the #PBS -l nodes=2, i.e.

#PBS -l nodes=2:ppn=4

the PBS script fails again, even if I use the local hosts file.

Can anyone help me?

From decicco10 at gmail.com Fri Sep 16 09:30:48 2011
From: decicco10 at gmail.com (Marcelo De Cicco)
Date: Fri, 16 Sep 2011 12:30:48 -0300
Subject: [Mauiusers] nodes going crazy
Message-ID:

hello!!
A week ago we installed InfiniBand, and since then the nodes have been going crazy:

WARNING: active job '147' has inactive node n012 allocated for 1:20:52:21 (node state: 'Down')
WARNING: active job '142' has inactive node n012 allocated for 1:22:50:19 (node state: 'Down')
WARNING: active job '143' has inactive node n012 allocated for 1:22:48:38 (node state: 'Down')
WARNING: active job '144' has inactive node n012 allocated for 1:22:47:24 (node state: 'Down')
WARNING: active job '145' has inactive node n012 allocated for 1:22:45:41 (node state: 'Down')
WARNING: active job '146' has inactive node n012 allocated for 1:22:44:26 (node state: 'Down')
WARNING: active job '148' has inactive node n008 allocated for 1:03:19:34 (node state: 'Down')
WARNING: active job '150' has inactive node n008 allocated for 2:42:46 (node state: 'Down')

I restarted pbs_mom on the nodes, but nothing happened. Then, suddenly, the nodes that were down came back up!

Marcelo De Cicco

"Before printing, think of the environment and the costs"

"THE MORE PROGRESS PHYSICAL SCIENCES MAKE, THE MORE THEY TEND TO ENTER THE DOMAIN OF MATHEMATICS, WHICH IS A KIND OF CENTRE TO WHICH THEY ALL CONVERGE. WE MAY EVEN JUDGE THE DEGREE OF PERFECTION TO WHICH A SCIENCE HAS ARRIVED BY THE FACILITY WITH WHICH IT MAY BE SUBMITTED TO CALCULATION."
-- ADOLPHE QUETELET, 1796-1874

From jayavant.patil82 at gmail.com Sun Sep 18 23:14:18 2011
From: jayavant.patil82 at gmail.com (Jayavant Patil)
Date: Mon, 19 Sep 2011 10:44:18 +0530
Subject: [Mauiusers] Maui Reservation (akshar bhosale)
Message-ID:

Hi akshar,

If you look at the Maui source code, in the file include/msched.h, the standing reservation depth limit is defined as follows:

#define MAX_SRES_DEPTH 64

So I think this is why it creates only 64 reservations.
If you want 90 reservations, you need to change the code.

--
Thanks & Regards,
Jayavant N. Patil,
M.Tech-II (CS),
CoEP, Pune.
Mob. No.: +91 9923536030.

From jayavant.patil82 at gmail.com Mon Sep 19 05:22:08 2011
From: jayavant.patil82 at gmail.com (Jayavant Patil)
Date: Mon, 19 Sep 2011 16:52:08 +0530
Subject: [Mauiusers] Automatic Job REQUEUE
Message-ID:

Hi,

I am using TORQUE 3.0.0 and Maui 3.3. When the compute node on which a job is running fails (crash, power failure, or any other reason), how can the job that was running on that node be requeued *automatically*? (I am aware that we can rerun the job manually with *qrerun*, but that is not what I want.)

Thanks in advance.

--
Thanks & Regards,
Jayavant N. Patil

From vlad at cosy.sbg.ac.at Mon Sep 19 06:06:22 2011
From: vlad at cosy.sbg.ac.at (Vlad Popa)
Date: Mon, 19 Sep 2011 14:06:22 +0200
Subject: [Mauiusers] NODETYPE Paramater
Message-ID: <4E77303E.8020205@cosy.sbg.ac.at>

Hi!

Since I'm very new to maui, I just wanted to ask whether I can define multiple values for "NODETYPE", separated by "," (i.e. NODETYPE=I7,BIGMEM, GPU), when specifying nodes via NODECFG.

Greetings from Europe/Austria/Salzburg

Vlad Popa
University of Salzburg
Computer Science
5020 Salzburg
Austria
Europe

From jayavant.patil82 at gmail.com Tue Sep 20 00:15:56 2011
From: jayavant.patil82 at gmail.com (Jayavant Patil)
Date: Tue, 20 Sep 2011 11:45:56 +0530
Subject: [Mauiusers] Setting Job FLAG as Suspendable
Message-ID:

Hi,

How do I set a job flag as SUSPENDABLE through the qsub command as well as through the maui.cfg file?

Thanks in advance.

--
Thanks & Regards,
Jayavant N.
Patil -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.supercluster.org/pipermail/mauiusers/attachments/20110920/8eeb0322/attachment.html From brianm at usc.edu Tue Sep 20 23:03:34 2011 From: brianm at usc.edu (Brian Mendenhall) Date: Tue, 20 Sep 2011 22:03:34 -0700 Subject: [Mauiusers] Problem with Account Management and Maui 3.3.1 Message-ID: <6d50a326330.4e790db6@usc.edu> I have recently upgraded torque and maui on my cluster and have run into a weird issue that I am having a hard time understanding why I can't find any documentation on the subject. Maui: 3.3.1 (upgraded from 3.2.6) RM: Torque 2.4.16 (upgraded from 2.1.12) AM: Qbank: 2.11 (not upgraded) Previously, I had torque queues for users that are not charged time for running their jobs, as an example, lets say people that belong to the unix group 'lc_cmb' submit jobs to the 'cmb' queue using 'qsub -q cmb script.pbs'. The maui.cfg entry for cmb was: CLASSCFG[cmb] NOAM=TRUE JOBFLAGS=SHAREDIFSINGLENODE MAXJOB[USER]=300,420 and that basically turned off any charges for the cmb group. I know that qbank is working fine, and maui <-> interaction is fine because regular users that do get charged time work fine. Is there a different way of turning off account management charges for queues in maui/torque/qbank? 
The error in the maui.log file is pretty simple: 09/20 21:49:48 INFO: transaction sent to AM 09/20 21:49:48 ALERT: cannot create AM reservation for job '1324' (request refused) 09/20 21:49:48 ERROR: cannot start job '1324' in partition DEFAULT Awaiting 00000237 bytes -- Tue Sep 20 21:28:22 PDT 2011 REQUEST=COMMAND=make_reservation AUTH=maui MACHINE=hpc ACCOUNT=lc_cmb USER=sungjech WCLIMIT=86400 PROCCOUNT=16 QOS=DEFAULT CLASS=cmb_exe NODETYPE=DEFAULT TYPE=maui JOBID=1324 JOBTYPE=job NODES=1 REPLY=00000072 STATUSCODE=0 RESULT=sungjech does not have sufficient funds for job 1324 I find this weird: [root at hpc-pbs maui]$ showconfig | grep '\[cmb\]' CLASSCFG[cmb] DEFAULT.FEATURES=[cmb] [root at hpc-pbs maui]$ diagnose -v -c cmb Class/Queue Status Name Priority Flags QDef QOSList* PartitionList Target Limits cmb 0 [NONE] [NONE] [NONE] [NONE] 0.00 [NONE] DEFAULT.FEATURES=[cmb] MAXJOBPERUSER=420,300 So, the maxjob configuration is there, but no JOBFLAGS or NOAM configurations shown. ----- Brian Mendenhall Linux/HPCC Administrator University of Southern California From roy.dragseth at cc.uit.no Wed Sep 21 01:46:41 2011 From: roy.dragseth at cc.uit.no (Roy Dragseth) Date: Wed, 21 Sep 2011 09:46:41 +0200 Subject: [Mauiusers] Problem with Account Management and Maui 3.3.1 In-Reply-To: <6d50a326330.4e790db6@usc.edu> References: <6d50a326330.4e790db6@usc.edu> Message-ID: <201109210946.41484.roy.dragseth@cc.uit.no> We switched from qbank to gold a long time ago so I'm not sure if this applies to qbank. In gold you can set the chargerate to 0 for a specific QoS, maybe something similar is possible in qbank. r. On Wednesday, September 21, 2011 07:03:34 Brian Mendenhall wrote: > I have recently upgraded torque and maui on my cluster and have run into a > weird issue that I am having a hard time understanding why I can't find > any documentation on the subject. 
> > ----- > Brian Mendenhall > Linux/HPCC Administrator > University of Southern California > _______________________________________________ > mauiusers mailing list > mauiusers at supercluster.org > http://www.supercluster.org/mailman/listinfo/mauiusers From jayavant.patil82 at gmail.com Wed Sep 21 01:52:41 2011 From: jayavant.patil82 at gmail.com (Jayavant Patil) Date: Wed, 21 Sep 2011 13:22:41 +0530 Subject: [Mauiusers] Creating Reservation with setres Message-ID: Hi, I am using TORQUE 3.0.0. and Maui 3.3. I have 3 nodes cluster with node names as n0,n2 ans n3. I want to create a user type reservation using setres on n0 and n3 only. I tried this with setres but it fails as shown below: setres -u root -d 01:00:00 'n[0,3]' reservation created reservation 'root.0' created on 3 nodes (24 tasks) n0:1 n2:1 n3:1 My question is why is it creating the reservation on n2 too? or How do I write a regular expression to create a reservation on non-contiguous nodes only?(as like n0 and n3 in this example) -- Thanks & Regards, Jayavant N. Patil -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.supercluster.org/pipermail/mauiusers/attachments/20110921/edb442e8/attachment.html From bunk at physik.hu-berlin.de Wed Sep 21 04:50:24 2011 From: bunk at physik.hu-berlin.de (Burkhard Bunk) Date: Wed, 21 Sep 2011 12:50:24 +0200 (CEST) Subject: [Mauiusers] Creating Reservation with setres In-Reply-To: References: Message-ID: Hi, the rules for listing hostnames in Maui are obscure. For examples see e.g. the following discussion: http://www.supercluster.org/pipermail/mauiusers/2010-October/004372.html http://www.supercluster.org/pipermail/mauiusers/2010-October/004373.html http://www.supercluster.org/pipermail/mauiusers/2010-November/004374.html Regards, Burkhard Bunk. ---------------------------------------------------------------------- bunk at physik.hu-berlin.de Physics Institute, Humboldt University fax: ++49-30 2093 7628 Newtonstr. 
15 phone: ++49-30 2093 7980 12489 Berlin, Germany
----------------------------------------------------------------------

On Wed, 21 Sep 2011, Jayavant Patil wrote:

> Hi,
>
> I am using TORQUE 3.0.0. and Maui 3.3. I have 3 nodes cluster with node
> names as n0,n2 ans n3. I want to create a user type reservation using setres
> on n0 and n3 only. I tried this with setres but it fails as shown below:
>
>
> setres -u root -d 01:00:00 'n[0,3]'
> reservation created
>
>
> reservation 'root.0' created on 3 nodes (24 tasks)
> n0:1
> n2:1
> n3:1
>
> My question is why is it creating the reservation on n2 too? or How do I write
> a regular expression to create a reservation on non-contiguous nodes only?(as
> like n0 and n3 in this example)
>
> --
>
> Thanks & Regards,
> Jayavant N. Patil
>
>

From brianm at usc.edu Wed Sep 21 11:05:24 2011
From: brianm at usc.edu (Brian Mendenhall)
Date: Wed, 21 Sep 2011 10:05:24 -0700
Subject: [Mauiusers] Problem with Account Management and Maui 3.3.1
In-Reply-To: <201109210946.41484.roy.dragseth@cc.uit.no>
References: <6d50a326330.4e790db6@usc.edu> <201109210946.41484.roy.dragseth@cc.uit.no>
Message-ID: <6d60d41c3429.4e79b6e4@usc.edu>

Thank you for your reply Roy.

I installed and tested Gold, but did not feel like it brought anything sufficient enough to justify losing all of our historical qbank data (the databases are not compatible, no real migration utilities exist, and I don't have the time to create them), but I guess I did miss this 'nocharge' issue as I was under the impression it was part of maui.

I did, however, find an old patch for maui 3.2.6.p20 that implemented the 'NOAM' CLASSCFG feature, so this explains how it used to work.

So now my question is: is it really just that you either have an Allocation Manager and everyone gets charged, or is there a different way to accomplish the same goal?
I do see that NODECFG has a feature of 'CHARGERATE', but it would not be feasible for me to have 2700+ entries in my maui.cfg, unless I can use something like 'NODECFG[cmb] CHARGERATE=0.0' which would be applied to the nodeset feature 'cmb'. Going to try to test it without causing problems ... ----- Brian Mendenhall Linux/HPCC Administrator University of Southern California ----- Original Message ----- From: Roy Dragseth Date: Wednesday, September 21, 2011 12:46 am Subject: Re: [Mauiusers] Problem with Account Management and Maui 3.3.1 To: mauiusers at supercluster.org > We switched from qbank to gold a long time ago so I'm not sure if > this applies > to qbank. In gold you can set the chargerate to 0 for a specific > QoS, maybe > something similar is possible in qbank. > > r. > > > On Wednesday, September 21, 2011 07:03:34 Brian Mendenhall wrote: > > I have recently upgraded torque and maui on my cluster and have > run into a > > weird issue that I am having a hard time understanding why I > can't find > > any documentation on the subject. > > > > Maui: 3.3.1 (upgraded from 3.2.6) > > RM: Torque 2.4.16 (upgraded from 2.1.12) > > AM: Qbank: 2.11 (not upgraded) > > > > Previously, I had torque queues for users that are not charged > time for > > running their jobs, as an example, lets say people that belong to > the unix > > group 'lc_cmb' submit jobs to the 'cmb' queue using 'qsub -q cmb > > script.pbs'. The maui.cfg entry for cmb was: > > > > CLASSCFG[cmb] NOAM=TRUE JOBFLAGS=SHAREDIFSINGLENODE > > MAXJOB[USER]=300,420 > > > > and that basically turned off any charges for the cmb group. > > > > I know that qbank is working fine, and maui <-> interaction is > fine because > > regular users that do get charged time work fine. > > > > Is there a different way of turning off account management > charges for > > queues in maui/torque/qbank? 
> > > > ----- > > Brian Mendenhall > > Linux/HPCC Administrator > > University of Southern California > > _______________________________________________ > > mauiusers mailing list > > mauiusers at supercluster.org > > http://www.supercluster.org/mailman/listinfo/mauiusers > _______________________________________________ > mauiusers mailing list > mauiusers at supercluster.org > http://www.supercluster.org/mailman/listinfo/mauiusers > From brianm at usc.edu Wed Sep 21 11:17:40 2011 From: brianm at usc.edu (Brian Mendenhall) Date: Wed, 21 Sep 2011 10:17:40 -0700 Subject: [Mauiusers] Problem with Account Management and Maui 3.3.1 In-Reply-To: <6d60d41c3429.4e79b6e4@usc.edu> References: <6d50a326330.4e790db6@usc.edu> <201109210946.41484.roy.dragseth@cc.uit.no> <6d60d41c3429.4e79b6e4@usc.edu> Message-ID: <6d60bc2e656b.4e79b9c4@usc.edu> I apologize as I don't think I made my goal clear; > So now my question is: is it really just that you either have an > Allocation Manager and everyone gets charged, or is there a > different way to accomplish the same goal? my goal is to allow a class of node or a group (unix account GID) to *not* be charged for running jobs on that queue/node type. ----- Brian Mendenhall Linux/HPCC Administrator University of Southern California ----- Original Message ----- From: Brian Mendenhall Date: Wednesday, September 21, 2011 10:05 am Subject: Re: [Mauiusers] Problem with Account Management and Maui 3.3.1 To: mauiusers at supercluster.org > Thank you for your reply Roy. > > I installed and tested Gold, but did not feel like it brought > anything sufficient enough to justify losing all of our historical > qbank data (the databases are not compatible, no real migration > utilities exist, and I don't have the time to create them), but I > guess I did miss this 'nocharge' issue as I was under the > impression it was part of maui. 
> > > > > > ----- > > > Brian Mendenhall > > > Linux/HPCC Administrator > > > University of Southern California > > > _______________________________________________ > > > mauiusers mailing list > > > mauiusers at supercluster.org > > > http://www.supercluster.org/mailman/listinfo/mauiusers > > _______________________________________________ > > mauiusers mailing list > > mauiusers at supercluster.org > > http://www.supercluster.org/mailman/listinfo/mauiusers > > > _______________________________________________ > mauiusers mailing list > mauiusers at supercluster.org > http://www.supercluster.org/mailman/listinfo/mauiusers > From darby.vicker-1 at nasa.gov Mon Sep 26 07:23:26 2011 From: darby.vicker-1 at nasa.gov (Vicker, Darby (JSC-EG311)) Date: Mon, 26 Sep 2011 08:23:26 -0500 Subject: [Mauiusers] Reservation not behaving as expected Message-ID: Hello, I could use some help figuring out why a reservation was preventing jobs from running in a situation where they should have run. I'm using torque 2.3.6 and maui 3.2.6p21. Our cluster is an SGI ICE system so the node names look like rXiYnZZ where X is the rack number, Y is the IRU number (from 0-3) and ZZ is the node number in the IRU (from 0-15). We have 2 fully populated racks so 8 IRU's/128 nodes total. I wanted to give some dedicated time to a few users for 75% of the machine for a 7 hours, which I did with the following command: service0:~ # setres -n DAC -s 11:00_09/25 -d 0:07:00:00 -u lumpkin:lebeau:kboyles:bstewart 'r1i[0-3]n[0-9]|r2i[0-1]n[0-9]' reservation created reservation 'testing.0' created on 96 nodes (1152 tasks) r1i0n0:1 r1i0n1:1 r1i0n2:1 r1i0n3:1 r1i0n4:1 All seemed well at this point. The DAC reservation should have reserved 96 nodes, leaving 32 left for other users. However, when the reservation took effect yesterday there were several jobs that did not run that should have. There were no other reservations or other policies in effect that should have prevented jobs from running. 
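A host expression like the one passed to setres above can be sanity-checked by expanding the node names and counting the matches. The sketch below is only an illustration: it assumes the rXiYnZZ naming described above (2 racks, 4 IRUs, 16 nodes each), uses grep -E as a stand-in for the scheduler's own matcher, which is an assumption, and the helper names list_nodes and count_matches are made up. Matched unanchored, the expression selects 96 names, the same node count setres reported, because r1i0n1 also matches as a prefix of r1i0n10 through r1i0n15.

```shell
#!/bin/sh
# Print all 128 node names for 2 racks x 4 IRUs x 16 nodes (rXiYnZZ).
list_nodes() {
    for rack in 1 2; do
        for iru in 0 1 2 3; do
            node=0
            while [ "$node" -le 15 ]; do
                echo "r${rack}i${iru}n${node}"
                node=$((node + 1))
            done
        done
    done
}

# Count the node names that match an extended regular expression.
# grep -E stands in for the scheduler's matcher here (an assumption).
count_matches() {
    list_nodes | grep -E -c -- "$1"
}

pattern='r1i[0-3]n[0-9]|r2i[0-1]n[0-9]'
echo "unanchored: $(count_matches "$pattern")"        # 96
echo "anchored:   $(count_matches "^(${pattern})\$")" # 60
```

The anchored count is lower because nodes 10 through 15 of each IRU no longer match once the whole name has to fit the expression.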
We aren't using any QOS either - its pretty much a FIFO queue with some soft and hard limits on the number of jobs and number of procs for each user. All of the commands below were taken just after the DAC reservation was in effect. The first job in the queue (83219) should have run - there should have been 32 nodes free. For some reason maui did not run it. But after that all 3 of the 8-node jobs should have run. Looking at the "checkjob -v 83224" output, maui thinks that essentially all the nodes were reserved (except for 8 nodes from 83223). Any idea what might be going on here? Thanks, Darby service0:~ # qstat -a Req'd Elap Job ID Username Queue Jobname NDS Time S Time -------------------- -------- -------- ---------------- ----- ----- - ----- 83219 aschwing huge m0.40a0.00_SAES 32 04:00 Q -- 83223 stuart medium m0.27a30.0b20.0 8 04:00 R 01:55 83224 stuart medium m0.27a0.0b20.0 8 04:00 Q -- 83225 stuart medium m0.27-30.0b20.0 8 04:00 Q -- service0:~ # checkjob -v 83224 checking job 83224 (RM job '83224.service0') State: Idle Creds: user:stuart group:eg3 class:medium qos:DEFAULT WallTime: 00:00:00 of 4:00:00 SubmitTime: Sun Sep 25 10:51:08 (Time Queued Total: 00:17:40 Eligible: 00:17:32) Total Tasks: 96 Req[0] TaskCount: 96 Partition: ALL Network: [NONE] Memory >= 0 Disk >= 0 Swap >= 0 Opsys: [NONE] Arch: [NONE] Features: [NONE] Exec: '' ExecSize: 0 ImageSize: 0 Dedicated Resources Per Task: PROCS: 1 NodeAccess: SINGLEUSER TasksPerNode: 12 NodeCount: 8 IWD: [NONE] Executable: [NONE] Bypass: 0 StartCount: 0 PartitionMask: [ALL] SystemQueueTime: Sun Sep 25 10:51:16 PE: 96.00 StartPriority: 17 job cannot run in partition DEFAULT (idle procs do not meet requirements : 0 of 96 procs found) idle procs: 1440 feasible procs: 0 Rejection Reasons: [State : 8][ReserveTime : 120] Detailed Node Availability Information: r1i0n0 rejected : ReserveTime r1i0n1 rejected : ReserveTime r1i0n2 rejected : ReserveTime r1i0n3 rejected : ReserveTime r1i0n4 rejected : ReserveTime r1i0n5 
rejected : ReserveTime r1i0n6 rejected : ReserveTime r1i0n7 rejected : ReserveTime r1i0n8 rejected : ReserveTime r1i0n9 rejected : ReserveTime r1i0n10 rejected : ReserveTime r1i0n11 rejected : ReserveTime r1i0n12 rejected : ReserveTime r1i0n13 rejected : ReserveTime r1i0n14 rejected : ReserveTime r1i0n15 rejected : ReserveTime r1i1n0 rejected : ReserveTime r1i1n1 rejected : ReserveTime r1i1n2 rejected : ReserveTime r1i1n3 rejected : ReserveTime r1i1n4 rejected : ReserveTime r1i1n5 rejected : ReserveTime r1i1n6 rejected : ReserveTime r1i1n7 rejected : ReserveTime r1i1n8 rejected : ReserveTime r1i1n9 rejected : ReserveTime r1i1n10 rejected : ReserveTime r1i1n11 rejected : ReserveTime r1i1n12 rejected : ReserveTime r1i1n13 rejected : ReserveTime r1i1n14 rejected : ReserveTime r1i1n15 rejected : ReserveTime r1i2n0 rejected : ReserveTime r1i2n1 rejected : ReserveTime r1i2n2 rejected : ReserveTime r1i2n3 rejected : ReserveTime r1i2n4 rejected : ReserveTime r1i2n5 rejected : ReserveTime r1i2n6 rejected : ReserveTime r1i2n7 rejected : ReserveTime r1i2n8 rejected : ReserveTime r1i2n9 rejected : ReserveTime r1i2n10 rejected : ReserveTime r1i2n11 rejected : ReserveTime r1i2n12 rejected : ReserveTime r1i2n13 rejected : ReserveTime r1i2n14 rejected : ReserveTime r1i2n15 rejected : ReserveTime r1i3n0 rejected : ReserveTime r1i3n1 rejected : ReserveTime r1i3n2 rejected : ReserveTime r1i3n3 rejected : ReserveTime r1i3n4 rejected : ReserveTime r1i3n5 rejected : ReserveTime r1i3n6 rejected : ReserveTime r1i3n7 rejected : ReserveTime r1i3n8 rejected : ReserveTime r1i3n9 rejected : ReserveTime r1i3n10 rejected : ReserveTime r1i3n11 rejected : ReserveTime r1i3n12 rejected : ReserveTime r1i3n13 rejected : ReserveTime r1i3n14 rejected : ReserveTime r1i3n15 rejected : ReserveTime r2i0n0 rejected : ReserveTime r2i0n1 rejected : ReserveTime r2i0n2 rejected : ReserveTime r2i0n3 rejected : ReserveTime r2i0n4 rejected : ReserveTime r2i0n5 rejected : ReserveTime r2i0n6 rejected : ReserveTime 
r2i0n7 rejected : ReserveTime r2i0n8 rejected : ReserveTime r2i0n9 rejected : ReserveTime r2i0n10 rejected : ReserveTime r2i0n11 rejected : ReserveTime r2i0n12 rejected : ReserveTime r2i0n13 rejected : ReserveTime r2i0n14 rejected : ReserveTime r2i0n15 rejected : ReserveTime r2i1n0 rejected : ReserveTime r2i1n1 rejected : ReserveTime r2i1n2 rejected : ReserveTime r2i1n3 rejected : ReserveTime r2i1n4 rejected : ReserveTime r2i1n5 rejected : ReserveTime r2i1n6 rejected : ReserveTime r2i1n7 rejected : ReserveTime r2i1n8 rejected : ReserveTime r2i1n9 rejected : ReserveTime r2i1n10 rejected : ReserveTime r2i1n11 rejected : ReserveTime r2i1n12 rejected : ReserveTime r2i1n13 rejected : ReserveTime r2i1n14 rejected : ReserveTime r2i1n15 rejected : ReserveTime r2i2n0 rejected : ReserveTime r2i2n1 rejected : ReserveTime r2i2n2 rejected : ReserveTime r2i2n3 rejected : ReserveTime r2i2n4 rejected : ReserveTime r2i2n5 rejected : ReserveTime r2i2n6 rejected : ReserveTime r2i2n7 rejected : ReserveTime r2i2n8 rejected : State r2i2n9 rejected : State r2i2n10 rejected : State r2i2n11 rejected : State r2i2n12 rejected : State r2i2n13 rejected : State r2i2n14 rejected : State r2i2n15 rejected : State r2i3n0 rejected : ReserveTime r2i3n1 rejected : ReserveTime r2i3n2 rejected : ReserveTime r2i3n3 rejected : ReserveTime r2i3n4 rejected : ReserveTime r2i3n5 rejected : ReserveTime r2i3n6 rejected : ReserveTime r2i3n7 rejected : ReserveTime r2i3n8 rejected : ReserveTime r2i3n9 rejected : ReserveTime r2i3n10 rejected : ReserveTime r2i3n11 rejected : ReserveTime r2i3n12 rejected : ReserveTime r2i3n13 rejected : ReserveTime r2i3n14 rejected : ReserveTime r2i3n15 rejected : ReserveTime service0:~ # checknode r2i3n0 checking node r2i3n0 State: Idle (in current state for 00:01:02) Configured Resources: PROCS: 12 MEM: 23G SWAP: 23G DISK: 1M Utilized Resources: [NONE] Dedicated Resources: [NONE] Opsys: linux Arch: [NONE] Speed: 1.00 Load: 0.000 Network: [DEFAULT] Features: [NONE] Attributes: 
[Batch] Classes: [ginormous 12:12][debug 12:12][large 12:12][huge 12:12][medium 12:12][route 12:12][small 12:12][super 12:12][tiny 12:12] Total Time: INFINITY Up: INFINITY (99.93%) Active: INFINITY (82.06%) Reservations: Job '83219'(x12) 2:03:58 -> 6:03:58 (4:00:00) service0:~ # checknode r1i0n0 checking node r1i0n0 State: Idle (in current state for 00:01:33) Configured Resources: PROCS: 12 MEM: 23G SWAP: 23G DISK: 1M Utilized Resources: [NONE] Dedicated Resources: [NONE] Opsys: linux Arch: [NONE] Speed: 1.00 Load: 0.000 Network: [DEFAULT] Features: [NONE] Attributes: [Batch] Classes: [ginormous 12:12][debug 12:12][large 12:12][huge 12:12][medium 12:12][route 12:12][small 12:12][super 12:12][tiny 12:12] Total Time: INFINITY Up: INFINITY (99.79%) Active: 77:16:44:11 (19.63%) Reservations: User 'DAC.0'(x1) -00:09:50 -> 6:50:10 (7:00:00) Blocked Resources at -00:09:50 Procs: 12/12 (100.00%) service0:~ # diagnose -r Diagnosing Reservations ResID Type Par StartTime EndTime Duration Node Task Proc ----- ---- --- --------- ------- -------- ---- ---- ---- DAC.0 User DEF -00:10:21 6:49:39 7:00:00 96 96 1152 Flags: PREEMPTEE ACL: RES==DAC.0= USER==lumpkin+:==lebeau+:==kboyles+:==bstewart+ CL: RES==DAC.0 Task Resources: PROCS: [ALL] Attributes (HostList='r1i[0-3]n[0-9]|r2i[0-1]n[0-9]') Active PH: 0.00/202.16 (0.00%) 83223 Job DEF -1:57:04 2:02:56 4:00:00 8 96 96 ACL: JOB==83223= CL: JOB==83223 USER==stuart GROUP==eg3 CLASS==medium QOS==DEFAULT DURATION==4:00:00 PROC==96 debug.1.0 User DEF 20:49:39 1:05:49:39 9:00:00 8 8 96 Flags: STANDINGRES SHARED ACL: RES==debug.1= CLASS==debug+ CL: RES==debug.1 Task Resources: PROCS: [ALL] Attributes (HostList='r2i3n8 r2i3n9 r2i3n10 r2i3n11 r2i3n12 r2i3n13 r2i3n14 r2i3n15') SRAttributes (TaskCount: 8 StartTime: 8:00:00 EndTime: 17:00:00 Days: Mon,Tue,Wed,Thu,Fri) 83219 Job DEF 2:02:56 6:02:56 4:00:00 32 384 384 Flags: PREEMPTEE ACL: JOB==83219= CL: JOB==83219 USER==aschwing GROUP==eg3 CLASS==huge QOS==DEFAULT DURATION==4:00:00 PROC==384 
Attributes (Priority=56) Active Reserved Processors: 96 service0:~ # From denismpa at gmail.com Tue Sep 27 07:45:46 2011 From: denismpa at gmail.com (Denis) Date: Tue, 27 Sep 2011 10:45:46 -0300 Subject: [Mauiusers] Reservations for GPUs Message-ID: Hello, Burkhard Burkhard Bunk physik.hu-berlin.de> writes: > > Hi Henrik, > > I had a similar problem recently. > Since "consumable resources" didn't work for me, I used standing > reservations to "split the nodes": > Hope that helps. with this approach do you have any mechanism that can allow the user to figure out which gpu is in use and which one is available when he gets allocated into a node? Do you still use this fashion for scheduling gpus? > > Regards, > Burkhard Bunk. Thank you in advance,, Denis. -- Denis Anjos, www.versatushpc.com.br From denismpa at gmail.com Tue Sep 27 08:07:34 2011 From: denismpa at gmail.com (Denis) Date: Tue, 27 Sep 2011 11:07:34 -0300 Subject: [Mauiusers] Reservations for GPUs In-Reply-To: <19A43E2B-C6B2-4E0F-B55B-709B57BD7882@ur.rochester.edu> References: <19A43E2B-C6B2-4E0F-B55B-709B57BD7882@ur.rochester.edu> Message-ID: Hello, Steeve, thank you for your reply! 2011/9/27 Steve Crusan : > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > > On Sep 27, 2011, at 9:45 AM, Denis wrote: > >> Hello, Burkhard >> >> Burkhard Bunk physik.hu-berlin.de> writes: >> >>> >>> Hi Henrik, >>> >>> I had a similar problem recently. >>> Since "consumable resources" didn't work for me, I used standing >>> reservations to "split the nodes": >>> Hope that helps. >> with this approach do you have any mechanism that can allow the user to figure >> out which gpu is in use and which one is available when he gets allocated into a >> node? > > > You can use the PBS_GPUFILE as an indicator to which GPU a user has been assigned. 
> > http://www.clusterresources.com/torquedocs21/3.7schedulinggpus.shtml I am able to schedule gpus with torque as long as I dont use maui as scheduler, because it will just not start the jobs when gpus are requested... > > In reality, this is merely a placeholder value, because I'm not sure TORQUE physically denies access to the GPU. You can however >use prolog/epilog scripts to set user permissions, but in my experience people haven't abused or mistakenly used the wrong GPUs. > We are using Gromacs with openmm and it takes a paraemter with the index of the gpu on which it should start running. It feels very comfortable to use the PBS_GPUFILE as an indexer. The problem is that we'd like to use fairshare within our scheduler, and torque-sched will not provide it. > If you use Moab as your scheduler, Moab can automatically will track GPUs as a consumable resource. > Do you know if there is any workaround to get maui+torque scheduling gpus? I have googled for a while and so far, every result makes tend believe that it does not work. > > ~Steve > > >> >> Do you still use this fashion for scheduling gpus? >> >>> >>> Regards, >>> Burkhard Bunk. >> >> Thank you in advance,, >> Denis. 
>> >> >> -- >> Denis Anjos, >> www.versatushpc.com.br >> _______________________________________________ >> mauiusers mailing list >> mauiusers at supercluster.org >> http://www.supercluster.org/mailman/listinfo/mauiusers > > ?---------------------- > ?Steve Crusan > ?System Administrator > ?Center for Research Computing > ?University of Rochester > ?https://www.crc.rochester.edu/ > > > -----BEGIN PGP SIGNATURE----- > Version: GnuPG/MacGPG2 v2.0.17 (Darwin) > Comment: GPGTools - http://gpgtools.org > > iQEcBAEBAgAGBQJOgdX9AAoJENS19LGOpgqK5CYH/jCqef+zdw2DWUe6rHVaAea3 > Zt1d446SeelMDmptdLtOPgGT1LWZc9sdnLZH6Su0nmzi90S+jw5ZFNEATXhtq5oX > RVMFe8WROFBio3oBeDlZFldPgAmuA6FLXyiUY3x9JYtoFOX1cdmdWgcIqRX5rvWH > pW7W0603fxCSZdg0Lxgwg9HfbHEQznuSpPgc8AxPkheIKdCn0mt5fWQJv6qFLLEq > FRTCFbfi+SWpwQy98qJjqpDryQe025ryxH9CxUfaStqqxBVgTIy3DrmfUQkc9stk > nccfPWdRqqWSGfCf/U8DUGfeGRKd+nFdB+IRzrSw7OoeBuSJrOzyFKugg0/Atrw= > =xmjP > -----END PGP SIGNATURE----- > -- Denis Anjos, www.versatushpc.com.br From scrusan at ur.rochester.edu Tue Sep 27 07:56:06 2011 From: scrusan at ur.rochester.edu (Steve Crusan) Date: Tue, 27 Sep 2011 09:56:06 -0400 Subject: [Mauiusers] Reservations for GPUs In-Reply-To: References: Message-ID: <19A43E2B-C6B2-4E0F-B55B-709B57BD7882@ur.rochester.edu> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Sep 27, 2011, at 9:45 AM, Denis wrote: > Hello, Burkhard > > Burkhard Bunk physik.hu-berlin.de> writes: > >> >> Hi Henrik, >> >> I had a similar problem recently. >> Since "consumable resources" didn't work for me, I used standing >> reservations to "split the nodes": >> Hope that helps. > with this approach do you have any mechanism that can allow the user to figure > out which gpu is in use and which one is available when he gets allocated into a > node? You can use the PBS_GPUFILE as an indicator to which GPU a user has been assigned. 
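A job script could pull the assigned GPU indices out of that file along the following lines. This is a sketch under stated assumptions: each line of $PBS_GPUFILE is assumed to have the form <hostname>-gpu<index>, and the function name gpu_indices is hypothetical, not part of TORQUE.

```shell
#!/bin/sh
# Hypothetical helper: print the GPU indices listed for one host in a
# PBS_GPUFILE-style file. Assumes lines look like "<hostname>-gpu<index>".
gpu_indices() {
    awk -v host="$2" '
        {
            prefix = host "-gpu"
            # Keep only lines that start with "<host>-gpu"; print the index.
            if (index($0, prefix) == 1)
                print substr($0, length(prefix) + 1)
        }
    ' "$1"
}

# Inside a job, read the file TORQUE provides (if any) for this node.
# The host name in the file may be short or fully qualified; adjust as needed.
if [ -n "${PBS_GPUFILE:-}" ] && [ -r "$PBS_GPUFILE" ]; then
    gpu_indices "$PBS_GPUFILE" "$(hostname)"
fi
```

The printed index could then be handed to an application that takes a device number on its command line.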
http://www.clusterresources.com/torquedocs21/3.7schedulinggpus.shtml In reality, this is merely a placeholder value, because I'm not sure TORQUE physically denies access to the GPU. You can however use prolog/epilog scripts to set user permissions, but in my experience people haven't abused or mistakenly used the wrong GPUs. If you use Moab as your scheduler, Moab can automatically will track GPUs as a consumable resource. ~Steve > > Do you still use this fashion for scheduling gpus? > >> >> Regards, >> Burkhard Bunk. > > Thank you in advance,, > Denis. > > > -- > Denis Anjos, > www.versatushpc.com.br > _______________________________________________ > mauiusers mailing list > mauiusers at supercluster.org > http://www.supercluster.org/mailman/listinfo/mauiusers ---------------------- Steve Crusan System Administrator Center for Research Computing University of Rochester https://www.crc.rochester.edu/ -----BEGIN PGP SIGNATURE----- Version: GnuPG/MacGPG2 v2.0.17 (Darwin) Comment: GPGTools - http://gpgtools.org iQEcBAEBAgAGBQJOgdX9AAoJENS19LGOpgqK5CYH/jCqef+zdw2DWUe6rHVaAea3 Zt1d446SeelMDmptdLtOPgGT1LWZc9sdnLZH6Su0nmzi90S+jw5ZFNEATXhtq5oX RVMFe8WROFBio3oBeDlZFldPgAmuA6FLXyiUY3x9JYtoFOX1cdmdWgcIqRX5rvWH pW7W0603fxCSZdg0Lxgwg9HfbHEQznuSpPgc8AxPkheIKdCn0mt5fWQJv6qFLLEq FRTCFbfi+SWpwQy98qJjqpDryQe025ryxH9CxUfaStqqxBVgTIy3DrmfUQkc9stk nccfPWdRqqWSGfCf/U8DUGfeGRKd+nFdB+IRzrSw7OoeBuSJrOzyFKugg0/Atrw= =xmjP -----END PGP SIGNATURE----- From jasonw at Jhu.edu Tue Sep 27 09:56:40 2011 From: jasonw at Jhu.edu (Jason Williams) Date: Tue, 27 Sep 2011 11:56:40 -0400 Subject: [Mauiusers] Maui, FairShare, and scheduling GPUs Message-ID: <4E81F238.8080403@Jhu.edu> I'm curious if anyone has taken a look at getting Torque 2.5.x and Maui working together to schedule GPUS and track the usage via FairShare. 
I am pondering what would be needed to make that happen within the Maui
source, but if someone else has already started working on this, it would be
interesting to get their take on the situation. From some googling and
reading on the list here, it seems difficult to do without some
modifications to the source. If you've thought about it or have started on
it, please email me back.

--
Jason Williams
Sr. Systems Administrator
Homewood HPC Cluster
Johns Hopkins University

From denismpa at gmail.com Tue Sep 27 10:10:20 2011
From: denismpa at gmail.com (Denis)
Date: Tue, 27 Sep 2011 13:10:20 -0300
Subject: [Mauiusers] Maui, FairShare, and scheduling GPUs
In-Reply-To: <4E81F238.8080403@Jhu.edu>
References: <4E81F238.8080403@Jhu.edu>
Message-ID:

2011/9/27 Jason Williams :
> I'm curious if anyone has taken a look at getting Torque 2.5.x and Maui
> working together to schedule GPUS and track the usage via FairShare. I
> am pondering what would be needed to actually make that happen within
> the Maui source, but if someone else has already started working on
> this, it would be interesting to get their take on the situation. I've
> noticed, via some googling and reading on the list here, that it seems
> difficult to do without some mods to the source. If you've thought
> about it or have started on it, please email me back.
>
> --
> Jason Williams
> Sr. Systems Administrator
> Homewood HPC Cluster
> Johns Hopkins University

I am wondering the same. Replying here to follow the thread.
--
Denis Anjos,
www.versatushpc.com.br

From fcaba at uns.edu.ar Tue Sep 27 16:23:46 2011
From: fcaba at uns.edu.ar (Fernando Caba)
Date: Tue, 27 Sep 2011 19:23:46 -0300
Subject: [Mauiusers] Can´t get busy nodes
Message-ID: <4E824CF2.2080204@uns.edu.ar>

Hi everybody,

I am using Torque 3.0.1 and Maui 3.3.1 on a cluster made up of a front end
and 4 nodes (2 processors with 6 cores each per node), 48 cores in total.
I need to make sure that no node runs more than 12 processes. In our case
the application is VASP, so we want no more than 12 VASP processes per node.
How can I configure this? I am getting confused reading all the Torque and
Maui configuration documentation.

Thanks in advance.

--
----------------------------------------------------
Ing. Fernando Caba
Director General de Telecomunicaciones
Universidad Nacional del Sur
http://www.dgt.uns.edu.ar
Tel/Fax: (54)-291-4595166
Tel: (54)-291-4595101 int. 2050
Avda. Alem 1253, (B8000CPB) Bahía Blanca - Argentina
----------------------------------------------------

From gus at ldeo.columbia.edu Tue Sep 27 17:07:34 2011
From: gus at ldeo.columbia.edu (Gus Correa)
Date: Tue, 27 Sep 2011 19:07:34 -0400
Subject: [Mauiusers] Can´t get busy nodes
In-Reply-To: <4E824CF2.2080204@uns.edu.ar>
References: <4E824CF2.2080204@uns.edu.ar>
Message-ID: <4E825736.8080103@ldeo.columbia.edu>

Hi Fernando

Did you try something like this in your
${TORQUE}/server_priv/nodes file?

frontend np=12 [skip this line if the frontend is not to do job work]
node1 np=12
node2 np=12
node3 np=12
node4 np=12

This is probably the first thing to do.
It is not Maui, just plain Torque [actually pbs_server configuration].

The lines above assume your nodes are called node1, ...
and the head node is called frontend,
in some name-resolvable manner [most likely
in your /etc/hosts file, most likely pointing to the nodes'
IP addresses in your cluster's private subnet, 192.168.X.X,
10.X.X.X or equivalent].
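As a quick sanity check of a nodes file like the one above, the np= values can be added up and compared with the number of physical cores (4 nodes with 12 cores each, so 48 here). The sketch below is illustrative only: total_slots is a made-up helper, not a Torque command, and the sample file simply mirrors the four compute nodes from the example.

```shell
#!/bin/sh
# Hypothetical helper (not a Torque command): sum the np= values in a
# Torque nodes file to see how many job slots pbs_server will offer.
total_slots() {
    awk '
        /^[ \t]*(#|$)/ { next }            # skip comments and blank lines
        {
            for (i = 2; i <= NF; i++)      # fields after the hostname
                if ($i ~ /^np=[0-9]+$/) {
                    sub(/^np=/, "", $i)
                    total += $i
                }
        }
        END { print total + 0 }
    ' "$1"
}

# Sample file mirroring the four compute nodes discussed above.
nodes_file=$(mktemp)
cat > "$nodes_file" <<'EOF'
node1 np=12
node2 np=12
node3 np=12
node4 np=12
EOF

total_slots "$nodes_file"   # prints 48 for this sample
rm -f "$nodes_file"
```

On a real server the function would be pointed at the actual file, for example the server_priv/nodes path mentioned above.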
The 'np=12' clause will allow at most 12 *processes* per node. [However, if VASP is *threaded*, say via OpenMP, then it won't prevent that several threads are launched from each process. To handle threaded you can use some tricks, such as requesting more cores than processes. Sorry, I am not familiar to VASP to be able to say more than this.] I would suggest that you take a look at the Torque Admin Manual for more details: http://www.adaptivecomputing.com/resources/docs/torque/ There are further controls in Maui, such as 'JOBNODEMATCHPOLICY EXACTNODE' in maui.cfg, for instance, if you want full nodes allocated to each job, as opposed to jobs sharing cores in a single node. However, these choices may come later. [You can change maui.cfg and restart the maui scheduler to test various changes.] For Maui details see the Maui Admin Guide: http://www.adaptivecomputing.com/resources/docs/maui/index.php I hope this helps, Gus Correa Fernando Caba wrote: > Hi every body, i am using torque 3.0.1 and maui 3.3.1 in a configuration > composed by a front end and 4 nodes (2 processors, 6 cores each) > totalizing 48 cores. > I need to configure that in each node don?t run no more than 12 process > (particular we are using vasp), so we wan?t no more than 12 vasp process > by node. > How can i configure this? I?m so confusing reading a lot of information > from torque and maui configuration. > > Thank?s in advance. > From gus at ldeo.columbia.edu Tue Sep 27 18:16:06 2011 From: gus at ldeo.columbia.edu (Gus Correa) Date: Tue, 27 Sep 2011 20:16:06 -0400 Subject: [Mauiusers] =?iso-8859-1?q?Can=B4t_get_busy_nodes?= In-Reply-To: <4E825736.8080103@ldeo.columbia.edu> References: <4E824CF2.2080204@uns.edu.ar> <4E825736.8080103@ldeo.columbia.edu> Message-ID: <4E826746.4070900@ldeo.columbia.edu> PS - After you edit ${TORQUE}/server_priv/nodes, restart the pbs_server for the changes to take effect. 
Also, where I said "as opposed to jobs sharing cores in a single node", it should have been "as opposed to jobs sharing a node" Gus Correa Gus Correa wrote: > Hi Fernando > > Did you try something like this in your > ${TORQUE}/server_priv/nodes file? > > frontend np=12 [skip this line if the frontend is not to do job work] > node1 np=12 > node2 np=12 > node3 np=12 > node4 np=12 > > This is probably the first thing to do. > It is not Maui, just plain Torque [actually pbs_server configuration]. > > The lines above assume your nodes are called node1, ... > and the head node is called frontend, > in some name-resolvable manner [most likely > in your /etc/hosts file, most likely pointing to the nodes' > IP addresses in your cluster's private subnet, 192.168.X.X, > 10.X.X.X or equivalent]. > > The 'np=12' clause will allow at most 12 *processes* per node. > > > [However, if VASP is *threaded*, say via OpenMP, then it won't > prevent that several threads are launched from each process. > To handle threaded you can use some tricks, such as requesting > more cores than processes. > Sorry, I am not familiar to VASP to be able to say more than this.] > > I would suggest that you take a look at the Torque Admin Manual > for more details: > http://www.adaptivecomputing.com/resources/docs/torque/ > > There are further controls in Maui, such as > 'JOBNODEMATCHPOLICY EXACTNODE' in maui.cfg, > for instance, if you want full nodes allocated to each job, > as opposed to jobs sharing cores in a single node. > However, these choices may come later. > [You can change maui.cfg and restart the maui scheduler to > test various changes.] > > For Maui details see the Maui Admin Guide: > http://www.adaptivecomputing.com/resources/docs/maui/index.php > > I hope this helps, > Gus Correa > > Fernando Caba wrote: >> Hi every body, i am using torque 3.0.1 and maui 3.3.1 in a configuration >> composed by a front end and 4 nodes (2 processors, 6 cores each) >> totalizing 48 cores. 
>> I need to configure that in each node don't run no more than 12 process
>> (particular we are using vasp), so we want no more than 12 vasp process
>> by node.
>> How can i configure this? I'm so confusing reading a lot of information
>> from torque and maui configuration.
>>
>> Thanks in advance.
>>

From fcaba at uns.edu.ar Tue Sep 27 20:46:12 2011
From: fcaba at uns.edu.ar (Fernando Caba)
Date: Tue, 27 Sep 2011 23:46:12 -0300
Subject: [Mauiusers] Can´t get busy nodes
In-Reply-To: <31ed9baa187f0719293938ddb63876f6@webmail.clustering.com.ar>
References: <31ed9baa187f0719293938ddb63876f6@webmail.clustering.com.ar>
Message-ID: <4E828A74.9010002@uns.edu.ar>

Diego, thanks for the quick reply.

The processes are placed on one node, always the same one; there are no
processes on the front end.
I need to configure things so that only 12 processes run per node and no
more.
Running from within a script works fine. This is the script:

#!/bin/bash

cd $PBS_O_WORKDIR

mpirun -np 8 /usr/local/vasp/vasp

It runs 8 vasp processes on one node. The problem is that if I run another
job with 8 cores (-np 8), that calculation runs on the same node, exceeding
the 12 physical cores.
What I have not been able to configure is keeping a node from going over 12.

Regards

----------------------------------------------------
Ing. Fernando Caba
Director General de Telecomunicaciones
Universidad Nacional del Sur
http://www.dgt.uns.edu.ar
Tel/Fax: (54)-291-4595166
Tel: (54)-291-4595101 int. 2050
Avda. Alem 1253, (B8000CPB) Bahía Blanca - Argentina
----------------------------------------------------

On 27/09/2011 07:49 PM, Diego M. Vadell wrote:
> Hi Fernando,
>
> I think that is decided by Torque on one hand and by mpirun on the
> other. Could it be that the launch line in the script you give to qsub
> is wrong and sends all the processes to the master node of the run?
>
> Regards
> -- Diego
>
> ----------------original message-----------------
> From: "Fernando Caba" fcaba at uns.edu.ar
> To: mauiusers at supercluster.org
> Date: Tue, 27 Sep 2011 19:23:46 -0300
> -------------------------------------------------
>
>
>> Hi every body, i am using torque 3.0.1 and maui 3.3.1 in a configuration
>> composed by a front end and 4 nodes (2 processors, 6 cores each)
>> totalizing 48 cores.
>> I need to configure that in each node don't run no more than 12 process
>> (particular we are using vasp), so we want no more than 12 vasp process
>> by node.
>> How can i configure this? I'm so confusing reading a lot of information
>> from torque and maui configuration.
>>
>> Thanks in advance.
>>
>> --
>> ----------------------------------------------------
>> Ing. Fernando Caba
>> Director General de Telecomunicaciones
>> Universidad Nacional del Sur
>> http://www.dgt.uns.edu.ar
>> Tel/Fax: (54)-291-4595166
>> Tel: (54)-291-4595101 int. 2050
>> Avda. Alem 1253, (B8000CPB) Bahía Blanca - Argentina
>> ----------------------------------------------------
>>

From fcaba at uns.edu.ar Tue Sep 27 21:15:21 2011
From: fcaba at uns.edu.ar (Fernando Caba)
Date: Wed, 28 Sep 2011 00:15:21 -0300
Subject: [Mauiusers] Can´t get busy nodes
In-Reply-To: <4E825736.8080103@ldeo.columbia.edu>
References: <4E824CF2.2080204@uns.edu.ar> <4E825736.8080103@ldeo.columbia.edu>
Message-ID: <4E829149.7060309@uns.edu.ar>

Hi Gus, my node file /var/spool/torque/server_priv/nodes looks like:

[root at fe server_priv]# more nodes
n10 np=12
n11 np=12
n12 np=12
n13 np=12
[root at fe server_priv]#

It is exactly as in your comment.
My script: #!/bin/bash cd $PBS_O_WORKDIR mpirun -np 8 /usr/local/vasp/vasp launch 8 vasp in one node. If i start one job more (with -np 8), the job will run in the same node (n13). So if i start another job with -np 8 (or -np 4), it will run in the same node n13. I configured JOBNODEMATCHPOLICY EXACTNODE in maui.cfg, but unfortunately the ran in node n13. This is an example of the output of top top - 00:05:53 up 14 days, 6:47, 1 user, load average: 4.18, 4.06, 4.09 Mem: 15955108k total, 13287888k used, 2667220k free, 142168k buffers Swap: 67111528k total, 16672k used, 67094856k free, 11360332k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 21796 patricia 25 0 463m 291m 12m R 100.5 1.9 517:29.59 vasp 21797 patricia 25 0 448m 276m 11m R 100.2 1.8 518:51.49 vasp 21798 patricia 25 0 458m 287m 11m R 100.2 1.8 522:01.79 vasp 21799 patricia 25 0 448m 276m 11m R 99.9 1.8 519:04.25 vasp 1 root 15 0 10348 672 568 S 0.0 0.0 0:00.53 init 2 root RT -5 0 0 0 S 0.0 0.0 0:00.06 migration/0 3 root 34 19 0 0 0 S 0.0 0.0 0:00.00 ksoftirqd/0 4 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/0 5 root RT -5 0 0 0 S 0.0 0.0 0:00.04 migration/1 The job that generate those 4 vasp process is: #!/bin/bash cd $PBS_O_WORKDIR mpirun -np 4 /usr/local/vasp/vasp Thanks ---------------------------------------------------- Ing. Fernando Caba Director General de Telecomunicaciones Universidad Nacional del Sur http://www.dgt.uns.edu.ar Tel/Fax: (54)-291-4595166 Tel: (54)-291-4595101 int. 2050 Avda. Alem 1253, (B8000CPB) Bah?a Blanca - Argentina ---------------------------------------------------- El 27/09/2011 08:07 PM, Gus Correa escribi?: > Hi Fernando > > Did you try something like this in your > ${TORQUE}/server_priv/nodes file? > > frontend np=12 [skip this line if the frontend is not to do job work] > node1 np=12 > node2 np=12 > node3 np=12 > node4 np=12 > > This is probably the first thing to do. > It is not Maui, just plain Torque [actually pbs_server configuration]. 
> > The lines above assume your nodes are called node1, ... > and the head node is called frontend, > in some name-resolvable manner [most likely > in your /etc/hosts file, most likely pointing to the nodes' > IP addresses in your cluster's private subnet, 192.168.X.X, > 10.X.X.X or equivalent]. > > The 'np=12' clause will allow at most 12 *processes* per node. > > > [However, if VASP is *threaded*, say via OpenMP, then it won't > prevent that several threads are launched from each process. > To handle threaded you can use some tricks, such as requesting > more cores than processes. > Sorry, I am not familiar to VASP to be able to say more than this.] > > I would suggest that you take a look at the Torque Admin Manual > for more details: > http://www.adaptivecomputing.com/resources/docs/torque/ > > There are further controls in Maui, such as > 'JOBNODEMATCHPOLICY EXACTNODE' in maui.cfg, > for instance, if you want full nodes allocated to each job, > as opposed to jobs sharing cores in a single node. > However, these choices may come later. > [You can change maui.cfg and restart the maui scheduler to > test various changes.] > > For Maui details see the Maui Admin Guide: > http://www.adaptivecomputing.com/resources/docs/maui/index.php > > I hope this helps, > Gus Correa > > Fernando Caba wrote: >> Hi every body, i am using torque 3.0.1 and maui 3.3.1 in a configuration >> composed by a front end and 4 nodes (2 processors, 6 cores each) >> totalizing 48 cores. >> I need to configure that in each node don?t run no more than 12 process >> (particular we are using vasp), so we wan?t no more than 12 vasp process >> by node. >> How can i configure this? I?m so confusing reading a lot of information >> from torque and maui configuration. >> >> Thank?s in advance. 
From denismpa at gmail.com Tue Sep 27 21:26:58 2011
From: denismpa at gmail.com (Denis)
Date: Wed, 28 Sep 2011 00:26:58 -0300
Subject: [Mauiusers] Can´t get busy nodes
In-Reply-To: <4E829149.7060309@uns.edu.ar>
References: <4E824CF2.2080204@uns.edu.ar> <4E825736.8080103@ldeo.columbia.edu> <4E829149.7060309@uns.edu.ar>
Message-ID:

Hello, Fernando!
*it goes in English because my Portuguese would probably not help. ;)

2011/9/28 Fernando Caba :
> Hi Gus, my node file /var/spool/torque/server_priv/nodes looks like:
>
> [root at fe server_priv]# more nodes
> n10 np=12
> n11 np=12
> n12 np=12
> n13 np=12
> [root at fe server_priv]#
>
> it is exactly as in your comment.
>
> My script:
>
> #!/bin/bash
>
> cd $PBS_O_WORKDIR
>
> mpirun -np 8 /usr/local/vasp/vasp
>

You are not telling PBS that you are using 8 cores.
You have to add a line to your script, before anything runs:
#PBS -lnodes=8

Torque cannot track the number of MPI processes. A user could request
4 cpus and start n MPI processes, for example.

So, requesting it in your script as
#PBS -lnodes=8
and then running mpirun -np 8 will do the job.

Regards,
--
Denis Anjos,
www.versatushpc.com.br

From amessina at ictp.it Wed Sep 28 01:05:05 2011
From: amessina at ictp.it (Antonio Messina)
Date: Wed, 28 Sep 2011 09:05:05 +0200
Subject: [Mauiusers] Can´t get busy nodes
In-Reply-To:
References: <4E824CF2.2080204@uns.edu.ar> <4E825736.8080103@ldeo.columbia.edu> <4E829149.7060309@uns.edu.ar>
Message-ID: <4E82C721.2050907@ictp.it>

On 28/09/11 05.26, Denis wrote:
> You are missing to inform pbs that you are using 8 cores.
> you have to add before anything runs in your script a line:
> #PBS -lnodes=8
>
> Torque cannot trace the number of mpi processes. A user could request
> 4 cpus and start n mpi processes for example.
>
> So, requesting it in your script as
> #PBS -lnodes=8
> then running mpirun -np 8 will do the job.

As a side note, to prevent jobs submitted with wrong or missing
arguments from interfering with other jobs, we use a kernel feature
called *cpuset*, which seems to work very well.

Torque: 2.5.5
Maui: 3.3.1
doc: http://www.clusterresources.com/torquedocs21/3.5linuxcpusets.shtml

.a.

--
Antonio Messina                 I.T. Specialist
email: amessina at ictp.it     | The Abdus Salam ICTP
phone: +39 040-2240-691       | Strada Costiera, 11
fax:   +39 040-2240-7691      | 34151 Trieste, Italy

From arnaubria at pic.es Wed Sep 28 08:40:52 2011
From: arnaubria at pic.es (Arnau Bria)
Date: Wed, 28 Sep 2011 16:40:52 +0200
Subject: [Mauiusers] maui limits? looking for experience
Message-ID: <20110928164052.5a05183a@amarrosa.pic.es>

Hi all,

we've been using torque/maui for a long time. Our initial cluster was
about 50 nodes and now it is ~350 with 3k processors.

It had been working fine until the last cluster upgrade, when we added
the last 500 processors. Since then, maui client commands hang and we had
to increase the poll interval because the scheduling cycle took too long...
Now, with a system with 3k running jobs and 3k in queue, we're facing more
maui issues...

So, we were wondering what the maui limits are, whether we have reached
any of them, and whether anyone who has already hit these limits could
share their experience of solving them with us.

we're running maui-3.3-1.x86_64.

Many thanks in advance,
Cheers,
Arnau

From michel.beland at rqchp.qc.ca Wed Sep 28 09:15:50 2011
From: michel.beland at rqchp.qc.ca (Michel Béland)
Date: Wed, 28 Sep 2011 11:15:50 -0400
Subject: [Mauiusers] maui limits? looking for experience
In-Reply-To: <20110928164052.5a05183a@amarrosa.pic.es>
References: <20110928164052.5a05183a@amarrosa.pic.es>
Message-ID: <4E833A26.8070900@rqchp.qc.ca>

Hi,

> we've been using torque/maui for a long time. Our initial cluster was
> about 50 nodes and now ~350 with 3k processors.
>
> It has been working fine since last cluster upgrade, when we added
> last 500 processors. Since then, maui client commands hang and we had
> to increase poll interval cause scheduling cycle took too much... Now,
> with a system with 3k running jobs and 3k in queue, we're facing more
> maui issues...
>
> So, we were wondering which are maui limits, if we have reached any of
> them and if anyone who already reached our limits could share his
> experience, on solving them, with us.
>
> we're running maui-3.3-1.x86_64.

I would advise defining a limit on idle jobs per user. For example:

USERCFG[DEFAULT] MAXIJOB=200

or any suitable number for your site. Alternatively, Torque has a
per-queue max_user_queuable setting, but it counts both running and
queued jobs. If you use a route queue to route your jobs to an execution
queue, you can define this for the execution queue and jobs will be
moved to the execution queue only when the limit is respected.

Both solutions should decrease the load on Maui, as it does not need to
schedule as many jobs at a time.

--
Michel Béland, analyste en calcul scientifique
michel.beland at calculquebec.ca
bureau S-250, pavillon Roger-Gaudry (principal), Université de Montréal
téléphone : 514 343-6111 poste 3892     télécopieur : 514 343-2155
Calcul Québec (www.calculquebec.ca)
Calcul Canada (calculcanada.org)

From gus at ldeo.columbia.edu Wed Sep 28 09:33:34 2011
From: gus at ldeo.columbia.edu (Gus Correa)
Date: Wed, 28 Sep 2011 11:33:34 -0400
Subject: [Mauiusers] Can´t get busy nodes
In-Reply-To: <4E829149.7060309@uns.edu.ar>
References: <4E824CF2.2080204@uns.edu.ar> <4E825736.8080103@ldeo.columbia.edu> <4E829149.7060309@uns.edu.ar>
Message-ID: <4E833E4E.50702@ldeo.columbia.edu>

Hi Fernando

Denis already pointed out the first/main problem.
Your Torque/PBS script is not requesting a specific number of nodes
and cores/processors.
You can ask for 12 processors, even if your MPI command doesn't
use all of them:

#PBS -l nodes=1:ppn=12

[You can still do mpirun -np 8 if you want.]

This will prevent two jobs from running on the same node [which seems
to be your goal, if I understood it right].

I also like to add the queue name [even if it is the default]
and the job name [for documentation and stdout/stderr
naming consistency]:

#PBS -q myqueue [whatever you called your queue]
#PBS -N myjob [15 characters at most, the rest gets truncated]

The #PBS clauses must be together and right after the #! /bin/sh line.

Ask your users to always add these lines to their jobs.
There is a feature of torque that allows you to write a wrapper
that will add whatever you want to the job script,
but if your pool of users is small
you can just ask them to cooperate.

Of course there is much more that you can add.
'man qsub' and 'man pbs_resources' are good sources of information,
highly recommended reading.

Then there is what Antonio Messina mentioned, the cpuset feature
of Torque.
I don't know if you installed Torque with this feature enabled.
However, if you did, it will allow specific cores to be
assigned to each process, which could allow node-sharing without
jobs stepping on each other's toes.
However:
A) this requires a bit more setup [not a lot, check the
list archives and the Torque Admin Guide]
B) if your users are cooperative and request 12 processors for each job,
and you're using the Maui 'JOBNODEMATCHPOLICY EXACTNODE', each job will
get a single node to itself anyway.

BTW, did you restart Maui after you added 'JOBNODEMATCHPOLICY EXACTNODE'
to the maui.cfg file?

I hope this helps,
Gus Correa
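The advice in this thread (request the cores from Torque, then start a matching number of MPI processes) can be sketched as a job script like the one below. It is an illustration under assumptions, not a tested recipe for this cluster: the job name vasp_job and the helper count_slots are made up, the vasp path is the one quoted in the thread, and the script relies on Torque writing one line per allocated core to $PBS_NODEFILE.

```shell
#!/bin/bash
# Ask Torque for one full 12-core node, as suggested in the thread.
# The job name below is a placeholder chosen for this sketch.
#PBS -l nodes=1:ppn=12
#PBS -N vasp_job

# Count the slots Torque granted: the node file given as $1 is expected
# to hold one hostname per allocated core, so non-empty lines = slots.
count_slots() {
    grep -c . "$1"
}

# Only launch when running under Torque, where PBS_NODEFILE is set.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    cd "$PBS_O_WORKDIR" || exit 1
    NP=$(count_slots "$PBS_NODEFILE")
    # Start exactly as many MPI processes as cores were allocated.
    mpirun -np "$NP" /usr/local/vasp/vasp
fi
```

Deriving -np from the node file keeps the MPI process count from drifting away from what the scheduler believes the job is using, which is the mismatch described earlier in the thread.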
From fcaba at uns.edu.ar Wed Sep 28 12:38:34 2011
From: fcaba at uns.edu.ar (Fernando Caba)
Date: Wed, 28 Sep 2011 15:38:34 -0300
Subject: [Mauiusers] Can´t get busy nodes
In-Reply-To: <4E833E4E.50702@ldeo.columbia.edu>
References: <4E824CF2.2080204@uns.edu.ar> <4E825736.8080103@ldeo.columbia.edu> <4E829149.7060309@uns.edu.ar> <4E833E4E.50702@ldeo.columbia.edu>
Message-ID: <4E8369AA.3030201@uns.edu.ar>

Hi everybody, thanks for all the answers.
I tried everything you pointed out:

including
#PBS -l nodes=1:ppn=12

adding

JOBNODEMATCHPOLICY EXACTNODE

to maui.cfg

but none of this works. I think the problem is in another
config parameter (maui or torque).

I will read more about all of this.

Thanks!!

----------------------------------------------------
Ing. Fernando Caba
Director General de Telecomunicaciones
Universidad Nacional del Sur
http://www.dgt.uns.edu.ar
Tel/Fax: (54)-291-4595166
Tel: (54)-291-4595101 int. 2050
Avda. Alem 1253, (B8000CPB) Bahía Blanca - Argentina
----------------------------------------------------
From gus at ldeo.columbia.edu Wed Sep 28 13:07:44 2011
From: gus at ldeo.columbia.edu (Gus Correa)
Date: Wed, 28 Sep 2011 15:07:44 -0400
Subject: [Mauiusers] Can´t get busy nodes
In-Reply-To: <4E8369AA.3030201@uns.edu.ar>
References: <4E824CF2.2080204@uns.edu.ar> <4E825736.8080103@ldeo.columbia.edu> <4E829149.7060309@uns.edu.ar> <4E833E4E.50702@ldeo.columbia.edu> <4E8369AA.3030201@uns.edu.ar>
Message-ID: <4E837080.3050602@ldeo.columbia.edu>

Hi Fernando

Did you restart maui after you changed maui.cfg? [service maui restart]

Any chance that what you see is still residual from old jobs,
submitted before you changed the maui configuration and job scripts
[#PBS -l nodes=1:ppn=12]?

To get more help from everybody on the list, it may be useful to
send the output of:

qmgr -c 'p s'

${TORQUE}/bin/pbsnodes

${MAUI}/bin/showconfig

ps -ef |grep maui

service maui status
service pbs_server status
service pbs_sched status [just in case it is also running ...]
service pbs_mom status
service pbs status

I hope this helps,
Gus Correa

Fernando Caba wrote:
> Hi everybody, thanks for all answers.
> I try all that you point out:
>
> including
> #PBS -l nodes=1:ppn=12
>
> adding
>
> JOBNODEMATCHPOLICY EXACTNODE
>
> to maui.cfg
>
> but nothing of this work. I'm thinking that the problem is in another
> config parameter (maui or torque).
>
> I will reading more about all.
>
> Thanks!!
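The list of diagnostic commands requested above can be gathered in one pass with a small helper along these lines. This is a sketch only: collect and gather_cluster_state are names invented for illustration, it assumes the Torque and Maui client commands are on PATH rather than under ${TORQUE}/bin and ${MAUI}/bin, and it simply notes which commands are missing instead of failing.

```shell
#!/bin/bash
# Run a command if its executable is on PATH and label its output;
# otherwise record that it was not found.
collect() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "### $*"
        "$@" 2>&1
    else
        echo "### $1: not found on PATH"
    fi
}

# Gather the state asked for in the thread. Nothing runs until this
# function is called.
gather_cluster_state() {
    collect qmgr -c 'p s'              # Torque server and queue settings
    collect pbsnodes -a                # node states as seen by pbs_server
    collect showconfig                 # Maui's effective configuration
    collect sh -c 'ps -ef | grep "[m]aui"'
    for svc in maui pbs_server pbs_sched pbs_mom; do
        collect service "$svc" status
    done
}
```

A typical use would be gather_cluster_state > cluster-state.txt on the head node, then attaching the file to a reply.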
From jasonw at Jhu.edu Wed Sep 28 17:42:59 2011
From: jasonw at Jhu.edu (Jason Williams)
Date: Wed, 28 Sep 2011 19:42:59 -0400
Subject: [Mauiusers] maui limits? looking for experience
In-Reply-To: <20110928164052.5a05183a@amarrosa.pic.es>
References: <20110928164052.5a05183a@amarrosa.pic.es>
Message-ID: <4E83B103.6070007@Jhu.edu>

I've noticed similar things when my cluster gets loaded too. I find it
annoying that if maui gets behind and "misses" scheduler iterations
because it's working on high job turnaround, it has to catch up on the
missed iterations. Also, while maui is scheduling things, there is what
appears to be a type of global "lock" or block on all communications to
maui. So if you get very busy and start missing many iterations, it
can sometimes take from 30 minutes to over an hour before maui starts
responding again. To users, this may look like a deadlock, but really,
when you look at the logs, maui is just going nuts trying to catch up.

I've been meaning to look at the code to figure out what the heck is
going on, but I haven't had time.

Basically, that's my long-winded way of saying "I have seen this too,
Arnau." And that I don't really have a good way around it aside from
setting limitations as another member suggested.

--
Jason Williams
Sr. Systems Administrator
Homewood HPC Cluster
Johns Hopkins University

From gus at ldeo.columbia.edu Wed Sep 28 17:56:56 2011
From: gus at ldeo.columbia.edu (Gus Correa)
Date: Wed, 28 Sep 2011 19:56:56 -0400
Subject: [Mauiusers] maui limits? looking for experience
In-Reply-To: <4E83B103.6070007@Jhu.edu>
References: <20110928164052.5a05183a@amarrosa.pic.es> <4E83B103.6070007@Jhu.edu>
Message-ID: <4E83B448.6080200@ldeo.columbia.edu>

Hi Arnau, Jason

Well, I guess I should consider myself happy
to administer only small clusters. :)

Now, how about the [terse] guidance in the Maui Admin Guide for large
clusters?
http://www.adaptivecomputing.com/resources/docs/maui/a.ilargeclusters.php

And the [slightly more verbose] one for Torque:
http://www.adaptivecomputing.com/resources/docs/torque/a.flargeclusters.php

Would they help with scalability?

Cheers,
Gus Correa
>> >> >> Many thanks in advance, >> Cheers, >> Arnau >> _______________________________________________ >> mauiusers mailing list >> mauiusers at supercluster.org >> http://www.supercluster.org/mailman/listinfo/mauiusers > > _______________________________________________ > mauiusers mailing list > mauiusers at supercluster.org > http://www.supercluster.org/mailman/listinfo/mauiusers From jayavant.patil82 at gmail.com Thu Sep 29 00:58:56 2011 From: jayavant.patil82 at gmail.com (Jayavant Patil) Date: Thu, 29 Sep 2011 12:28:56 +0530 Subject: [Mauiusers] Job Priority Change w.r.t. queue Message-ID: Hi, Is it possible to change the priority of a job w.r.t. the queue in which it currently resides? -- Thanks & Regards, Jayavant N. Patil -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.supercluster.org/pipermail/mauiusers/attachments/20110929/e57ca8ba/attachment.html From arnaubria at pic.es Thu Sep 29 03:30:26 2011 From: arnaubria at pic.es (Arnau Bria) Date: Thu, 29 Sep 2011 11:30:26 +0200 Subject: [Mauiusers] maui limits? looking for experience In-Reply-To: <4E83B448.6080200@ldeo.columbia.edu> References: <20110928164052.5a05183a@amarrosa.pic.es> <4E83B103.6070007@Jhu.edu> <4E83B448.6080200@ldeo.columbia.edu> Message-ID: <20110929113026.776724c8@amarrosa.pic.es> On Wed, 28 Sep 2011 19:56:56 -0400 Gus Correa wrote: > Hi Arnau, Jason Hi Gus, > Well, I guess I should consider myself happy > to administer only small clusters. :) > > Now, how about the [terse] guidance in the Maui Admin Guide for large > clusters? > http://www.adaptivecomputing.com/resources/docs/maui/a.ilargeclusters.php I have many doubts about those params, maybe it's time to ask about them :-) NODEPOLLFREQUENCY: with a RMPOLLINT of 1 minute and NODEPOLL to 3, during those 3 minutes that maui is not going to ask about node status, if a node goes from busy to free on minute 1, maui is not going to schedule jobs there until the 3 scheduling cycle starts... 
is that correct? JOBAGGREGATIONTIME: I don't really understand what this paramater does, but it talks about burtsy submission, not about long queues. > And the [slightly more verbose] one for Torque: > http://www.adaptivecomputing.com/resources/docs/torque/a.flargeclusters.php Some time ago we did configure all those params (ping/check rate and tcp_timeout) and torque works fine. But, from torque point of view, 350 nodes is not a "big cluster", so it scales fine. > Would them help with scalability? Till now, limiting idle queue improved maui behaviour.... > Cheers, > Gus Correa Cheers, Arnau From arnaubria at pic.es Thu Sep 29 03:55:33 2011 From: arnaubria at pic.es (Arnau Bria) Date: Thu, 29 Sep 2011 11:55:33 +0200 Subject: [Mauiusers] maui limits? looking for experience In-Reply-To: <4E833A26.8070900@rqchp.qc.ca> References: <20110928164052.5a05183a@amarrosa.pic.es> <4E833A26.8070900@rqchp.qc.ca> Message-ID: <20110929115533.666d68a5@amarrosa.pic.es> On Wed, 28 Sep 2011 11:15:50 -0400 Michel B?land wrote: > Hi, Hi, > I would advise defining a limit on idle jobs per user. For example: > > USERCFG[DEFAULT] MAXIJOB=200 > > or any suitable number for you site. This really improves maui behaviour. But limiting idle queue was the last thing I wanted to do.... > Alternatively, Torque has a per-queue max_user_queuable setting, but > it counts both running and queued jobs. If you use a route queue to > route your job to an execution queue, you can define this for the > execution queue and jobs will be moved to the execution queue only > when the limit is respected. If I understand routing queues properly, they send jobs based on job required resources. our jobs do not require any special resource, our users send jobs based on queue name that show time limits. So, I think that routing queues can't help here. > Both solutions should decrease the load on Maui as it does not need > to schedule as many jobs at a time. 
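[Editor's sketch] To make the effect of a per-user idle-job cap concrete, here is a toy Python model. It is not Maui code: the function name `eligible_idle_jobs`, the queue contents, and the user names are all made up for illustration, and it only assumes that a cap like MAXIJOB=200 means at most 200 idle jobs per user are considered in a scheduling pass.

```python
from collections import defaultdict


def eligible_idle_jobs(idle_jobs, max_idle_per_user):
    """Return the idle jobs a pass would consider under a per-user cap.

    idle_jobs is a list of (job_id, user) tuples in queue order.
    This is a simplified model, not Maui's actual algorithm.
    """
    seen = defaultdict(int)  # idle jobs already admitted, per user
    eligible = []
    for job_id, user in idle_jobs:
        if seen[user] < max_idle_per_user:
            seen[user] += 1
            eligible.append((job_id, user))
    return eligible


# Hypothetical queue: one user floods it with 3000 jobs, another has 50.
queue = [(i, "alice") for i in range(3000)]
queue += [(3000 + i, "bob") for i in range(50)]

considered = eligible_idle_jobs(queue, max_idle_per_user=200)
print(len(queue), len(considered))  # prints: 3050 250
```

In this toy case the scheduler looks at 250 jobs per pass instead of 3050, which is the kind of reduction the advice above is aiming for. The remaining jobs are not lost; they simply wait until earlier ones leave the idle state.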
>
Many thanks for your reply,
Cheers,
Arnau

From fcaba at uns.edu.ar Thu Sep 29 06:27:15 2011
From: fcaba at uns.edu.ar (Fernando Caba)
Date: Thu, 29 Sep 2011 09:27:15 -0300
Subject: [Mauiusers] Can´t get busy nodes
In-Reply-To: <4E837080.3050602@ldeo.columbia.edu>
References: <4E824CF2.2080204@uns.edu.ar> <4E825736.8080103@ldeo.columbia.edu> <4E829149.7060309@uns.edu.ar> <4E833E4E.50702@ldeo.columbia.edu> <4E8369AA.3030201@uns.edu.ar> <4E837080.3050602@ldeo.columbia.edu>
Message-ID: <4E846423.9050208@uns.edu.ar>

Hi Gus, here are the results of all commands you mention:

[root@fe ~]# qmgr -c 'p s'
#
# Create queues and set their attributes.
#
#
# Create and define queue batch
#
create queue batch
set queue batch queue_type = Execution
set queue batch resources_default.nodes = 1
set queue batch resources_default.walltime = 2400:00:00
set queue batch enabled = True
set queue batch started = True
#
# Set server attributes.
#
set server scheduling = True
set server acl_hosts = fe
set server managers = root@fe
set server operators = root@fe
set server default_queue = batch
set server log_events = 511
set server mail_from = adm
set server scheduler_iteration = 600
set server node_check_rate = 150
set server tcp_timeout = 6
set server mom_job_sync = True
set server keep_completed = 300
set server auto_node_np = True
set server next_job_number = 182
set server record_job_info = True
[root@fe ~]#

${TORQUE}/bin/pbsnodes

[root@fe ~]# pbsnodes
n10
     state = free
     np = 12
     ntype = cluster
     jobs = 0/121.fe
     status = rectime=1317298640,varattr=,jobs=121.fe,state=free,netload=261129374581,gres=,loadave=4.00,ncpus=12,physmem=16360208kb,availmem=62484756kb,totmem=83471736kb,idletime=63369,nusers=2,nsessions=2,sessions=4394 8087,uname=Linux n10 2.6.18-194.el5 #1 SMP Fri Apr 2 14:58:14 EDT 2010 x86_64,opsys=linux
     mom_service_port = 15002
     mom_manager_port = 15003
     gpus = 0

n11
     state = free
     np = 12
     ntype = cluster
     jobs = 0/143.fe
     status = rectime=1317298637,varattr=,jobs=143.fe,state=free,netload=12864227236,gres=,loadave=8.00,ncpus=12,physmem=16360208kb,availmem=78708424kb,totmem=83469060kb,idletime=1354314,nusers=2,nsessions=2,sessions=4583 20253,uname=Linux n11 2.6.18-194.el5 #1 SMP Fri Apr 2 14:58:14 EDT 2010 x86_64,opsys=linux
     mom_service_port = 15002
     mom_manager_port = 15003
     gpus = 0

n12
     state = free
     np = 12
     ntype = cluster
     jobs = 0/144.fe
     status = rectime=1317298647,varattr=,jobs=144.fe,state=free,netload=953102292987,gres=,loadave=8.01,ncpus=12,physmem=16360208kb,availmem=78740696kb,totmem=83469060kb,idletime=1168354,nusers=2,nsessions=2,sessions=4635 20289,uname=Linux n12 2.6.18-194.el5 #1 SMP Fri Apr 2 14:58:14 EDT 2010 x86_64,opsys=linux
     mom_service_port = 15002
     mom_manager_port = 15003
     gpus = 0

n13
     state = free
     np = 12
     ntype = cluster
     jobs = 0/181.fe
     status = rectime=1317298672,varattr=,jobs=181.fe,state=free,netload=1010169147229,gres=,loadave=4.00,ncpus=12,physmem=15955108kb,availmem=81150100kb,totmem=83066636kb,idletime=138726,nusers=2,nsessions=2,sessions=4407 29186,uname=Linux n13 2.6.18-194.el5xen #1 SMP Fri Apr 2 15:34:40 EDT 2010 x86_64,opsys=linux
     mom_service_port = 15002
     mom_manager_port = 15003
     gpus = 0

[root@fe ~]#

${MAUI}/bin/showconfig

[root@fe ~]# which showconfig
/usr/local/maui/bin/showconfig
[root@fe ~]# showconfig
# Maui version 3.3.1 (PID: 18407)
# global policies

REJECTNEGPRIOJOBS[0] FALSE
ENABLENEGJOBPRIORITY[0] FALSE
ENABLEMULTINODEJOBS[0] TRUE
ENABLEMULTIREQJOBS[0] FALSE
BFPRIORITYPOLICY[0] [NONE]
JOBPRIOACCRUALPOLICY QUEUEPOLICY
NODELOADPOLICY ADJUSTSTATE
USEMACHINESPEEDFORFS FALSE
USEMACHINESPEED FALSE
USESYSTEMQUEUETIME TRUE
USELOCALMACHINEPRIORITY FALSE
NODEUNTRACKEDLOADFACTOR 1.2
JOBNODEMATCHPOLICY[0] EXACTNODE

JOBMAXSTARTTIME[0] INFINITY

METAMAXTASKS[0] 0
NODESETPOLICY[0] [NONE]
NODESETATTRIBUTE[0] [NONE]
NODESETLIST[0]
NODESETDELAY[0] 00:00:00
NODESETPRIORITYTYPE[0] MINLOSS
NODESETTOLERANCE[0] 0.00

BACKFILLPOLICY[0] FIRSTFIT
BACKFILLDEPTH[0] 0
BACKFILLPROCFACTOR[0] 0
BACKFILLMAXSCHEDULES[0] 10000
BACKFILLMETRIC[0] PROCS

BFCHUNKDURATION[0] 00:00:00
BFCHUNKSIZE[0] 0
PREEMPTPOLICY[0] REQUEUE
MINADMINSTIME[0] 00:00:00
RESOURCELIMITPOLICY[0]
NODEAVAILABILITYPOLICY[0] COMBINED:[DEFAULT]
NODEALLOCATIONPOLICY[0] MINRESOURCE
TASKDISTRIBUTIONPOLICY[0] DEFAULT
RESERVATIONPOLICY[0] CURRENTHIGHEST
RESERVATIONRETRYTIME[0] 00:00:00
RESERVATIONTHRESHOLDTYPE[0] NONE
RESERVATIONTHRESHOLDVALUE[0] 0

FSPOLICY [NONE]
FSPOLICY [NONE]
FSINTERVAL 12:00:00
FSDEPTH 8
FSDECAY 1.00

# Priority Weights

SERVICEWEIGHT[0] 1
TARGETWEIGHT[0] 1
CREDWEIGHT[0] 1
ATTRWEIGHT[0] 1
FSWEIGHT[0] 1
RESWEIGHT[0] 1
USAGEWEIGHT[0] 1
QUEUETIMEWEIGHT[0] 1
XFACTORWEIGHT[0] 0
SPVIOLATIONWEIGHT[0] 0
BYPASSWEIGHT[0] 0
TARGETQUEUETIMEWEIGHT[0] 0
TARGETXFACTORWEIGHT[0] 0
USERWEIGHT[0] 0
GROUPWEIGHT[0] 0
ACCOUNTWEIGHT[0] 0
QOSWEIGHT[0] 0
CLASSWEIGHT[0] 0
FSUSERWEIGHT[0] 0
FSGROUPWEIGHT[0] 0
FSACCOUNTWEIGHT[0] 0
FSQOSWEIGHT[0] 0
FSCLASSWEIGHT[0] 0
ATTRATTRWEIGHT[0] 0
ATTRSTATEWEIGHT[0] 0
NODEWEIGHT[0] 0
PROCWEIGHT[0] 0
MEMWEIGHT[0] 0
SWAPWEIGHT[0] 0
DISKWEIGHT[0] 0
PSWEIGHT[0] 0
PEWEIGHT[0] 0
WALLTIMEWEIGHT[0] 0
UPROCWEIGHT[0] 0
UJOBWEIGHT[0] 0
CONSUMEDWEIGHT[0] 0
USAGEEXECUTIONTIMEWEIGHT[0] 0
REMAININGWEIGHT[0] 0
PERCENTWEIGHT[0] 0
XFMINWCLIMIT[0] 00:02:00

# partition DEFAULT policies

REJECTNEGPRIOJOBS[1] FALSE
ENABLENEGJOBPRIORITY[1] FALSE
ENABLEMULTINODEJOBS[1] TRUE
ENABLEMULTIREQJOBS[1] FALSE
BFPRIORITYPOLICY[1] [NONE]
JOBPRIOACCRUALPOLICY QUEUEPOLICY
NODELOADPOLICY ADJUSTSTATE
JOBNODEMATCHPOLICY[1]

JOBMAXSTARTTIME[1] INFINITY

METAMAXTASKS[1] 0
NODESETPOLICY[1] [NONE]
NODESETATTRIBUTE[1] [NONE]
NODESETLIST[1]
NODESETDELAY[1] 00:00:00
NODESETPRIORITYTYPE[1] MINLOSS
NODESETTOLERANCE[1] 0.00

# Priority Weights

XFMINWCLIMIT[1] 00:00:00

RMAUTHTYPE[0] CHECKSUM

CLASSCFG[[NONE]] DEFAULT.FEATURES=[NONE]
CLASSCFG[[ALL]] DEFAULT.FEATURES=[NONE]
CLASSCFG[batch] DEFAULT.FEATURES=[NONE]
QOSPRIORITY[0] 0
QOSQTWEIGHT[0] 0
QOSXFWEIGHT[0] 0
QOSTARGETXF[0] 0.00
QOSTARGETQT[0] 00:00:00
QOSFLAGS[0]
QOSPRIORITY[1] 0
QOSQTWEIGHT[1] 0
QOSXFWEIGHT[1] 0
QOSTARGETXF[1] 0.00
QOSTARGETQT[1] 00:00:00
QOSFLAGS[1]
# SERVER MODULES: MX
SERVERMODE NORMAL
SERVERNAME
SERVERHOST fe
SERVERPORT 42559
LOGFILE maui.log
LOGFILEMAXSIZE 10000000
LOGFILEROLLDEPTH 1
LOGLEVEL 3
LOGFACILITY fALL
SERVERHOMEDIR /usr/local/maui/
TOOLSDIR /usr/local/maui/tools/
LOGDIR /usr/local/maui/log/
STATDIR /usr/local/maui/stats/
LOCKFILE /usr/local/maui/maui.pid
SERVERCONFIGFILE /usr/local/maui/maui.cfg
CHECKPOINTFILE /usr/local/maui/maui.ck
CHECKPOINTINTERVAL 00:05:00
CHECKPOINTEXPIRATIONTIME 3:11:20:00
TRAPJOB
TRAPNODE
TRAPFUNCTION
RESDEPTH 24

RMPOLLINTERVAL 00:00:30
NODEACCESSPOLICY SHARED
ALLOCLOCALITYPOLICY [NONE]
SIMTIMEPOLICY [NONE]
ADMIN1 root
ADMINHOSTS ALL
NODEPOLLFREQUENCY 0
DISPLAYFLAGS
DEFAULTDOMAIN
DEFAULTCLASSLIST [DEFAULT:1]
FEATURENODETYPEHEADER
FEATUREPROCSPEEDHEADER
FEATUREPARTITIONHEADER
DEFERTIME 1:00:00
DEFERCOUNT 24
DEFERSTARTCOUNT 1
JOBPURGETIME 0
NODEPURGETIME 2140000000
APIFAILURETHRESHHOLD 6
NODESYNCTIME 600
JOBSYNCTIME 600
JOBMAXOVERRUN 00:10:00
NODEMAXLOAD 0.0

PLOTMINTIME 120
PLOTMAXTIME 245760
PLOTTIMESCALE 11
PLOTMINPROC 1
PLOTMAXPROC 512
PLOTPROCSCALE 9
SCHEDCFG[] MODE=NORMAL SERVER=fe:42559
# RM MODULES: PBS SSS WIKI NATIVE
RMCFG[FE] AUTHTYPE=CHECKSUM EPORT=15004 TIMEOUT=00:00:09 TYPE=PBS
SIMWORKLOADTRACEFILE workload
SIMRESOURCETRACEFILE resource
SIMAUTOSHUTDOWN OFF
SIMSTARTTIME 0
SIMSCALEJOBRUNTIME FALSE
SIMFLAGS
SIMJOBSUBMISSIONPOLICY CONSTANTJOBDEPTH
SIMINITIALQUEUEDEPTH 16
SIMWCACCURACY 0.00
SIMWCACCURACYCHANGE 0.00
SIMNODECOUNT 0
SIMNODECONFIGURATION NORMAL
SIMWCSCALINGPERCENT 100
SIMCOMRATE 0.10
SIMCOMTYPE ROUNDROBIN
COMINTRAFRAMECOST 0.30
COMINTERFRAMECOST 0.30
SIMSTOPITERATION -1
SIMEXITITERATION -1

[root@fe ~]# ps -ef |grep maui
root 18407 1 0 Sep28 ? 00:00:04 /usr/local/maui/sbin/maui
root 22527 22463 0 09:19 pts/2 00:00:00 grep maui
[root@fe ~]# service maui status
maui (pid 18407) is running...
[root@fe ~]# service pbs_server status
pbs_server (pid 4147) is running...
[root@fe ~]#

service pbs_sched status [just in case it is also running ...]
service pbs_mom status
service pbs status

None of those 3 services is installed.

Thank you very much

----------------------------------------------------
Ing. Fernando Caba
Director General de Telecomunicaciones
Universidad Nacional del Sur
http://www.dgt.uns.edu.ar
Tel/Fax: (54)-291-4595166
Tel: (54)-291-4595101 int. 2050
Avda. Alem 1253, (B8000CPB) Bahía Blanca - Argentina
----------------------------------------------------

On 28/09/2011 04:07 PM, Gus Correa wrote:
> Hi Fernando
>
> Did you restart maui after you changed maui.cfg? [service maui restart]
>
> Any chances that what you see is still residual from old jobs,
> submitted before you changed the maui configuration and job scripts
> [#PBS -l nodes=1:ppn=12]?
>
> For more help from everybody in the list,
> it may be useful if you send the output of:
>
> qmgr -c 'p s'
>
> ${TORQUE}/bin/pbsnodes
>
> ${MAUI}/bin/showconfig
>
> ps -ef |grep maui
>
> service maui status
> service pbs_server status
> service pbs_sched status [just in case it is also running ...]
> service pbs_mom status
> service pbs status
>
> I hope this helps,
> Gus Correa
>
>
> Fernando Caba wrote:
>> Hi everybody, thanks for all the answers.
>> I tried everything you pointed out:
>>
>> including
>> #PBS -l nodes=1:ppn=12
>>
>> adding
>>
>> JOBNODEMATCHPOLICY EXACTNODE
>>
>> to maui.cfg
>>
>> but none of this works. I'm thinking that the problem is in another
>> config parameter (maui or torque).
>>
>> I will read more about all of it.
>>
>> Thanks!!
>>
>> ----------------------------------------------------
>> Ing. Fernando Caba
>> Director General de Telecomunicaciones
>> Universidad Nacional del Sur
>> http://www.dgt.uns.edu.ar
>> Tel/Fax: (54)-291-4595166
>> Tel: (54)-291-4595101 int. 2050
>> Avda. Alem 1253, (B8000CPB) Bahía Blanca - Argentina
>> ----------------------------------------------------
>>
>>
>> On 28/09/2011 12:33 PM, Gus Correa wrote:
>>> Hi Fernando
>>>
>>> Dennis already pointed out the first/main problem.
>>> Your Torque/PBS script is not requesting a specific number of nodes
>>> and cores/processors.
>>> You can ask for 12 processors, even if your MPI command doesn't
>>> use all of them:
>>>
>>> #PBS -l nodes=1:ppn=12
>>>
>>> [You can still do mpirun -np 8 if you want.]
>>>
>>> This will prevent two jobs from running on the same node [which seems
>>> to be your goal, if I understood it right].
>>>
>>> I also like to add the queue name [even if it is the default]
>>> and the job name [for documentation and stdout/stderr
>>> naming consistency]
>>>
>>> #PBS -q myqueue [whatever you called your queue]
>>> #PBS -N myjob [15 characters at most, the rest gets truncated]
>>>
>>> The #PBS clauses must be together and right after the #! /bin/sh line.
>>>
>>> Ask your users to always add these lines to their jobs.
>>> There is a feature of torque that allows you to write a wrapper
>>> that will add whatever you want to the job script,
>>> but if your pool of users is small
>>> you can just ask them to cooperate.
>>>
>>> Of course there is much more that you can add.
>>> 'man qsub' and 'man pbs_resources' are good sources of information,
>>> highly recommended reading.
>>>
>>>
>>> Then there is what Antonio Messina mentioned, the cpuset feature
>>> of Torque.
>>> I don't know if you installed Torque with this feature enabled.
>>> However, if you did, it will allow specific cores to be
>>> assigned to each process, which could allow node-sharing without
>>> jobs stepping on each other's toes.
>>> However:
>>> A) this requires a bit more setup [not a lot, check the
>>> list archives and the Torque Admin Guide]
>>> B) if your users are cooperative and request 12 processors for each job,
>>> and you're using the Maui 'JOBNODEMATCHPOLICY EXACTNODE' each job will
>>> get to a single node anyway.
>>>
>>> BTW, did you restart Maui after you added 'JOBNODEMATCHPOLICY EXACTNODE'
>>> to the maui.cfg file?
>>>
>>> I hope this helps,
>>> Gus Correa
>>>
>>>
>>> Fernando Caba wrote:
>>>> Hi Gus, my node file /var/spool/torque/server_priv/nodes looks like:
>>>>
>>>> [root@fe server_priv]# more nodes
>>>> n10 np=12
>>>> n11 np=12
>>>> n12 np=12
>>>> n13 np=12
>>>> [root@fe server_priv]#
>>>>
>>>> It is exactly as in your comment.
>>>>
>>>> My script:
>>>>
>>>> #!/bin/bash
>>>>
>>>> cd $PBS_O_WORKDIR
>>>>
>>>> mpirun -np 8 /usr/local/vasp/vasp
>>>>
>>>> launches 8 vasp processes on one node. If I start one more job (with -np 8),
>>>> the job will run on the same node (n13).
>>>> So if I start another job with -np 8
>>>> (or -np 4), it will run on the same node n13.
>>>>
>>>> I configured JOBNODEMATCHPOLICY EXACTNODE in maui.cfg,
>>>> but unfortunately the jobs still ran on node n13.
>>>> This is an example of the output of top
>>>>
>>>> top - 00:05:53 up 14 days, 6:47, 1 user, load average: 4.18, 4.06, 4.09
>>>> Mem: 15955108k total, 13287888k used, 2667220k free, 142168k buffers
>>>> Swap: 67111528k total, 16672k used, 67094856k free, 11360332k cached
>>>>
>>>>   PID USER     PR  NI  VIRT  RES  SHR S  %CPU %MEM     TIME+ COMMAND
>>>> 21796 patricia 25   0  463m 291m  12m R 100.5  1.9 517:29.59 vasp
>>>> 21797 patricia 25   0  448m 276m  11m R 100.2  1.8 518:51.49 vasp
>>>> 21798 patricia 25   0  458m 287m  11m R 100.2  1.8 522:01.79 vasp
>>>> 21799 patricia 25   0  448m 276m  11m R  99.9  1.8 519:04.25 vasp
>>>>     1 root     15   0 10348  672  568 S   0.0  0.0   0:00.53 init
>>>>     2 root     RT  -5     0    0    0 S   0.0  0.0   0:00.06 migration/0
>>>>     3 root     34  19     0    0    0 S   0.0  0.0   0:00.00 ksoftirqd/0
>>>>     4 root     RT  -5     0    0    0 S   0.0  0.0   0:00.00 watchdog/0
>>>>     5 root     RT  -5     0    0    0 S   0.0  0.0   0:00.04 migration/1
>>>>
>>>> The job that generated those 4 vasp processes is:
>>>>
>>>> #!/bin/bash
>>>>
>>>> cd $PBS_O_WORKDIR
>>>>
>>>> mpirun -np 4 /usr/local/vasp/vasp
>>>>
>>>> Thanks
>>>>
>>>> ----------------------------------------------------
>>>> Ing. Fernando Caba
>>>> Director General de Telecomunicaciones
>>>> Universidad Nacional del Sur
>>>> http://www.dgt.uns.edu.ar
>>>> Tel/Fax: (54)-291-4595166
>>>> Tel: (54)-291-4595101 int. 2050
>>>> Avda. Alem 1253, (B8000CPB) Bahía Blanca - Argentina
>>>> ----------------------------------------------------
>>>>
>>>>
>>>> On 27/09/2011 08:07 PM, Gus Correa wrote:
>>>>> Hi Fernando
>>>>>
>>>>> Did you try something like this in your
>>>>> ${TORQUE}/server_priv/nodes file?
>>>>>
>>>>> frontend np=12 [skip this line if the frontend is not to do job work]
>>>>> node1 np=12
>>>>> node2 np=12
>>>>> node3 np=12
>>>>> node4 np=12
>>>>>
>>>>> This is probably the first thing to do.
>>>>> It is not Maui, just plain Torque [actually pbs_server configuration].
>>>>>
>>>>> The lines above assume your nodes are called node1, ...
>>>>> and the head node is called frontend,
>>>>> in some name-resolvable manner [most likely
>>>>> in your /etc/hosts file, most likely pointing to the nodes'
>>>>> IP addresses in your cluster's private subnet, 192.168.X.X,
>>>>> 10.X.X.X or equivalent].
>>>>>
>>>>> The 'np=12' clause will allow at most 12 *processes* per node.
>>>>>
>>>>>
>>>>> [However, if VASP is *threaded*, say via OpenMP, then it won't
>>>>> prevent several threads from being launched by each process.
>>>>> To handle threaded codes you can use some tricks, such as requesting
>>>>> more cores than processes.
>>>>> Sorry, I am not familiar enough with VASP to be able to say more than this.]
>>>>>
>>>>> I would suggest that you take a look at the Torque Admin Manual
>>>>> for more details:
>>>>> http://www.adaptivecomputing.com/resources/docs/torque/
>>>>>
>>>>> There are further controls in Maui, such as
>>>>> 'JOBNODEMATCHPOLICY EXACTNODE' in maui.cfg,
>>>>> for instance, if you want full nodes allocated to each job,
>>>>> as opposed to jobs sharing cores in a single node.
>>>>> However, these choices may come later.
>>>>> [You can change maui.cfg and restart the maui scheduler to
>>>>> test various changes.]
>>>>>
>>>>> For Maui details see the Maui Admin Guide:
>>>>> http://www.adaptivecomputing.com/resources/docs/maui/index.php
>>>>>
>>>>> I hope this helps,
>>>>> Gus Correa
>>>>>
>>>>> Fernando Caba wrote:
>>>>>> Hi everybody, I am using torque 3.0.1 and maui 3.3.1 in a configuration
>>>>>> composed of a front end and 4 nodes (2 processors, 6 cores each),
>>>>>> totaling 48 cores.
>>>>>> I need to configure things so that no more than 12 processes run on
>>>>>> each node (in particular we are using vasp), so we want no more than
>>>>>> 12 vasp processes per node.
>>>>>> How can I configure this? I'm quite confused after reading a lot of
>>>>>> information about torque and maui configuration.
>>>>>>
>>>>>> Thanks in advance.
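
[Editor's sketch] One way to read the pbsnodes output posted in this thread is to count how many of each node's np slots are actually allocated. The Python below is only an illustration: the helper name `slots_in_use` is hypothetical, the `nodes` dictionary is hand-copied from the output above, and it assumes the `jobs` field is a comma-separated list of `slot/jobid` entries, which matches the values shown here.

```python
def slots_in_use(jobs_field):
    """Count allocated slots in a pbsnodes 'jobs' value.

    Assumes entries look like '0/121.fe' and are comma-separated,
    e.g. '0/121.fe, 1/121.fe' for a job holding two slots.
    """
    entries = [entry.strip() for entry in jobs_field.split(",")]
    return len([entry for entry in entries if entry])


# (np, jobs) pairs copied by hand from the pbsnodes output in the thread.
nodes = {
    "n10": (12, "0/121.fe"),
    "n11": (12, "0/143.fe"),
    "n12": (12, "0/144.fe"),
    "n13": (12, "0/181.fe"),
}

for name, (np_slots, jobs) in nodes.items():
    used = slots_in_use(jobs)
    print(f"{name}: {used}/{np_slots} slots allocated, {np_slots - used} free")
```

Each node reports a single allocated slot out of 12 even though `mpirun` started 4 or 8 processes, which is consistent with the jobs shown having been submitted with the queue default of `nodes = 1` rather than `nodes=1:ppn=12`. With so many slots unallocated it is not surprising that the nodes are reported as `free`.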

From gus at ldeo.columbia.edu Thu Sep 29 12:45:32 2011
From: gus at ldeo.columbia.edu (Gus Correa)
Date: Thu, 29 Sep 2011 14:45:32 -0400
Subject: [Mauiusers] Can´t get busy nodes
In-Reply-To: <4E846423.9050208@uns.edu.ar>
References: <4E824CF2.2080204@uns.edu.ar> <4E825736.8080103@ldeo.columbia.edu> <4E829149.7060309@uns.edu.ar> <4E833E4E.50702@ldeo.columbia.edu> <4E8369AA.3030201@uns.edu.ar> <4E837080.3050602@ldeo.columbia.edu> <4E846423.9050208@uns.edu.ar>
Message-ID: <4E84BCCC.2040005@ldeo.columbia.edu>

Hi Fernando

I can't find any smoking gun, but hopefully other eyeballs in the list
may find one.

Here are a few things I found different, which may or may not matter:

First, in my torque server I don't have auto_node_np enabled.
I don't know if it matters for your specific problem,
but I guess it has a flimsy chance to interfere with the
maui-pbs_server interaction.
'man pbs_server_attributes' is a little terse about that
[check it yourself].
In the Torque version I run the default is disabled,
and I use the default.
Since you have a static [and small, 4 nodes only]
server_priv/nodes file, I guess it won't hurt to disable
auto_node_np to see if it solves the problem:

qmgr -c 'set server auto_node_np = False'

My guess is that this item is meant specially for large clusters,
where nodes can be inserted or removed, and with this enabled the
server will dynamically adjust to the nodes available.

Second, I don't have the server attribute 'record_job_info = True',
but this may not really matter.

Third, I can see you're running Maui as 'root' [ADMIN1 root].
We have a 'maui' user here in this role instead.
My 'maui' user is also a manager and operator in the Torque server,
along with root.
Again, this may not matter, and root alone may be able to
play both roles.

Finally, you could also check your maui and pbs_server logs to see
if there is some problem reported there, some error or warning message
that would give a clue about what is going on.

I hope this helps,
Gus Correa

Fernando Caba wrote:
> Hi Gus, here are the results of all commands you mention:
>
> [root@fe ~]# qmgr -c 'p s'
> #
> # Create queues and set their attributes.
> #
> #
> # Create and define queue batch
> #
> create queue batch
> set queue batch queue_type = Execution
> set queue batch resources_default.nodes = 1
> set queue batch resources_default.walltime = 2400:00:00
> set queue batch enabled = True
> set queue batch started = True
> #
> # Set server attributes.
> # > set server scheduling = True > set server acl_hosts = fe > set server managers = root at fe > set server operators = root at fe > set server default_queue = batch > set server log_events = 511 > set server mail_from = adm > set server scheduler_iteration = 600 > set server node_check_rate = 150 > set server tcp_timeout = 6 > set server mom_job_sync = True > set server keep_completed = 300 > set server auto_node_np = True > set server next_job_number = 182 > set server record_job_info = True > [root at fe ~]# > > > ${TORQUE}/bin/pbsnodes > > [root at fe ~]# pbsnodes > n10 > state = free > np = 12 > ntype = cluster > jobs = 0/121.fe > status = > rectime=1317298640,varattr=,jobs=121.fe,state=free,netload=261129374581,gres=,loadave=4.00,ncpus=12,physmem=16360208kb,availmem=62484756kb,totmem=83471736kb,idletime=63369,nusers=2,nsessions=2,sessions=4394 > 8087,uname=Linux n10 2.6.18-194.el5 #1 SMP Fri Apr 2 14:58:14 EDT 2010 > x86_64,opsys=linux > mom_service_port = 15002 > mom_manager_port = 15003 > gpus = 0 > > n11 > state = free > np = 12 > ntype = cluster > jobs = 0/143.fe > status = > rectime=1317298637,varattr=,jobs=143.fe,state=free,netload=12864227236,gres=,loadave=8.00,ncpus=12,physmem=16360208kb,availmem=78708424kb,totmem=83469060kb,idletime=1354314,nusers=2,nsessions=2,sessions=4583 > 20253,uname=Linux n11 2.6.18-194.el5 #1 SMP Fri Apr 2 14:58:14 EDT 2010 > x86_64,opsys=linux > mom_service_port = 15002 > mom_manager_port = 15003 > gpus = 0 > > n12 > state = free > np = 12 > ntype = cluster > jobs = 0/144.fe > status = > rectime=1317298647,varattr=,jobs=144.fe,state=free,netload=953102292987,gres=,loadave=8.01,ncpus=12,physmem=16360208kb,availmem=78740696kb,totmem=83469060kb,idletime=1168354,nusers=2,nsessions=2,sessions=4635 > 20289,uname=Linux n12 2.6.18-194.el5 #1 SMP Fri Apr 2 14:58:14 EDT 2010 > x86_64,opsys=linux > mom_service_port = 15002 > mom_manager_port = 15003 > gpus = 0 > > n13 > state = free > np = 12 > ntype = cluster > jobs = 0/181.fe > 
status = > rectime=1317298672,varattr=,jobs=181.fe,state=free,netload=1010169147229,gres=,loadave=4.00,ncpus=12,physmem=15955108kb,availmem=81150100kb,totmem=83066636kb,idletime=138726,nusers=2,nsessions=2,sessions=4407 > 29186,uname=Linux n13 2.6.18-194.el5xen #1 SMP Fri Apr 2 15:34:40 EDT > 2010 x86_64,opsys=linux > mom_service_port = 15002 > mom_manager_port = 15003 > gpus = 0 > > [root at fe ~]# > > ${MAUI}/bin/showconfig > > [root at fe ~]# which showconfig > /usr/local/maui/bin/showconfig > [root at fe ~]# showconfig > # Maui version 3.3.1 (PID: 18407) > # global policies > > REJECTNEGPRIOJOBS[0] FALSE > ENABLENEGJOBPRIORITY[0] FALSE > ENABLEMULTINODEJOBS[0] TRUE > ENABLEMULTIREQJOBS[0] FALSE > BFPRIORITYPOLICY[0] [NONE] > JOBPRIOACCRUALPOLICY QUEUEPOLICY > NODELOADPOLICY ADJUSTSTATE > USEMACHINESPEEDFORFS FALSE > USEMACHINESPEED FALSE > USESYSTEMQUEUETIME TRUE > USELOCALMACHINEPRIORITY FALSE > NODEUNTRACKEDLOADFACTOR 1.2 > JOBNODEMATCHPOLICY[0] EXACTNODE > > JOBMAXSTARTTIME[0] INFINITY > > METAMAXTASKS[0] 0 > NODESETPOLICY[0] [NONE] > NODESETATTRIBUTE[0] [NONE] > NODESETLIST[0] > NODESETDELAY[0] 00:00:00 > NODESETPRIORITYTYPE[0] MINLOSS > NODESETTOLERANCE[0] 0.00 > > BACKFILLPOLICY[0] FIRSTFIT > BACKFILLDEPTH[0] 0 > BACKFILLPROCFACTOR[0] 0 > BACKFILLMAXSCHEDULES[0] 10000 > BACKFILLMETRIC[0] PROCS > > BFCHUNKDURATION[0] 00:00:00 > BFCHUNKSIZE[0] 0 > PREEMPTPOLICY[0] REQUEUE > MINADMINSTIME[0] 00:00:00 > RESOURCELIMITPOLICY[0] > NODEAVAILABILITYPOLICY[0] COMBINED:[DEFAULT] > NODEALLOCATIONPOLICY[0] MINRESOURCE > TASKDISTRIBUTIONPOLICY[0] DEFAULT > RESERVATIONPOLICY[0] CURRENTHIGHEST > RESERVATIONRETRYTIME[0] 00:00:00 > RESERVATIONTHRESHOLDTYPE[0] NONE > RESERVATIONTHRESHOLDVALUE[0] 0 > > FSPOLICY [NONE] > FSPOLICY [NONE] > FSINTERVAL 12:00:00 > FSDEPTH 8 > FSDECAY 1.00 > > > > # Priority Weights > > SERVICEWEIGHT[0] 1 > TARGETWEIGHT[0] 1 > CREDWEIGHT[0] 1 > ATTRWEIGHT[0] 1 > FSWEIGHT[0] 1 > RESWEIGHT[0] 1 > USAGEWEIGHT[0] 1 > QUEUETIMEWEIGHT[0] 1 > 
XFACTORWEIGHT[0] 0 > SPVIOLATIONWEIGHT[0] 0 > BYPASSWEIGHT[0] 0 > TARGETQUEUETIMEWEIGHT[0] 0 > TARGETXFACTORWEIGHT[0] 0 > USERWEIGHT[0] 0 > GROUPWEIGHT[0] 0 > ACCOUNTWEIGHT[0] 0 > QOSWEIGHT[0] 0 > CLASSWEIGHT[0] 0 > FSUSERWEIGHT[0] 0 > FSGROUPWEIGHT[0] 0 > FSACCOUNTWEIGHT[0] 0 > FSQOSWEIGHT[0] 0 > FSCLASSWEIGHT[0] 0 > ATTRATTRWEIGHT[0] 0 > ATTRSTATEWEIGHT[0] 0 > NODEWEIGHT[0] 0 > PROCWEIGHT[0] 0 > MEMWEIGHT[0] 0 > SWAPWEIGHT[0] 0 > DISKWEIGHT[0] 0 > PSWEIGHT[0] 0 > PEWEIGHT[0] 0 > WALLTIMEWEIGHT[0] 0 > UPROCWEIGHT[0] 0 > UJOBWEIGHT[0] 0 > CONSUMEDWEIGHT[0] 0 > USAGEEXECUTIONTIMEWEIGHT[0] 0 > REMAININGWEIGHT[0] 0 > PERCENTWEIGHT[0] 0 > XFMINWCLIMIT[0] 00:02:00 > > > # partition DEFAULT policies > > REJECTNEGPRIOJOBS[1] FALSE > ENABLENEGJOBPRIORITY[1] FALSE > ENABLEMULTINODEJOBS[1] TRUE > ENABLEMULTIREQJOBS[1] FALSE > BFPRIORITYPOLICY[1] [NONE] > JOBPRIOACCRUALPOLICY QUEUEPOLICY > NODELOADPOLICY ADJUSTSTATE > JOBNODEMATCHPOLICY[1] > > JOBMAXSTARTTIME[1] INFINITY > > METAMAXTASKS[1] 0 > NODESETPOLICY[1] [NONE] > NODESETATTRIBUTE[1] [NONE] > NODESETLIST[1] > NODESETDELAY[1] 00:00:00 > NODESETPRIORITYTYPE[1] MINLOSS > NODESETTOLERANCE[1] 0.00 > > # Priority Weights > > XFMINWCLIMIT[1] 00:00:00 > > RMAUTHTYPE[0] CHECKSUM > > CLASSCFG[[NONE]] DEFAULT.FEATURES=[NONE] > CLASSCFG[[ALL]] DEFAULT.FEATURES=[NONE] > CLASSCFG[batch] DEFAULT.FEATURES=[NONE] > QOSPRIORITY[0] 0 > QOSQTWEIGHT[0] 0 > QOSXFWEIGHT[0] 0 > QOSTARGETXF[0] 0.00 > QOSTARGETQT[0] 00:00:00 > QOSFLAGS[0] > QOSPRIORITY[1] 0 > QOSQTWEIGHT[1] 0 > QOSXFWEIGHT[1] 0 > QOSTARGETXF[1] 0.00 > QOSTARGETQT[1] 00:00:00 > QOSFLAGS[1] > # SERVER MODULES: MX > SERVERMODE NORMAL > SERVERNAME > SERVERHOST fe > SERVERPORT 42559 > LOGFILE maui.log > LOGFILEMAXSIZE 10000000 > LOGFILEROLLDEPTH 1 > LOGLEVEL 3 > LOGFACILITY fALL > SERVERHOMEDIR /usr/local/maui/ > TOOLSDIR /usr/local/maui/tools/ > LOGDIR /usr/local/maui/log/ > STATDIR /usr/local/maui/stats/ > LOCKFILE /usr/local/maui/maui.pid > SERVERCONFIGFILE 
/usr/local/maui/maui.cfg > CHECKPOINTFILE /usr/local/maui/maui.ck > CHECKPOINTINTERVAL 00:05:00 > CHECKPOINTEXPIRATIONTIME 3:11:20:00 > TRAPJOB > TRAPNODE > TRAPFUNCTION > RESDEPTH 24 > > RMPOLLINTERVAL 00:00:30 > NODEACCESSPOLICY SHARED > ALLOCLOCALITYPOLICY [NONE] > SIMTIMEPOLICY [NONE] > ADMIN1 root > ADMINHOSTS ALL > NODEPOLLFREQUENCY 0 > DISPLAYFLAGS > DEFAULTDOMAIN > DEFAULTCLASSLIST [DEFAULT:1] > FEATURENODETYPEHEADER > FEATUREPROCSPEEDHEADER > FEATUREPARTITIONHEADER > DEFERTIME 1:00:00 > DEFERCOUNT 24 > DEFERSTARTCOUNT 1 > JOBPURGETIME 0 > NODEPURGETIME 2140000000 > APIFAILURETHRESHHOLD 6 > NODESYNCTIME 600 > JOBSYNCTIME 600 > JOBMAXOVERRUN 00:10:00 > NODEMAXLOAD 0.0 > > PLOTMINTIME 120 > PLOTMAXTIME 245760 > PLOTTIMESCALE 11 > PLOTMINPROC 1 > PLOTMAXPROC 512 > PLOTPROCSCALE 9 > SCHEDCFG[] MODE=NORMAL SERVER=fe:42559 > # RM MODULES: PBS SSS WIKI NATIVE > RMCFG[FE] AUTHTYPE=CHECKSUM EPORT=15004 TIMEOUT=00:00:09 TYPE=PBS > SIMWORKLOADTRACEFILE workload > SIMRESOURCETRACEFILE resource > SIMAUTOSHUTDOWN OFF > SIMSTARTTIME 0 > SIMSCALEJOBRUNTIME FALSE > SIMFLAGS > SIMJOBSUBMISSIONPOLICY CONSTANTJOBDEPTH > SIMINITIALQUEUEDEPTH 16 > SIMWCACCURACY 0.00 > SIMWCACCURACYCHANGE 0.00 > SIMNODECOUNT 0 > SIMNODECONFIGURATION NORMAL > SIMWCSCALINGPERCENT 100 > SIMCOMRATE 0.10 > SIMCOMTYPE ROUNDROBIN > COMINTRAFRAMECOST 0.30 > COMINTERFRAMECOST 0.30 > SIMSTOPITERATION -1 > SIMEXITITERATION -1 > > > > [root at fe ~]# ps -ef |grep maui > root 18407 1 0 Sep28 ? 00:00:04 /usr/local/maui/sbin/maui > root 22527 22463 0 09:19 pts/2 00:00:00 grep maui > [root at fe ~]# service maui status > maui (pid 18407) is running... > [root at fe ~]# service pbs_server status > pbs_server (pid 4147) is running... > [root at fe ~]# > > service pbs_sched status [just in case it is also running ...] > service pbs_mom status > service pbs status > > none of those 3 services are installed > > Thank you very much > > ---------------------------------------------------- > Ing. 
> Fernando Caba
> Director General de Telecomunicaciones
> Universidad Nacional del Sur
> http://www.dgt.uns.edu.ar
> Tel/Fax: (54)-291-4595166
> Tel: (54)-291-4595101 int. 2050
> Avda. Alem 1253, (B8000CPB) Bahía Blanca - Argentina
> ----------------------------------------------------
>
>
> On 28/09/2011 04:07 PM, Gus Correa wrote:
>> Hi Fernando
>>
>> Did you restart maui after you changed maui.cfg? [service maui restart]
>>
>> Any chance that what you see is still residual from old jobs,
>> submitted before you changed the maui configuration and job scripts
>> [#PBS -l nodes=1:ppn=12]?
>>
>> For more help from everybody in the list,
>> it may be useful if you send the output of:
>>
>> qmgr -c 'p s'
>>
>> ${TORQUE}/bin/pbsnodes
>>
>> ${MAUI}/bin/showconfig
>>
>> ps -ef |grep maui
>>
>> service maui status
>> service pbs_server status
>> service pbs_sched status [just in case it is also running ...]
>> service pbs_mom status
>> service pbs status
>>
>> I hope this helps,
>> Gus Correa
>>
>>
>> Fernando Caba wrote:
>>> Hi everybody, thanks for all the answers.
>>> I tried everything you pointed out:
>>>
>>> including
>>> #PBS -l nodes=1:ppn=12
>>>
>>> adding
>>>
>>> JOBNODEMATCHPOLICY EXACTNODE
>>>
>>> to maui.cfg
>>>
>>> but none of this worked. I'm thinking that the problem is in another
>>> config parameter (maui or torque).
>>>
>>> I will read more about all of this.
>>>
>>> Thanks!!
>>>
>>> ----------------------------------------------------
>>> Ing. Fernando Caba
>>> Director General de Telecomunicaciones
>>> Universidad Nacional del Sur
>>> http://www.dgt.uns.edu.ar
>>> Tel/Fax: (54)-291-4595166
>>> Tel: (54)-291-4595101 int. 2050
>>> Avda. Alem 1253, (B8000CPB) Bahía Blanca - Argentina
>>> ----------------------------------------------------
>>>
>>>
>>> On 28/09/2011 12:33 PM, Gus Correa wrote:
>>>> Hi Fernando
>>>>
>>>> Dennis already pointed out the first/main problem.
>>>> Your Torque/PBS script is not requesting a specific number of nodes
>>>> and cores/processors.
>>>> You can ask for 12 processors, even if your MPI command doesn't
>>>> use all of them:
>>>>
>>>> #PBS -l nodes=1:ppn=12
>>>>
>>>> [You can still do mpirun -np 8 if you want.]
>>>>
>>>> This will prevent two jobs from running on the same node [which seems
>>>> to be your goal, if I understood it right].
>>>>
>>>> I also like to add the queue name [even if it is the default]
>>>> and the job name [for documentation and stdout/stderr
>>>> naming consistency]
>>>>
>>>> #PBS -q myqueue [whatever you called your queue]
>>>> #PBS -N myjob [15 characters at most, the rest gets truncated]
>>>>
>>>> The #PBS clauses must be together and right after the #! /bin/sh line.
>>>>
>>>> Ask your users to always add these lines to their jobs.
>>>> There is a feature of torque that allows you to write a wrapper
>>>> that will add whatever you want to the job script,
>>>> but if your pool of users is small
>>>> you can just ask them to cooperate.
>>>>
>>>> Of course there is much more that you can add.
>>>> 'man qsub' and 'man pbs_resources' are good sources of information,
>>>> highly recommended reading.
>>>>
>>>>
>>>> Then there is what Antonio Messina mentioned, the cpuset feature
>>>> of Torque.
>>>> I don't know if you installed Torque with this feature enabled.
>>>> However, if you did, it will allow specific cores to be
>>>> assigned to each process, which could allow node-sharing without
>>>> jobs stepping on each other's toes.
>>>> However:
>>>> A) this requires a bit more setup [not a lot, check the
>>>> list archives and the Torque Admin Guide]
>>>> B) if your users are cooperative and request 12 processors for each job,
>>>> and you're using the Maui 'JOBNODEMATCHPOLICY EXACTNODE', each job will
>>>> go to a single node anyway.
>>>>
>>>> BTW, did you restart Maui after you added 'JOBNODEMATCHPOLICY EXACTNODE'
>>>> to the maui.cfg file?
>>>>
>>>> I hope this helps,
>>>> Gus Correa
>>>>
>>>>
>>>> Fernando Caba wrote:
>>>>> Hi Gus, my node file /var/spool/torque/server_priv/nodes looks like:
>>>>>
>>>>> [root at fe server_priv]# more nodes
>>>>> n10 np=12
>>>>> n11 np=12
>>>>> n12 np=12
>>>>> n13 np=12
>>>>> [root at fe server_priv]#
>>>>>
>>>>> It is exactly as in your comment.
>>>>>
>>>>> My script:
>>>>>
>>>>> #!/bin/bash
>>>>>
>>>>> cd $PBS_O_WORKDIR
>>>>>
>>>>> mpirun -np 8 /usr/local/vasp/vasp
>>>>>
>>>>> launches 8 vasp processes on one node. If I start one more job (with -np 8),
>>>>> the job will run on the same node (n13).
>>>>> So if I start another job with -np 8
>>>>> (or -np 4), it will run on the same node n13.
>>>>>
>>>>> I configured JOBNODEMATCHPOLICY EXACTNODE in maui.cfg,
>>>>> but unfortunately the jobs still ran on node n13.
>>>>> This is an example of the output of top:
>>>>>
>>>>> top - 00:05:53 up 14 days, 6:47, 1 user, load average: 4.18, 4.06, 4.09
>>>>> Mem: 15955108k total, 13287888k used, 2667220k free, 142168k buffers
>>>>> Swap: 67111528k total, 16672k used, 67094856k free, 11360332k cached
>>>>>
>>>>>   PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM     TIME+ COMMAND
>>>>> 21796 patricia  25   0  463m 291m  12m R 100.5  1.9 517:29.59 vasp
>>>>> 21797 patricia  25   0  448m 276m  11m R 100.2  1.8 518:51.49 vasp
>>>>> 21798 patricia  25   0  458m 287m  11m R 100.2  1.8 522:01.79 vasp
>>>>> 21799 patricia  25   0  448m 276m  11m R  99.9  1.8 519:04.25 vasp
>>>>>     1 root      15   0 10348  672  568 S   0.0  0.0   0:00.53 init
>>>>>     2 root      RT  -5     0    0    0 S   0.0  0.0   0:00.06 migration/0
>>>>>     3 root      34  19     0    0    0 S   0.0  0.0   0:00.00 ksoftirqd/0
>>>>>     4 root      RT  -5     0    0    0 S   0.0  0.0   0:00.00 watchdog/0
>>>>>     5 root      RT  -5     0    0    0 S   0.0  0.0   0:00.04 migration/1
>>>>>
>>>>> The job that generated those 4 vasp processes is:
>>>>>
>>>>> #!/bin/bash
>>>>>
>>>>> cd $PBS_O_WORKDIR
>>>>>
>>>>> mpirun -np 4 /usr/local/vasp/vasp
>>>>>
>>>>> Thanks
>>>>>
>>>>> ----------------------------------------------------
>>>>> Ing.
>>>>> Fernando Caba
>>>>> Director General de Telecomunicaciones
>>>>> Universidad Nacional del Sur
>>>>> http://www.dgt.uns.edu.ar
>>>>> Tel/Fax: (54)-291-4595166
>>>>> Tel: (54)-291-4595101 int. 2050
>>>>> Avda. Alem 1253, (B8000CPB) Bahía Blanca - Argentina
>>>>> ----------------------------------------------------
>>>>>
>>>>>
>>>>> On 27/09/2011 08:07 PM, Gus Correa wrote:
>>>>>> Hi Fernando
>>>>>>
>>>>>> Did you try something like this in your
>>>>>> ${TORQUE}/server_priv/nodes file?
>>>>>>
>>>>>> frontend np=12 [skip this line if the frontend is not to do job work]
>>>>>> node1 np=12
>>>>>> node2 np=12
>>>>>> node3 np=12
>>>>>> node4 np=12
>>>>>>
>>>>>> This is probably the first thing to do.
>>>>>> It is not Maui, just plain Torque [actually pbs_server configuration].
>>>>>>
>>>>>> The lines above assume your nodes are called node1, ...
>>>>>> and the head node is called frontend,
>>>>>> in some name-resolvable manner [most likely
>>>>>> in your /etc/hosts file, most likely pointing to the nodes'
>>>>>> IP addresses in your cluster's private subnet, 192.168.X.X,
>>>>>> 10.X.X.X or equivalent].
>>>>>>
>>>>>> The 'np=12' clause will allow at most 12 *processes* per node.
>>>>>>
>>>>>>
>>>>>> [However, if VASP is *threaded*, say via OpenMP, then it won't
>>>>>> prevent several threads from being launched by each process.
>>>>>> To handle threaded programs you can use some tricks, such as requesting
>>>>>> more cores than processes.
>>>>>> Sorry, I am not familiar enough with VASP to say more than this.]
>>>>>>
>>>>>> I would suggest that you take a look at the Torque Admin Manual
>>>>>> for more details:
>>>>>> http://www.adaptivecomputing.com/resources/docs/torque/
>>>>>>
>>>>>> There are further controls in Maui, such as
>>>>>> 'JOBNODEMATCHPOLICY EXACTNODE' in maui.cfg,
>>>>>> for instance, if you want full nodes allocated to each job,
>>>>>> as opposed to jobs sharing cores in a single node.
>>>>>> However, these choices may come later.
>>>>>> [You can change maui.cfg and restart the maui scheduler to
>>>>>> test various changes.]
>>>>>>
>>>>>> For Maui details see the Maui Admin Guide:
>>>>>> http://www.adaptivecomputing.com/resources/docs/maui/index.php
>>>>>>
>>>>>> I hope this helps,
>>>>>> Gus Correa
>>>>>>
>>>>>> Fernando Caba wrote:
>>>>>>> Hi everybody, I am using torque 3.0.1 and maui 3.3.1 in a configuration
>>>>>>> composed of a front end and 4 nodes (2 processors, 6 cores each),
>>>>>>> totaling 48 cores.
>>>>>>> I need to configure it so that no more than 12 processes run on each node
>>>>>>> (in particular we are using vasp, so we want no more than 12 vasp processes
>>>>>>> per node).
>>>>>>> How can I configure this? I'm confused after reading a lot of information
>>>>>>> about torque and maui configuration.
>>>>>>>
>>>>>>> Thanks in advance.
>>>>>>>

From denismpa at gmail.com Thu Sep 29 13:01:17 2011
From: denismpa at gmail.com (Denis)
Date: Thu, 29
Sep 2011 16:01:17 -0300
Subject: [Mauiusers] Can´t get busy nodes
In-Reply-To: <4E84BCCC.2040005@ldeo.columbia.edu>
References: <4E824CF2.2080204@uns.edu.ar> <4E825736.8080103@ldeo.columbia.edu> <4E829149.7060309@uns.edu.ar> <4E833E4E.50702@ldeo.columbia.edu> <4E8369AA.3030201@uns.edu.ar> <4E837080.3050602@ldeo.columbia.edu> <4E846423.9050208@uns.edu.ar> <4E84BCCC.2040005@ldeo.columbia.edu>
Message-ID:

> Fernando Caba wrote:
> > Hi Gus, here are the results of all commands you mention:
> >
> > [root at fe ~]# qmgr -c 'p s'
> > #
> > # Create queues and set their attributes.
> > #
> > #
> > # Create and define queue batch
> > #
> > create queue batch
> > set queue batch queue_type = Execution
> > set queue batch resources_default.nodes = 1
> > set queue batch resources_default.walltime = 2400:00:00
> > set queue batch enabled = True
> > set queue batch started = True
> > #
> > # Set server attributes.
> > #
> > set server scheduling = True
> > set server acl_hosts = fe
> > set server managers = root at fe
> > set server operators = root at fe
> > set server default_queue = batch
> > set server log_events = 511
> > set server mail_from = adm
> > set server scheduler_iteration = 600
> > set server node_check_rate = 150
> > set server tcp_timeout = 6
> > set server mom_job_sync = True
> > set server keep_completed = 300
> > set server auto_node_np = True
> > set server next_job_number = 182
> > set server record_job_info = True
> > [root at fe ~]#
> >
> >
> > ${TORQUE}/bin/pbsnodes
> >
> > [root at fe ~]# pbsnodes
> > n10
> >      state = free
> >      np = 12
> >      ntype = cluster
> >      jobs = 0/121.fe
> >      status = rectime=1317298640,varattr=,jobs=121.fe,state=free,netload=261129374581,gres=,loadave=4.00,ncpus=12,physmem=16360208kb,availmem=62484756kb,totmem=83471736kb,idletime=63369,nusers=2,nsessions=2,sessions=4394 8087,uname=Linux n10 2.6.18-194.el5 #1 SMP Fri Apr 2 14:58:14 EDT 2010 x86_64,opsys=linux
> >      mom_service_port = 15002
> >
mom_manager_port = 15003 > > gpus = 0 > > > > n11 > > state = free > > np = 12 > > ntype = cluster > > jobs = 0/143.fe > > status = > > > rectime=1317298637,varattr=,jobs=143.fe,state=free,netload=12864227236,gres=,loadave=8.00,ncpus=12,physmem=16360208kb,availmem=78708424kb,totmem=83469060kb,idletime=1354314,nusers=2,nsessions=2,sessions=4583 > > 20253,uname=Linux n11 2.6.18-194.el5 #1 SMP Fri Apr 2 14:58:14 EDT 2010 > > x86_64,opsys=linux > > mom_service_port = 15002 > > mom_manager_port = 15003 > > gpus = 0 > > > > n12 > > state = free > > np = 12 > > ntype = cluster > > jobs = 0/144.fe > > status = > > > rectime=1317298647,varattr=,jobs=144.fe,state=free,netload=953102292987,gres=,loadave=8.01,ncpus=12,physmem=16360208kb,availmem=78740696kb,totmem=83469060kb,idletime=1168354,nusers=2,nsessions=2,sessions=4635 > > 20289,uname=Linux n12 2.6.18-194.el5 #1 SMP Fri Apr 2 14:58:14 EDT 2010 > > x86_64,opsys=linux > > mom_service_port = 15002 > > mom_manager_port = 15003 > > gpus = 0 > > > > n13 > > state = free > > np = 12 > > ntype = cluster > > jobs = 0/181.fe > > status = > > > rectime=1317298672,varattr=,jobs=181.fe,state=free,netload=1010169147229,gres=,loadave=4.00,ncpus=12,physmem=15955108kb,availmem=81150100kb,totmem=83066636kb,idletime=138726,nusers=2,nsessions=2,sessions=4407 > > 29186,uname=Linux n13 2.6.18-194.el5xen #1 SMP Fri Apr 2 15:34:40 EDT > > 2010 x86_64,opsys=linux > > mom_service_port = 15002 > > mom_manager_port = 15003 > > gpus = 0 > > > > [root at fe ~]# > > > > ${MAUI}/bin/showconfig > > > > [root at fe ~]# which showconfig > > /usr/local/maui/bin/showconfig > > [root at fe ~]# showconfig > > # Maui version 3.3.1 (PID: 18407) > > # global policies > > > > REJECTNEGPRIOJOBS[0] FALSE > > ENABLENEGJOBPRIORITY[0] FALSE > > ENABLEMULTINODEJOBS[0] TRUE > > ENABLEMULTIREQJOBS[0] FALSE > > BFPRIORITYPOLICY[0] [NONE] > > JOBPRIOACCRUALPOLICY QUEUEPOLICY > > NODELOADPOLICY ADJUSTSTATE > > USEMACHINESPEEDFORFS FALSE > > USEMACHINESPEED FALSE > > 
USESYSTEMQUEUETIME TRUE > > USELOCALMACHINEPRIORITY FALSE > > NODEUNTRACKEDLOADFACTOR 1.2 > > JOBNODEMATCHPOLICY[0] EXACTNODE > > > > JOBMAXSTARTTIME[0] INFINITY > > > > METAMAXTASKS[0] 0 > > NODESETPOLICY[0] [NONE] > > NODESETATTRIBUTE[0] [NONE] > > NODESETLIST[0] > > NODESETDELAY[0] 00:00:00 > > NODESETPRIORITYTYPE[0] MINLOSS > > NODESETTOLERANCE[0] 0.00 > > > > BACKFILLPOLICY[0] FIRSTFIT > > BACKFILLDEPTH[0] 0 > > BACKFILLPROCFACTOR[0] 0 > > BACKFILLMAXSCHEDULES[0] 10000 > > BACKFILLMETRIC[0] PROCS > > > > BFCHUNKDURATION[0] 00:00:00 > > BFCHUNKSIZE[0] 0 > > PREEMPTPOLICY[0] REQUEUE > > MINADMINSTIME[0] 00:00:00 > > RESOURCELIMITPOLICY[0] > > NODEAVAILABILITYPOLICY[0] COMBINED:[DEFAULT] > > NODEALLOCATIONPOLICY[0] MINRESOURCE > > TASKDISTRIBUTIONPOLICY[0] DEFAULT > > RESERVATIONPOLICY[0] CURRENTHIGHEST > > RESERVATIONRETRYTIME[0] 00:00:00 > > RESERVATIONTHRESHOLDTYPE[0] NONE > > RESERVATIONTHRESHOLDVALUE[0] 0 > > > > FSPOLICY [NONE] > > FSPOLICY [NONE] > > FSINTERVAL 12:00:00 > > FSDEPTH 8 > > FSDECAY 1.00 > > > > > > > > # Priority Weights > > > > SERVICEWEIGHT[0] 1 > > TARGETWEIGHT[0] 1 > > CREDWEIGHT[0] 1 > > ATTRWEIGHT[0] 1 > > FSWEIGHT[0] 1 > > RESWEIGHT[0] 1 > > USAGEWEIGHT[0] 1 > > QUEUETIMEWEIGHT[0] 1 > > XFACTORWEIGHT[0] 0 > > SPVIOLATIONWEIGHT[0] 0 > > BYPASSWEIGHT[0] 0 > > TARGETQUEUETIMEWEIGHT[0] 0 > > TARGETXFACTORWEIGHT[0] 0 > > USERWEIGHT[0] 0 > > GROUPWEIGHT[0] 0 > > ACCOUNTWEIGHT[0] 0 > > QOSWEIGHT[0] 0 > > CLASSWEIGHT[0] 0 > > FSUSERWEIGHT[0] 0 > > FSGROUPWEIGHT[0] 0 > > FSACCOUNTWEIGHT[0] 0 > > FSQOSWEIGHT[0] 0 > > FSCLASSWEIGHT[0] 0 > > ATTRATTRWEIGHT[0] 0 > > ATTRSTATEWEIGHT[0] 0 > > NODEWEIGHT[0] 0 > > PROCWEIGHT[0] 0 > > MEMWEIGHT[0] 0 > > SWAPWEIGHT[0] 0 > > DISKWEIGHT[0] 0 > > PSWEIGHT[0] 0 > > PEWEIGHT[0] 0 > > WALLTIMEWEIGHT[0] 0 > > UPROCWEIGHT[0] 0 > > UJOBWEIGHT[0] 0 > > CONSUMEDWEIGHT[0] 0 > > USAGEEXECUTIONTIMEWEIGHT[0] 0 > > REMAININGWEIGHT[0] 0 > > PERCENTWEIGHT[0] 0 > > XFMINWCLIMIT[0] 00:02:00 > > > > > > # partition DEFAULT 
policies > > > > REJECTNEGPRIOJOBS[1] FALSE > > ENABLENEGJOBPRIORITY[1] FALSE > > ENABLEMULTINODEJOBS[1] TRUE > > ENABLEMULTIREQJOBS[1] FALSE > > BFPRIORITYPOLICY[1] [NONE] > > JOBPRIOACCRUALPOLICY QUEUEPOLICY > > NODELOADPOLICY ADJUSTSTATE > > JOBNODEMATCHPOLICY[1] > > > > JOBMAXSTARTTIME[1] INFINITY > > > > METAMAXTASKS[1] 0 > > NODESETPOLICY[1] [NONE] > > NODESETATTRIBUTE[1] [NONE] > > NODESETLIST[1] > > NODESETDELAY[1] 00:00:00 > > NODESETPRIORITYTYPE[1] MINLOSS > > NODESETTOLERANCE[1] 0.00 > > > > # Priority Weights > > > > XFMINWCLIMIT[1] 00:00:00 > > > > RMAUTHTYPE[0] CHECKSUM > > > > CLASSCFG[[NONE]] DEFAULT.FEATURES=[NONE] > > CLASSCFG[[ALL]] DEFAULT.FEATURES=[NONE] > > CLASSCFG[batch] DEFAULT.FEATURES=[NONE] > > QOSPRIORITY[0] 0 > > QOSQTWEIGHT[0] 0 > > QOSXFWEIGHT[0] 0 > > QOSTARGETXF[0] 0.00 > > QOSTARGETQT[0] 00:00:00 > > QOSFLAGS[0] > > QOSPRIORITY[1] 0 > > QOSQTWEIGHT[1] 0 > > QOSXFWEIGHT[1] 0 > > QOSTARGETXF[1] 0.00 > > QOSTARGETQT[1] 00:00:00 > > QOSFLAGS[1] > > # SERVER MODULES: MX > > SERVERMODE NORMAL > > SERVERNAME > > SERVERHOST fe > > SERVERPORT 42559 > > LOGFILE maui.log > > LOGFILEMAXSIZE 10000000 > > LOGFILEROLLDEPTH 1 > > LOGLEVEL 3 > > LOGFACILITY fALL > > SERVERHOMEDIR /usr/local/maui/ > > TOOLSDIR /usr/local/maui/tools/ > > LOGDIR /usr/local/maui/log/ > > STATDIR /usr/local/maui/stats/ > > LOCKFILE /usr/local/maui/maui.pid > > SERVERCONFIGFILE /usr/local/maui/maui.cfg > > CHECKPOINTFILE /usr/local/maui/maui.ck > > CHECKPOINTINTERVAL 00:05:00 > > CHECKPOINTEXPIRATIONTIME 3:11:20:00 > > TRAPJOB > > TRAPNODE > > TRAPFUNCTION > > RESDEPTH 24 > > > > RMPOLLINTERVAL 00:00:30 > > NODEACCESSPOLICY SHARED > > ALLOCLOCALITYPOLICY [NONE] > > SIMTIMEPOLICY [NONE] > > ADMIN1 root > > ADMINHOSTS ALL > > NODEPOLLFREQUENCY 0 > > DISPLAYFLAGS > > DEFAULTDOMAIN > > DEFAULTCLASSLIST [DEFAULT:1] > > FEATURENODETYPEHEADER > > FEATUREPROCSPEEDHEADER > > FEATUREPARTITIONHEADER > > DEFERTIME 1:00:00 > > DEFERCOUNT 24 > > DEFERSTARTCOUNT 1 > > JOBPURGETIME 0 > 
> > NODEPURGETIME 2140000000
> > APIFAILURETHRESHHOLD 6
> > NODESYNCTIME 600
> > JOBSYNCTIME 600
> > JOBMAXOVERRUN 00:10:00
> > NODEMAXLOAD 0.0
> >
> > PLOTMINTIME 120
> > PLOTMAXTIME 245760
> > PLOTTIMESCALE 11
> > PLOTMINPROC 1
> > PLOTMAXPROC 512
> > PLOTPROCSCALE 9
> > SCHEDCFG[] MODE=NORMAL SERVER=fe:42559
> > # RM MODULES: PBS SSS WIKI NATIVE
> > RMCFG[FE] AUTHTYPE=CHECKSUM EPORT=15004 TIMEOUT=00:00:09 TYPE=PBS
> > SIMWORKLOADTRACEFILE workload
> > SIMRESOURCETRACEFILE resource
> > SIMAUTOSHUTDOWN OFF
> > SIMSTARTTIME 0
> > SIMSCALEJOBRUNTIME FALSE
> > SIMFLAGS
> > SIMJOBSUBMISSIONPOLICY CONSTANTJOBDEPTH
> > SIMINITIALQUEUEDEPTH 16
> > SIMWCACCURACY 0.00
> > SIMWCACCURACYCHANGE 0.00
> > SIMNODECOUNT 0
> > SIMNODECONFIGURATION NORMAL
> > SIMWCSCALINGPERCENT 100
> > SIMCOMRATE 0.10
> > SIMCOMTYPE ROUNDROBIN
> > COMINTRAFRAMECOST 0.30
> > COMINTERFRAMECOST 0.30
> > SIMSTOPITERATION -1
> > SIMEXITITERATION -1
> >
> >
> >
> > [root at fe ~]# ps -ef |grep maui
> > root 18407 1 0 Sep28 ? 00:00:04 /usr/local/maui/sbin/maui
> > root 22527 22463 0 09:19 pts/2 00:00:00 grep maui
> > [root at fe ~]# service maui status
> > maui (pid 18407) is running...
> > [root at fe ~]# service pbs_server status
> > pbs_server (pid 4147) is running...
> > [root at fe ~]#
> >
> > service pbs_sched status [just in case it is also running ...]
> > service pbs_mom status
> > service pbs status
> >
> > none of those 3 services are installed
> >
> > Thank you very much
> >
>
When you submit a job with -l nodes=1:ppn=12 to a node, what is the output of pbsnodes for that node? Does it show that 12 cores are in use? When you issue a qstat -f, does it show that your job is really using 12 cores?

> > ----------------------------------------------------
> > Ing. Fernando Caba
> > Director General de Telecomunicaciones
> > Universidad Nacional del Sur
> > http://www.dgt.uns.edu.ar
> > Tel/Fax: (54)-291-4595166
> > Tel: (54)-291-4595101 int. 2050
> > Avda.
> > Alem 1253, (B8000CPB) Bahía Blanca - Argentina
> > ----------------------------------------------------
> >
>
cheers,
--
Denis Anjos,
www.versatushpc.com.br

From Gareth.Williams at csiro.au Thu Sep 29 15:23:52 2011
From: Gareth.Williams at csiro.au (Gareth.Williams at csiro.au)
Date: Fri, 30 Sep 2011 07:23:52 +1000
Subject: [Mauiusers] maui limits? looking for experience
In-Reply-To: <20110929115533.666d68a5@amarrosa.pic.es>
References: <20110928164052.5a05183a@amarrosa.pic.es> <4E833A26.8070900@rqchp.qc.ca> <20110929115533.666d68a5@amarrosa.pic.es>
Message-ID: <007DECE986B47F4EABF823C1FBB19C620102B6D6AE5A@exvic-mbx04.nexus.csiro.au>

> -----Original Message-----
> From: Arnau Bria [mailto:arnaubria at pic.es]
> Sent: Thursday, 29 September 2011 7:56 PM
> To: mauiusers at supercluster.org
> Subject: Re: [Mauiusers] maui limits? looking for experience
>
> On Wed, 28 Sep 2011 11:15:50 -0400
> Michel Béland wrote:
>
> > Hi,
> Hi,
>
> > I would advise defining a limit on idle jobs per user. For example:
> >
> > USERCFG[DEFAULT] MAXIJOB=200
> >
> > or any suitable number for your site.
>
> This really improves maui behaviour. But limiting idle queue was the
> last thing I wanted to do....

Idle limits are mostly good. An idle limit caps the number of each user's jobs that maui will consider in any scheduling cycle, so it makes the scheduling cycle shorter/faster. It also limits the priority accumulated by queued jobs and alleviates 'queue stuffing'. I'd recommend idle limits given that maui does not contain a better facility to handle such issues.

> > Alternatively, Torque has a per-queue max_user_queuable setting, but
> > it counts both running and queued jobs.
> > If you use a route queue to
> > route your job to an execution queue, you can define this for the
> > execution queue and jobs will be moved to the execution queue only
> > when the limit is respected.
>
> If I understand routing queues properly, they send jobs based on job
> required resources. Our jobs do not require any special resource; our
> users send jobs based on queue names that show time limits. So, I think
> that routing queues can't help here.

What is being proposed is that you have a routing queue setup with no special resources, just one routing queue per execution queue (but make it as fancy as you like - though simple is good). Put a limit on the number of (per-user) jobs in the execution queue(s) (enough to fill the cluster) but allow many jobs in the routing queue(s). Maui need only consider the execution queue, so its job becomes simpler and it can be faster.

Gareth (who used maui for some time but doesn't now)

> > Both solutions should decrease the load on Maui as it does not need
> > to schedule as many jobs at a time.
>
>
> Many thanks for your reply,
> Cheers,
> Arnau

From Gareth.Williams at csiro.au Thu Sep 29 15:29:30 2011
From: Gareth.Williams at csiro.au (Gareth.Williams at csiro.au)
Date: Fri, 30 Sep 2011 07:29:30 +1000
Subject: [Mauiusers] maui limits? looking for experience
In-Reply-To: <20110929113026.776724c8@amarrosa.pic.es>
References: <20110928164052.5a05183a@amarrosa.pic.es> <4E83B103.6070007@Jhu.edu> <4E83B448.6080200@ldeo.columbia.edu> <20110929113026.776724c8@amarrosa.pic.es>
Message-ID: <007DECE986B47F4EABF823C1FBB19C620102B6D6AE5B@exvic-mbx04.nexus.csiro.au>

I recall that JOBAGGREGATIONTIME is good. Normally torque tells maui to run a scheduling cycle each time a new job arrives. This parameter tells maui to hold off cycling too frequently, which is something you want as long as the cycles are not so infrequent that the cluster starts to go idle. Lots of very short jobs are hard work to schedule!
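
For reference, the two Maui settings discussed in this thread would sit in maui.cfg roughly as follows. This is only a sketch: the MAXIJOB value is the one Michel suggested, while the JOBAGGREGATIONTIME value is an assumed placeholder that does not come from the thread and would need tuning per site.

```
# maui.cfg fragment (sketch; values are illustrative, not recommendations)

# Cap on idle jobs per user that Maui considers in a scheduling cycle.
# 200 is the figure suggested earlier in the thread.
USERCFG[DEFAULT]    MAXIJOB=200

# Minimum time to wait after a job event before processing it, so that
# bursts of submissions are handled together in one cycle.
# The 10-second value is an assumption.
JOBAGGREGATIONTIME  00:00:10
```

Maui reads maui.cfg at startup, so a restart (service maui restart) would be needed after editing the file.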
Gareth

> -----Original Message-----
> From: Arnau Bria [mailto:arnaubria at pic.es]
> Sent: Thursday, 29 September 2011 7:30 PM
> To: mauiusers at supercluster.org
> Subject: Re: [Mauiusers] maui limits? looking for experience
>
> On Wed, 28 Sep 2011 19:56:56 -0400
> Gus Correa wrote:
>
> > Hi Arnau, Jason
> Hi Gus,
>
> > Well, I guess I should consider myself happy
> > to administer only small clusters. :)
> >
> > Now, how about the [terse] guidance in the Maui Admin Guide for large
> > clusters?
> >
> > http://www.adaptivecomputing.com/resources/docs/maui/a.ilargeclusters.php
>
> I have many doubts about those params, maybe it's time to ask about
> them :-)
>
> NODEPOLLFREQUENCY: with an RMPOLLINTERVAL of 1 minute and NODEPOLLFREQUENCY
> set to 3, maui is not going to ask about node status for 3 minutes. If a node
> goes from busy to free at minute 1, maui is not going to
> schedule jobs there until the third scheduling cycle starts... is that
> correct?
>
>
> JOBAGGREGATIONTIME: I don't really understand what this parameter does,
> but it talks about bursty submission, not about long queues.
>
>
> > And the [slightly more verbose] one for Torque:
> >
> > http://www.adaptivecomputing.com/resources/docs/torque/a.flargeclusters.php
>
> Some time ago we did configure all those params (ping/check rate and
> tcp_timeout) and torque works fine. But, from torque's point of view,
> 350 nodes is not a "big cluster", so it scales fine.
>
>
> > Would they help with scalability?
> Till now, limiting the idle queue improved maui behaviour....
>
> > Cheers,
> > Gus Correa
> Cheers,
> Arnau

From arnaubria at pic.es Thu Sep 29 16:22:26 2011
From: arnaubria at pic.es (Arnau Bria)
Date: Fri, 30 Sep 2011 00:22:26 +0200
Subject: [Mauiusers] maui limits?
looking for experience
In-Reply-To: <007DECE986B47F4EABF823C1FBB19C620102B6D6AE5B@exvic-mbx04.nexus.csiro.au>
References: <20110928164052.5a05183a@amarrosa.pic.es> <4E83B103.6070007@Jhu.edu> <4E83B448.6080200@ldeo.columbia.edu> <20110929113026.776724c8@amarrosa.pic.es> <007DECE986B47F4EABF823C1FBB19C620102B6D6AE5B@exvic-mbx04.nexus.csiro.au>
Message-ID: <20110930002226.45a76d64@amparo.bogus.net>

On Fri, 30 Sep 2011 07:29:30 +1000
Gareth.Williams at csiro.au wrote:

> I recall that JOBAGGREGATIONTIME is good. Normally torque tells maui
> to run a scheduling cycle each time a new job arrives. This
> parameter tells maui to hold off cycling too frequently - which is
> something you want as long as it's not so infrequently that the
> cluster starts to go idle. Lots of very short jobs are hard work to
> schedule!

Thanks for the explanation. As I said, I did not understand this parameter.

> Gareth

Many thanks for your reply,
Cheers,
Arnau

From arnaubria at pic.es Thu Sep 29 16:22:31 2011
From: arnaubria at pic.es (Arnau Bria)
Date: Fri, 30 Sep 2011 00:22:31 +0200
Subject: [Mauiusers] maui limits? looking for experience
In-Reply-To: <007DECE986B47F4EABF823C1FBB19C620102B6D6AE5A@exvic-mbx04.nexus.csiro.au>
References: <20110928164052.5a05183a@amarrosa.pic.es> <4E833A26.8070900@rqchp.qc.ca> <20110929115533.666d68a5@amarrosa.pic.es> <007DECE986B47F4EABF823C1FBB19C620102B6D6AE5A@exvic-mbx04.nexus.csiro.au>
Message-ID: <20110930002231.2bb68438@amparo.bogus.net>

On Fri, 30 Sep 2011 07:23:52 +1000
Gareth.Williams at csiro.au wrote:

Hi Gareth,

> > This really improves maui behaviour. But limiting idle queue was the
> > last thing I wanted to do....
>
> Idle limits are mostly good. This mostly limits the number of each
> user's jobs that maui will consider in any scheduling cycle so it makes
> the scheduling cycle shorter/faster. It also limits the priority
> accumulated by queued jobs and alleviates 'queue stuffing'.
> I'd
> recommend idle limits given that maui does not contain a better
> facility to handle such issues.

Yep, a short queue reduces maui stress. I completely agree with that. Setting a limit of 100 jobs per user leaves a 1k idle queue in normal behaviour, when many users are running jobs. That's the limit I'd use.

But, as I've never tried this before, let me ask how maui will behave in this situation:

Suppose the farm is 70% full and only two users (A and B) have submitted jobs. User A has much higher priority than user B, so let's say the remaining 30% must be filled with 25% from user A and 5% from user B. If I have 1000 jobs in the queue (500 from A and 500 from B), the idle queue will contain 100 jobs from each user, so each scheduling cycle is going to consider 200 jobs. Is maui going to fill up the farm respecting our policies (25/5)? Or is it going to start 100 jobs from each user in each scheduling cycle, filling up the farm 15% and 15%?

> > If I understand routing queues properly, they send jobs based on job
> > required resources. our jobs do not require any special resource,
> > our users send jobs based on queue name that show time limits. So,
> > I think that routing queues can't help here.
>
> What is being proposed is that you have a routing queue setup with no
> special resources, just one routing queue per execution queue (but
> make it as fancy as you like - though simple is good). Put a limit
> on the number of (users) jobs in the execution queue(s) (enough to
> fill the cluster) but allow many jobs in the routing queue(s). Maui
> only need consider the execution queue so it's job becomes simpler
> and it can be faster.

OK, now I understand. So, "hide" jobs from maui using routing queues.

> Gareth (who used maui for some time but doesn't now)

I've not said that. I'm just asking about the experience of other admins (with much experience).
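
The routing-queue arrangement Gareth describes could be expressed in Torque roughly as below, in the same directive form that `qmgr -c 'p s'` prints. This is a sketch under assumptions: the queue names `feeder` and `short` and the limit of 200 are hypothetical, and only `max_user_queuable` is named in the thread itself.

```
# qmgr input (sketch; queue names and the limit are hypothetical)

# Execution queue: the only queue whose jobs the scheduler needs to consider.
create queue short
set queue short queue_type = Execution
set queue short max_user_queuable = 200
set queue short enabled = True
set queue short started = True

# Routing queue: users submit here; jobs move on to "short" only while
# the per-user limit on that queue is respected.
create queue feeder
set queue feeder queue_type = Route
set queue feeder route_destinations = short
set queue feeder enabled = True
set queue feeder started = True
```

With a setup like this users would submit with `qsub -q feeder`. As Michel noted, `max_user_queuable` counts both running and queued jobs, so the limit should be large enough for one user to fill the cluster.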
Many thanks for your reply,
Cheers,
Arnau

From arnaubria at pic.es Fri Sep 30 03:31:02 2011
From: arnaubria at pic.es (Arnau Bria)
Date: Fri, 30 Sep 2011 11:31:02 +0200
Subject: [Mauiusers] maui limits? looking for experience
In-Reply-To: <20110930002231.2bb68438@amparo.bogus.net>
References: <20110928164052.5a05183a@amarrosa.pic.es> <4E833A26.8070900@rqchp.qc.ca> <20110929115533.666d68a5@amarrosa.pic.es> <007DECE986B47F4EABF823C1FBB19C620102B6D6AE5A@exvic-mbx04.nexus.csiro.au> <20110930002231.2bb68438@amparo.bogus.net>
Message-ID:

> > Idle limits are mostly good. This mostly limits the number of each
> > user's jobs that maui will consider in any scheduling cycle so it makes
> > the scheduling cycle shorter/faster. It also limits the priority
> > accumulated by queued jobs and alleviates 'queue stuffing'. I'd
> > recommend idle limits given that maui does not contain a better
> > facility to handle such issues.
>
> Yep, a short queue reduces maui stress. I completely agree with that.
> Setting a limit of 100 jobs per user leaves a 1k idle queue in normal
> behaviour, when many users are running jobs. That's the limit I'd use.
>
>
> But, as I've never tried this before, let me ask how maui will behave
> in this situation:
>
>
> if the farm is 70%, and I have only two users who have submitted jobs
> (user A and B). User A has much more priority than user B, so let's say
> that the 30% must be filled with 25% of jobs from user A and 5% jobs
> from user B, if I have 1000 jobs in queue (500 from A and 500 from B)
> IDLE queue will contain 100 jobs of each user, so each scheduling
> cycle is going to schedule 200 jobs, is maui going to fill up the farm
> respecting our policies (25/5)? or is it going to start 100 jobs from
> each user on each scheduling cycle filling up the farm 15% and 15%?
>

I've worked through an example:

QoS Target:

QOS
-------------
lhcatlas*  --  47.42  (user atprd)
magic*     --   3.80  (user maprd)

I set a limit of 5 jobs per user.
I use only one node with 10/20/40 "cpus". 80 jobs for each user.

np=10)
# qstat |grep R|grep -c map
5
# qstat |grep R|grep -c at
5

QOS
-------------
lhcatlas  50.00  47.42  50.00
magic*    50.00   3.80  50.00

np=20)
# qmgr -c "s n pbs01-test.pic.es np=20"
# qstat |grep R|grep -c map
10
# qstat |grep R|grep -c at
10

QOS
-------------
lhcatlas  50.00  47.42  50.00
magic*    50.00   3.80  50.00

np=40)
# qmgr -c "s n pbs01-test.pic.es np=40"
# qstat |grep R|grep -c ma
20
# qstat |grep R|grep -c at
20

QOS
-------------
lhcatlas  50.00  47.42  50.00  50.00  50.00
magic*    50.00   3.80  50.00  50.00  50.00

Without the limit, FS is respected. Maybe my conf is wrong, but this behaviour is why I'm afraid of limiting the idle queue.

Cheers,
Arnau

From jasonw at Jhu.edu Fri Sep 30 11:39:28 2011
From: jasonw at Jhu.edu (Jason Williams)
Date: Fri, 30 Sep 2011 13:39:28 -0400
Subject: [Mauiusers] Maui, FairShare, and scheduling GPUs
In-Reply-To:
References: <4E81F238.8080403@Jhu.edu>
Message-ID: <4E85FED0.7070406@Jhu.edu>

I didn't get anyone else replying to this, so I'm curious if anyone else figured out a different way to make this work. Either way, I've been looking around the code today and playing with it, and I think I have the easy part done. I have maui seeing the "gpus=" attribute of the nodes and tracking it within the configured resources and available resources (I think) within the MPBSNodeLoad() and MPBSNodeUpdate() functions. I can create a new branch in the svn called "3.3.2_gpu" if anyone is interested in taking a look and/or helping to implement the tracking code to update the available resources when someone requests gpus from torque.

It seems to me the trickier part will be how we want to track that usage within FairShare.... I haven't started to think about that yet, but suggestions are welcome.

--
Jason Williams
Sr.
Systems Administrator
Homewood HPC Cluster
Johns Hopkins University

On 9/27/2011 8:08 PM, suraj prabhakaran wrote:
> Hello Jason and Denis,
>
> I too am starting to look into this and am at the beginner's stage trying to understand how things work. I would be interested in working together on this. If no one else replies to the main thread in 1-2 days, we can discuss things and share knowledge. Please let me know if you are interested.
>
> Best regards,
> Suraj Prabhakaran
>
>
> On 09/27/11, Jason Williams wrote:
>
>> I'm curious if anyone has taken a look at getting Torque 2.5.x and Maui
>> working together to schedule GPUS and track the usage via FairShare. I
>> am pondering what would be needed to actually make that happen within
>> the Maui source, but if someone else has already started working on
>> this, it would be interesting to get their take on the situation. I've
>> noticed, via some googling and reading on the list here, that it seems
>> difficult to do without some mods to the source. If you've thought
>> about it or have started on it, please email me back.
>>
>> --
>> Jason Williams
>> Sr. Systems Administrator
>> Homewood HPC Cluster
>> Johns Hopkins University
>>
>>
>>
>> --------------------------
>> Suraj Prabhakaran
>>
>> German Research School for
>> Simulation Sciences GmbH
>> Laboratory for Parallel Programming
>> 52062 Aachen | Germany
>>