[Mauiusers] Resources problem : cannot select job 62 for partition DEFAULT (job hold active)

rishi pathak mailmaverick666 at gmail.com
Tue May 15 11:07:56 MDT 2007


Hi Daniel
I am in the process of identifying the problem; it is a peculiar one.
One piece of advice I can give you is to remove the mem parameter from the
submit script. While scheduling a job, maui only checks the resource
requirements at the initial stage. Once the job is scheduled it can use
whatever memory and disk space it finds, so your job will surely get
executed, although other scheduled jobs will then have fewer resources
available to them.
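For illustration, a sketch of what the top of your submit script might look
like with the memory requests dropped (everything else unchanged; the
remaining values are simply the ones from your current script):
----------------
#!/bin/bash
#PBS -l nodes=2:ppn=2
#PBS -l walltime=05:00:00
# mem/vmem requests removed so maui does not have to reserve
# more memory per task than the nodes can offer
#PBS -j oe
----------------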
If I get time I will try it on my cluster and convey the result to you as
soon as possible.
Bye
One more thing: while replying, please keep the maui users' list in copy so
that others can also ponder over the problem. That is what the users' list
is for :)
Bye again. Will get in touch with you very soon with a positive reply.



On 5/15/07, Daniel Boone <daniel.boone at kahosl.be> wrote:
> Hi
>
> I tried some new parameters.
>
> qmgr 'print server' output:
> ----------------
> create queue batch
> set queue batch queue_type = Execution
> set queue batch resources_default.mem = 2000mb
> set queue batch resources_default.nodes = 1
> set queue batch resources_default.pvmem = 16000mb
> set queue batch resources_default.walltime = 06:00:00
> set queue batch enabled = True
> set queue batch started = True
> #
> # Set server attributes.
> #
> set server scheduling = True
> set server managers = abaqus at em-research00
> set server operators = abaqus at em-research00
> set server default_queue = batch
> set server log_events = 511
> set server mail_from = adm
> set server scheduler_iteration = 600
> set server node_check_rate = 150
> set server tcp_timeout = 6
> set server pbs_version = 2.1.8
> ----------------------
> checkjob output:
> ----------------------
> checking job 90 (RM job '90.em-research00')
>
> State: Idle  EState: Deferred
> Creds:  user:abaqus  group:users  class:batch  qos:DEFAULT
> WallTime: 00:00:00 of 5:00:00
> SubmitTime: Tue May 15 11:59:03
> (Time Queued  Total: 1:58:17  Eligible: 00:00:00)
>
> Total Tasks: 4
>
> Req[0]  TaskCount: 4  Partition: ALL
> Network: [NONE]  Memory >= 0  Disk >= 0  Swap >= 15G
> Opsys: [NONE]  Arch: [NONE]  Features: [NONE]
> Exec:  ''  ExecSize: 0  ImageSize: 0
> Dedicated Resources Per Task: PROCS: 1  MEM: 250M  SWAP: 15G
> NodeAccess: SHARED
> TasksPerNode: 2  NodeCount: 2
>
>
> IWD: [NONE]  Executable:  [NONE]
> Bypass: 0  StartCount: 0
> PartitionMask: [ALL]
> SystemQueueTime: Tue May 15 13:00:06
>
> Flags:       RESTARTABLE
>
> job is deferred.  Reason:  NoResources  (cannot create reservation for
> job '90' (intital reservation attempt)
> )
> Holds:    Defer  (hold reason:  NoResources)
> PE:  6.07  StartPriority:  57
> cannot select job 90 for partition DEFAULT (job hold active)
> -------------------
> pbs-script:
> -------------------
>
> #!/bin/bash
> #PBS -l nodes=2:ppn=2
> #PBS -l walltime=05:00:00
> #PBS -l mem=1000mb
> #PBS -l vmem=7000mb
> #PBS -j oe
> #PBS -M daniel.boone at kahosl.be
> #PBS -m bae
> # Go to the directory from which you submitted the job
> mkdir $PBS_O_WORKDIR
> string="$PBS_O_WORKDIR/plus2gb.inp"
>
> scp 10.1.0.52:$string $PBS_O_WORKDIR
>
> cd $PBS_O_WORKDIR
> #module load abaqus
> #
> /Apps/abaqus/Commands/abaqus job=plus2gb queue=abaqus4cpu
> input=Standard_plus2gbyte.inp cpus=4
> ---------------------------
> abaqus environment file.
> --------------------------
> import os
> os.environ['LAMRSH'] = 'ssh'
>
> max_cpus=6
>
> mp_host_list=[['em-research00',3],['10.1.0.97',2]]
>
>
> run_mode = BATCH
> scratch  = "/home/abaqus"
>
> queue_name=["cpu","abaqus4cpu"]
> queue_cmd="qsub -r n -q batch -S /bin/bash -V -l nodes=1:ppn=1 %S"
> cpu="qsub -r n -q batch -S /bin/bash -V -l nodes=1:ppn=2 %S"
> abaqus4cpu="qsub -r n -q batch -S /bin/bash -V -l nodes=2:ppn=2 %S"
>
> pre_memory = "3000 mb"
> standard_memory = "7000 mb"
>
> ---------------------------
> but still no changes
>
> Thanks for all the help so far.
> rishi pathak wrote:
> > Also try adding the following to your job script file:
> > #PBS -l pvmem=<amount of virtual memory>
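> > (For example, #PBS -l pvmem=16000mb would match the resources_default.pvmem
> > value shown in the qmgr output above; pick whatever limit your job really
> > needs.)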
> >
> > On 5/15/07, *rishi pathak* <mailmaverick666 at gmail.com> wrote:
> >
> >     I did not see any specific queue in the submit script.
> >     Have you specified the following for the queue you are using?
> >
> >     resources_default.mem #available ram
> >     resources_default.pvmem #virtual memory
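> >     For example, these can be set with qmgr; the values below are only
> >     placeholders, so adjust the limits to your own needs:
> >
> >     qmgr -c "set queue batch resources_default.mem = 2000mb"
> >     qmgr -c "set queue batch resources_default.pvmem = 16000mb"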
> >
> >
> >
> >
> >
> >     On 5/15/07, *Daniel Boone* <daniel.boone at kahosl.be> wrote:
> >
> >         Hi
> >
> >         I need to use the swap. I know I don't have enough RAM, but
> >         the job must
> >         be able to run. Even if it swaps a lot.
> >         Time is not an issue here.
> >         On 1 machine the job uses about 7.4GB swap. We don't have any
> >         other
> >         machines with more RAM to run it on.
> >         The other option would be to run the job outside
> >         torque/maui, but I'd rather not do that.
> >
> >         Can someone tell me how to read the checkjob -v output? I don't
> >         understand how to find the errors in it.
> >
> >         rishi pathak wrote:
> >         > Hi
> >         > The system memory (RAM) available per process is less than the
> >         > requested amount. Maui is not considering swap as an extension
> >         > of RAM. Try again with a reduced system memory request.
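> >         > (Worked through with the numbers from this job: mem=15500mb
> >         > spread over nodes=2:ppn=2, i.e. 4 tasks, gives roughly 3875M
> >         > per task, which is what checkjob reports as dedicated memory,
> >         > while checknode shows only about 2010M of configured RAM per
> >         > node, so no node can satisfy even a single task's request.)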
> >         >
> >         >
> >         >
> >         > On 5/14/07, *Daniel Boone* <daniel.boone at kahosl.be> wrote:
> >         >
> >         >     Hi
> >         >
> >         >     I'm having the following problem. When I submit a very
> >         >     memory-intensive (mostly swap) job, the job doesn't want to
> >         >     start.
> >         >     It gives the error: cannot select job 62 for partition
> >         >     DEFAULT (job hold active).
> >         >     But I don't understand what the error means.
> >         >
> >         >     I run torque 2.1.8 with maui maui-3.2.6p19
> >         >
> >         >     checkjob -v returns the following:
> >         >     -------------------
> >         >     checking job 62 (RM job '62.em-research00')
> >         >
> >         >     State: Idle  EState: Deferred
> >         >     Creds:  user:abaqus  group:users  class:batch  qos:DEFAULT
> >         >     WallTime: 00:00:00 of 6:00:00
> >         >     SubmitTime: Mon May 14 14:13:41
> >         >     (Time Queued  Total: 1:53:39  Eligible: 00:00:00)
> >         >
> >         >     Total Tasks: 4
> >         >
> >         >     Req[0]  TaskCount: 4  Partition: ALL
> >         >     Network: [NONE]  Memory >= 0  Disk >= 0  Swap >= 0
> >         >     Opsys: [NONE]  Arch: [NONE]  Features: [NONE]
> >         >     Exec:  ''  ExecSize: 0  ImageSize: 0
> >         >     Dedicated Resources Per Task: PROCS: 1  MEM: 3875M
> >         >     NodeAccess: SHARED
> >         >     TasksPerNode: 2  NodeCount: 2
> >         >
> >         >
> >         >     IWD: [NONE]  Executable:  [NONE]
> >         >     Bypass: 0  StartCount: 0
> >         >     PartitionMask: [ALL]
> >         >     SystemQueueTime: Mon May 14 15:14:13
> >         >
> >         >     Flags:       RESTARTABLE
> >         >
> >         >     job is deferred.  Reason:  NoResources  (cannot create
> >         >     reservation for job '62' (intital reservation attempt))
> >         >     Holds:    Defer  (hold reason:  NoResources)
> >         >     PE:  19.27  StartPriority:  53
> >         >     cannot select job 62 for partition DEFAULT (job hold active)
> >         >     ------------------------
> >         >     checknode output for the two nodes:
> >         >     checking node em-research00
> >         >     ------------
> >         >     State:      Idle  (in current state for 2:31:21)
> >         >     Configured Resources: PROCS: 3  MEM: 2010M  SWAP: 33G  DISK: 72G
> >         >
> >         >
> >         >     Utilized   Resources: DISK: 9907M
> >         >     Dedicated  Resources: [NONE]
> >         >     Opsys:         linux  Arch:      [NONE]
> >         >     Speed:      1.00  Load:       0.000
> >         >     Network:    [DEFAULT]
> >         >     Features:   [F]
> >         >     Attributes: [Batch]
> >         >     Classes:    [batch 3:3]
> >         >
> >         >     Total Time: 2:29:18  Up: 2:29:18 (100.00%)  Active: 00:00:00 (0.00%)
> >         >
> >         >     Reservations:
> >         >     NOTE:  no reservations on node
> >         >
> >         >     --------------------
> >         >     State:      Idle  (in current state for 2:31:52)
> >         >     Configured Resources: PROCS: 2  MEM: 2012M  SWAP: 17G  DISK: 35G
> >         >     Utilized   Resources: DISK: 24G
> >         >     Dedicated  Resources: [NONE]
> >         >     Opsys:         linux  Arch:      [NONE]
> >         >     Speed:      1.00  Load:       0.590
> >         >     Network:    [DEFAULT]
> >         >     Features:   [NONE]
> >         >     Attributes: [Batch]
> >         >     Classes:    [batch 2:2]
> >         >
> >         >     Total Time: 2:29:49  Up: 2:29:49 (100.00%)  Active: 00:00:00 (0.00%)
> >         >
> >         >     Reservations:
> >         >     NOTE:  no reservations on node
> >         >     -----------------
> >         >     The PBS script I'm using:
> >         >     #!/bin/bash
> >         >     #PBS -l nodes=2:ppn=2
> >         >     #PBS -l walltime=06:00:00
> >         >     #PBS -l mem=15500mb
> >         >     #PBS -j oe
> >         >     # Go to the directory from which you submitted the job
> >         >     mkdir $PBS_O_WORKDIR
> >         >     string="$PBS_O_WORKDIR/plus2gb.inp"
> >         >     scp 10.1.0.52:$string $PBS_O_WORKDIR
> >         >     #scp 10.1.0.52:$PBS_O_WORKDIR'/'$PBS_JOBNAME ./
> >         >     cd $PBS_O_WORKDIR
> >         >     #module load abaqus
> >         >     #
> >         >     /Apps/abaqus/Commands/abaqus job=plus2gb queue=cpu2
> >         >     input=Standard_plus2gbyte.inp cpus=4 mem=15000mb
> >         >     ---------------------------
> >         >     If you need some extra info please let me know.
> >         >
> >         >     Thank you
> >         >
> >         >     _______________________________________________
> >         >     mauiusers mailing list
> >         >     mauiusers at supercluster.org
> >         >     http://www.supercluster.org/mailman/listinfo/mauiusers
> >         >
> >         >
> >         >
> >         >
> >         > --
> >         > Regards--
> >         > Rishi Pathak
> >         > National PARAM Supercomputing Facility
> >         > Center for Development of Advanced Computing(C-DAC)
> >         > Pune University Campus,Ganesh Khind Road
> >         > Pune-Maharastra
> >
> >
> >
> >
> >     --
> >     Regards--
> >     Rishi Pathak
> >     National PARAM Supercomputing Facility
> >     Center for Development of Advanced Computing(C-DAC)
> >     Pune University Campus,Ganesh Khind Road
> >     Pune-Maharastra
> >
> >
> >
> >
> > --
> > Regards--
> > Rishi Pathak
> > National PARAM Supercomputing Facility
> > Center for Development of Advanced Computing(C-DAC)
> > Pune University Campus,Ganesh Khind Road
> > Pune-Maharastra
>



-- 
Regards--
Rishi Pathak
National PARAM Supercomputing Facility
Center for Development of Advanced Computing(C-DAC)
Pune University Campus,Ganesh Khind Road
Pune-Maharastra

