must remove nodes=1 - WAS: [Mauiusers] Node idle but load is HIGH

Toni L. Harbaugh-Blackford [Contr] harbaugh at ncifcrf.gov
Fri Sep 28 08:28:16 MDT 2007


The problem is "nodes=1". With "nodes=1", all CPUs requested by the "ncpus=100"
setting MUST be on the same node. Do you have nodes=1 set in your qmgr configuration?
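The constraint can be illustrated with a toy model. This is a simplified sketch under stated assumptions, not how Maui or Torque actually place jobs: it only contrasts a request pinned to a single node with one that may draw CPUs from any node. The function name `fits` and the 50-node, 8-CPU cluster are hypothetical.

```python
def fits(ncpus: int, free_cpus: list[int], single_node: bool) -> bool:
    """Toy placement check (an assumption, not Maui's or Torque's real logic).

    ncpus       -- total CPUs the job requests
    free_cpus   -- free CPUs on each node (hypothetical values)
    single_node -- True when the request is pinned to nodes=1
    """
    if single_node:
        # nodes=1: every requested CPU must come from one and the same node
        return any(free >= ncpus for free in free_cpus)
    # No node-count limit: CPUs may be gathered from any combination of nodes
    return sum(free_cpus) >= ncpus


# Hypothetical cluster: 50 nodes with 8 free CPUs each (400 CPUs in total)
cluster = [8] * 50
print(fits(100, cluster, single_node=True))   # False: no single node has 100 CPUs
print(fits(100, cluster, single_node=False))  # True: 400 free CPUs overall
print(fits(1, cluster, single_node=True))     # True: a 1-CPU job fits on any node
```

Even with 400 free CPUs in this made-up cluster, a 100-CPU request can only be satisfied once the single-node constraint is dropped.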

On Fri, 28 Sep 2007, Jan Ploski wrote:

  > mauiusers-bounces at supercluster.org wrote on 09/28/2007 03:09:22 PM:
  > 
  > > On Fri, 28 Sep 2007, Jan Ploski wrote:
  > > 
  > > > ...and according to pstree these jobs are child processes of
  > > > pbs_mom, so definitely not "runaway".
  > > 
  > > What does qstat -f say about those jobs ?
  > 
  > Here is an example. I see nothing strange in it:
  > 
  > Job Id: 346597.srvgrid01.offis.uni-oldenburg.de
  >     Job_Name = STDIN
  >     Job_Owner = dgad0006 at srvgrid01.offis.uni-oldenburg.de
  >     resources_used.cput = 04:25:03
  >     resources_used.mem = 82576kb
  >     resources_used.vmem = 164320kb
  >     resources_used.walltime = 04:25:47
  >     job_state = R
  >     queue = dgiseq
  >     server = srvgrid01.offis.uni-oldenburg.de
  >     Checkpoint = u
  >     ctime = Fri Sep 28 11:00:53 2007
  >     Error_Path = srvgrid01:/home/d-grid-users/dgad0006/1705.err
  >     exec_host = node43/0
  >     Hold_Types = n
  >     Join_Path = n
  >     Keep_Files = n
  >     Mail_Points = n
  >     mtime = Fri Sep 28 11:00:54 2007
  >     Output_Path = srvgrid01:/home/d-grid-users/dgad0006/1705.out
  >     Priority = 0
  >     qtime = Fri Sep 28 11:00:53 2007
  >     Rerunable = True
  >     Resource_List.ncpus = 0
  >     Resource_List.neednodes = 1
  >     Resource_List.nodect = 1
  >     Resource_List.nodes = 1
  >     Resource_List.walltime = 12:00:00
  >     session_id = 8431
  >     Shell_Path_List = /bin/sh
  >     substate = 42
  >     Variable_List = PBS_O_HOME=/home/d-grid-users/dgad0006,
  >         PBS_O_LOGNAME=dgad0006,
  >         PBS_O_PATH=/usr/sbin:/bin:/usr/bin:/sbin:/usr/X11R6/bin,
  >         PBS_O_SHELL=/bin/bash,PBS_O_HOST=srvgrid01.offis.uni-oldenburg.de,
  >         PBS_O_WORKDIR=/home/d-grid-users/dgad0006,PBS_O_QUEUE=dgiseq
  >     euser = dgad0006
  >     egroup = ad
  >     hashname = 346597.srvg
  >     queue_rank = 103775
  >     queue_type = E
  >     etime = Fri Sep 28 11:00:53 2007
  > 
  > 
  > Best regards,
  > Jan Ploski
  > _______________________________________________
  > mauiusers mailing list
  > mauiusers at supercluster.org
  > http://www.supercluster.org/mailman/listinfo/mauiusers
  > 
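To find which running jobs carry nodes=1, the Resource_List attributes in raw `qstat -f` output (like the example quoted above, without the "> " prefixes) can be scanned. This is a minimal sketch, assuming the layout shown in the thread: a "Job Id:" line, attribute lines indented four spaces as "name = value", and long values such as Variable_List continued on further-indented lines. The helper names `parse_qstat_f` and `single_node_jobs` are hypothetical, and only an exact value of "1" is matched, so a request such as "1:ppn=2" is not flagged.

```python
import re

# Assumed layout: attribute lines are indented exactly four spaces, "name = value"
ATTR = re.compile(r"^ {4}([\w.]+) = (.*)$")


def parse_qstat_f(text: str) -> dict[str, dict[str, str]]:
    """Parse `qstat -f` text into {job_id: {attribute: value}} (hypothetical helper)."""
    jobs: dict[str, dict[str, str]] = {}
    current: dict[str, str] = {}
    last_key = None
    for line in text.splitlines():
        if line.startswith("Job Id:"):
            # Start a new job record
            current = {}
            jobs[line.split(":", 1)[1].strip()] = current
            last_key = None
            continue
        match = ATTR.match(line)
        if match:
            last_key = match.group(1)
            current[last_key] = match.group(2)
        elif last_key is not None and line[:1].isspace() and line.strip():
            # Continuation of a long value such as Variable_List
            current[last_key] += line.strip()
    return jobs


def single_node_jobs(jobs: dict[str, dict[str, str]]) -> list[str]:
    """Return IDs of jobs whose Resource_List.nodes is exactly "1"."""
    return [job_id for job_id, attrs in jobs.items()
            if attrs.get("Resource_List.nodes") == "1"]


# Excerpt of the job quoted in this thread
sample = """Job Id: 346597.srvgrid01.offis.uni-oldenburg.de
    Job_Name = STDIN
    exec_host = node43/0
    Resource_List.ncpus = 0
    Resource_List.nodes = 1
    Variable_List = PBS_O_HOME=/home/d-grid-users/dgad0006,
        PBS_O_LOGNAME=dgad0006
"""

print(single_node_jobs(parse_qstat_f(sample)))
# ['346597.srvgrid01.offis.uni-oldenburg.de']
```

In practice the input would come from running `qstat -f` on the server host; the embedded sample only keeps the sketch self-contained.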

-------------------------------------------------------------------
Toni Harbaugh-Blackford                       harbaugh at ncifcrf.gov
System Administrator
Advanced Biomedical Computing Center (ABCC)
National Cancer Institute
Contractor - SAIC/Frederick
