[torqueusers] -lnodes=X ignored??

Peter Enstrom enstrom at ncsa.uiuc.edu
Fri Nov 16 15:07:08 MST 2007


After unsetting resources_available.nodect in qmgr, we also had to 
restart the pbs_server process before the problem went away.
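
For anyone repeating this, here is a rough sketch of the sequence as a script. It assumes the TORQUE client tools are on PATH and that it runs as a TORQUE manager. The helper names (nodect_is_set, run, fix_nodect) and the DRY_RUN switch are made up for illustration, and stopping the server with "qterm -t quick" is only one way to restart it; your site may use an init script instead. By default it only prints what it would do.

```shell
#!/bin/sh
# Sketch only: clear a server-wide nodect setting, then restart pbs_server.

# Succeeds if the qmgr server listing read from stdin sets
# resources_available.nodect (resources_assigned.nodect does not count).
nodect_is_set() {
    grep -q 'resources_available\.nodect[[:space:]]*='
}

# DRY_RUN is a hypothetical switch for this sketch. Unless it is set to 0,
# commands are printed instead of executed.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

fix_nodect() {
    run qmgr -c "unset server resources_available.nodect"
    # Stop the server but leave running jobs alone, then start it again.
    # In practice you may need to wait for the old process to exit first.
    run qterm -t quick
    run pbs_server
}

# Only act when qmgr exists and the server actually has the attribute set.
if command -v qmgr >/dev/null 2>&1 &&
   qmgr -c "list server" 2>/dev/null | nodect_is_set; then
    fix_nodect
fi
```

Run it once as is to see the commands, then again with DRY_RUN=0 to apply them.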

Peter

At 03:56 PM 11/16/2007, Jeremy Enos wrote:
>With the help of Peter Enstrom and his test cluster, we were able to 
>track down the root of the problem.  It's the following instruction 
>in the included torque.setup script:
>
>set server resources_available.nodect = 99999
>
>It doesn't matter what the value is set to; having it set at all 
>causes the issue.  Moab will mask this issue since it hands down a 
>specific node list, so that may explain why others haven't seen it yet.
>We fixed the problem simply by doing:
>qmgr -c "unset server resources_available.nodect"
>
>The torque.setup script in 2.1.9 didn't set the bad parameter, but 
>it was retained in the PBS database when I downgraded w/o blowing 
>away the database.  The torque.setup script in 2.2.1 certainly 
>should be fixed though.
>thx-
>
>    Jeremy
>
>Jeremy Enos wrote:
>>I should note however, that the ppn specification seems to be 
>>respected at least w/ 2.1.9, where it wasn't w/ 2.2.1.  Not sure if 
>>that's related or not.
>>
>>    Jeremy
>>
>>Jeremy Enos wrote:
>>>Yep.. rebuilt and re-installed w/ 2.1.9.  Same problem.   Output 
>>>pasted below.  I'm totally stumped here.
>>>thx-
>>>
>>>    Jeremy
>>>
>>>[jenos@qp ~]$ echo "sleep 30" | qsub -l nodes=8
>>>76.qp
>>>[jenos@qp ~]$ qstat -n
>>>
>>>qp:
>>>
>>>                                                                   Req'd  Req'd   Elap
>>>Job ID               Username Queue    Jobname    SessID NDS   TSK Memory Time  S Time
>>>-------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
>>>76.qp                jenos    batch    STDIN        3244     8  --    --  24:00 R   --
>>>   qp01/0
>>>[jenos@qp ~]$ qmgr -c "l s"
>>>Server qp
>>>        server_state = Active
>>>        scheduling = True
>>>        total_jobs = 1
>>>        state_count = Transit:0 Queued:0 Held:0 Waiting:0 Running:0 Exiting:0
>>>        managers = root@qp.ncsa.uiuc.edu
>>>        operators = root@qp.ncsa.uiuc.edu
>>>        default_queue = batch
>>>        log_events = 511
>>>        mail_from = adm
>>>        resources_available.nodect = 999999
>>>        resources_assigned.nodect = 0
>>>        scheduler_iteration = 600
>>>        node_check_rate = 150
>>>        tcp_timeout = 6
>>>        mom_job_sync = True
>>>        pbs_version = 2.1.9
>>>        keep_completed = 300
>>>
>>>[jenos@qp ~]$
>>>
>>>
>>>Garrick Staples wrote:
>>>>>set server pbs_version = 2.2.1
>>>>>
>>>>
>>>>Can you try again with 2.1.9?
>>>>
>>>>
>>>>------------------------------------------------------------------------
>>>>
>>>>_______________________________________________
>>>>torqueusers mailing list
>>>>torqueusers at supercluster.org
>>>>http://www.supercluster.org/mailman/listinfo/torqueusers
>>>>


