[torqueusers] -lnodes=X ignored??

Jeremy Enos jenos at ncsa.uiuc.edu
Fri Nov 16 14:56:45 MST 2007


With the help of Peter Enstrom and his test cluster, we were able to 
track down the root of the problem.  It's the following instruction in 
the included torque.setup script:

set server resources_available.nodect = 99999

It doesn't matter what the value is set to; having it set at all causes 
-l nodes=X to be ignored.  Moab masks the problem because it hands down 
an explicit node list, which may explain why others haven't seen it yet.
We fixed the problem simply by doing:
qmgr -c "unset server resources_available.nodect"
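
The same fix can be wrapped in a small check so it only runs when the 
attribute is actually present.  This is a minimal sketch, assuming a 
POSIX shell and that qmgr -c "list server" prints the attribute as 
"resources_available.nodect = <value>", as the qmgr -c "l s" output 
quoted below does.  The function names nodect_is_set and clear_nodect 
are hypothetical, not part of TORQUE.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) if the server listing read from
# stdin contains a resources_available.nodect setting. It deliberately
# does not match resources_assigned.nodect, which the server maintains.
nodect_is_set() {
    grep -q '^[[:space:]]*resources_available\.nodect[[:space:]]*='
}

# Hypothetical wrapper: unset the attribute only when qmgr reports it.
# Must be run by a TORQUE manager (e.g. root) on the server host.
clear_nodect() {
    if qmgr -c "list server" | nodect_is_set; then
        qmgr -c "unset server resources_available.nodect"
    else
        echo "resources_available.nodect is not set; nothing to do"
    fi
}
```

After running clear_nodect, resubmitting a test job such as 
echo "sleep 30" | qsub -l nodes=8 and checking qstat -n is a quick way 
to confirm the requested node count is now honored.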

The torque.setup script in 2.1.9 didn't set the bad parameter, but the 
setting was retained in the PBS database when I downgraded without 
wiping the database.  The torque.setup script in 2.2.1 certainly should 
be fixed, though.
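
Until the 2.2.1 script is fixed, a local copy of torque.setup can be 
patched before running it.  This is a minimal sketch, assuming the 
offending line contains the text "set server resources_available.nodect" 
as quoted above; the function name patch_torque_setup is hypothetical.  
It comments the line out rather than deleting it, and copies the result 
back over the original so the script keeps its execute permission.

```shell
#!/bin/sh
# Hypothetical helper: comment out any uncommented line containing
# "set server resources_available.nodect" in a torque.setup-style script.
# Usage: patch_torque_setup [path]   (defaults to ./torque.setup)
patch_torque_setup() {
    script=${1:-./torque.setup}
    tmp="${script}.tmp.$$"
    # Edit via a temp file, since "sed -i" is not POSIX.
    sed 's/^\([^#]*set server resources_available\.nodect\)/# \1/' "$script" > "$tmp" || {
        rm -f "$tmp"
        return 1
    }
    # cp onto the existing file keeps its mode (e.g. the execute bit).
    cp "$tmp" "$script" && rm -f "$tmp"
}
```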
thx-

    Jeremy

Jeremy Enos wrote:
> I should note, however, that the ppn specification seems to be 
> respected, at least with 2.1.9, whereas it wasn't with 2.2.1.  Not sure 
> if that's related or not.
>
>    Jeremy
>
> Jeremy Enos wrote:
>> Yep, rebuilt and reinstalled with 2.1.9.  Same problem.  Output 
>> pasted below.  I'm totally stumped here.
>> thx-
>>
>>    Jeremy
>>
>> [jenos at qp ~]$ echo "sleep 30" |qsub -l nodes=8
>> 76.qp
>> [jenos at qp ~]$ qstat -n
>>
>> qp:
>>                                                                    Req'd  Req'd   Elap
>> Job ID               Username Queue    Jobname    SessID NDS   TSK Memory Time  S Time
>> -------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
>> 76.qp                jenos    batch    STDIN        3244     8  --    --  24:00 R   --
>>   qp01/0
>> [jenos at qp ~]$ qmgr -c "l s"
>> Server qp
>>        server_state = Active
>>        scheduling = True
>>        total_jobs = 1
>>        state_count = Transit:0 Queued:0 Held:0 Waiting:0 Running:0 Exiting:0
>>        managers = root at qp.ncsa.uiuc.edu
>>        operators = root at qp.ncsa.uiuc.edu
>>        default_queue = batch
>>        log_events = 511
>>        mail_from = adm
>>        resources_available.nodect = 999999
>>        resources_assigned.nodect = 0
>>        scheduler_iteration = 600
>>        node_check_rate = 150
>>        tcp_timeout = 6
>>        mom_job_sync = True
>>        pbs_version = 2.1.9
>>        keep_completed = 300
>>
>> [jenos at qp ~]$
>>
>>
>> Garrick Staples wrote:
>>>> set server pbs_version = 2.2.1
>>>>     
>>>
>>> Can you try again with 2.1.9?
>>>
>>>   
>>> ------------------------------------------------------------------------ 
>>>
>>>
>>> _______________________________________________
>>> torqueusers mailing list
>>> torqueusers at supercluster.org
>>> http://www.supercluster.org/mailman/listinfo/torqueusers
>>>   

