[torqueusers] -lnodes=X ignored??
Jeremy Enos
jenos at ncsa.uiuc.edu
Fri Nov 16 14:56:45 MST 2007
With the help of Peter Enstrom and his test cluster, we were able to
track down the root of the problem. It's the following instruction in
the included torque.setup script:
set server resources_available.nodect = 99999
It doesn't matter what the value is set to; having it set at all causes
the issue. Moab will mask this issue since it hands down a specific
node list, so that may explain why others haven't seen it yet.
We fixed the problem simply by doing:
qmgr -c "unset server resources_available.nodect"
The torque.setup script in 2.1.9 doesn't set the bad parameter, but the
setting was retained in the PBS database when I downgraded without
wiping the database. The torque.setup script in 2.2.1 should certainly
be fixed, though.
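
For anyone who wants to check whether their own server database still
carries the setting (for example after a downgrade like mine), here is a
minimal sketch. It only uses the qmgr commands already shown in this
thread; the helper name has_nodect is made up for illustration, and the
pattern assumes qmgr prints the attribute as "resources_available.nodect = N".
Treat it as a starting point, not a tested tool.

```shell
#!/bin/sh
# Sketch: detect a leftover resources_available.nodect and unset it.
# has_nodect is a hypothetical helper name, not part of TORQUE.

# Reads `qmgr -c "list server"` output on stdin and succeeds if the
# resources_available.nodect attribute is present with any value.
# resources_assigned.nodect is deliberately not matched.
has_nodect() {
    grep -q '^[[:space:]]*resources_available\.nodect[[:space:]]*='
}

# Only touch the server when qmgr is actually installed on this host.
if command -v qmgr >/dev/null 2>&1; then
    if qmgr -c "list server" | has_nodect; then
        echo "resources_available.nodect is set; unsetting it"
        qmgr -c "unset server resources_available.nodect"
    else
        echo "resources_available.nodect is not set"
    fi
fi
```

Run it as a user listed in the server's managers, since unsetting a
server attribute needs manager privileges.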
thx-
Jeremy
Jeremy Enos wrote:
> I should note however, that the ppn specification seems to be
> respected at least w/ 2.1.9, where it wasn't w/ 2.2.1. Not sure if
> that's related or not.
>
> Jeremy
>
> Jeremy Enos wrote:
>> Yep.. rebuilt and re-installed w/ 2.1.9. Same problem. Output
>> pasted below. I'm totally stumped here.
>> thx-
>>
>> Jeremy
>>
>> [jenos@qp ~]$ echo "sleep 30" |qsub -l nodes=8
>> 76.qp
>> [jenos@qp ~]$ qstat -n
>>
>> qp:
>>
>>                                                                    Req'd  Req'd   Elap
>> Job ID               Username Queue    Jobname    SessID NDS   TSK Memory Time  S Time
>> -------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
>> 76.qp                jenos    batch    STDIN        3244     8  --     -- 24:00 R    --
>>    qp01/0
>> [jenos@qp ~]$ qmgr -c "l s"
>> Server qp
>> server_state = Active
>> scheduling = True
>> total_jobs = 1
>> state_count = Transit:0 Queued:0 Held:0 Waiting:0 Running:0 Exiting:0
>> managers = root@qp.ncsa.uiuc.edu
>> operators = root@qp.ncsa.uiuc.edu
>> default_queue = batch
>> log_events = 511
>> mail_from = adm
>> resources_available.nodect = 999999
>> resources_assigned.nodect = 0
>> scheduler_iteration = 600
>> node_check_rate = 150
>> tcp_timeout = 6
>> mom_job_sync = True
>> pbs_version = 2.1.9
>> keep_completed = 300
>>
>> [jenos@qp ~]$
>>
>>
>> Garrick Staples wrote:
>>>> set server pbs_version = 2.2.1
>>>>
>>>
>>> Can you try again with 2.1.9?
>>>
>>>
>>> ------------------------------------------------------------------------
>>>
>>>
>>> _______________________________________________
>>> torqueusers mailing list
>>> torqueusers at supercluster.org
>>> http://www.supercluster.org/mailman/listinfo/torqueusers
>>>