[torqueusers] Problem with PBS_NODEFILE
bino at coc.ufrj.br
Thu Mar 15 13:37:40 MDT 2007
I am using pbs_sched. With my configuration I can't get more than one
node per job.
However, if I submit another job, node-1-02 will be used.
The output of the command "qstat -f" is below.
+++ qstat -f +++
Job Id: 29.adm
Job_Name = ptest
Job_Owner = bino at adm
job_state = R
queue = b_8cpus
server = adm
Checkpoint = u
ctime = Wed Mar 14 14:22:18 2007
exec_host = node-1-01/1+node-1-01/0
Hold_Types = n
Join_Path = oe
Keep_Files = n
Mail_Points = a
mtime = Wed Mar 14 14:22:19 2007
Priority = 0
qtime = Wed Mar 14 14:22:18 2007
Rerunable = True
Resource_List.ncpus = 8
Resource_List.nodect = 4
Resource_List.nodes = 4:ppn=2
Resource_List.walltime = 00:05:00
comment = Job started on Wed Mar 14 at 14:22
etime = Wed Mar 14 14:22:18 2007
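To make the mismatch easier to see, here is a minimal sketch that compares the exec_host line with the Resource_List.nodes request from the output above. It assumes exec_host is a "+"-separated list of host/slot entries and that the nodes spec has the simple form count:ppn=N, as in this job. The helper names parse_exec_host and parse_nodes_spec are made up for illustration and are not part of Torque.

```python
from collections import Counter


def parse_exec_host(exec_host):
    """Return {hostname: slot_count} for a string like 'node-1-01/1+node-1-01/0'."""
    hosts = Counter()
    for entry in exec_host.split("+"):
        # Each entry is assumed to look like 'hostname/slot'.
        host, _, _slot = entry.partition("/")
        hosts[host] += 1
    return dict(hosts)


def parse_nodes_spec(spec):
    """Return (node_count, ppn) for a simple spec like '4:ppn=2' (ppn defaults to 1)."""
    parts = spec.split(":")
    count = int(parts[0])
    ppn = 1
    for part in parts[1:]:
        if part.startswith("ppn="):
            ppn = int(part[len("ppn="):])
    return count, ppn


# Values copied from the qstat -f output above.
exec_host = "node-1-01/1+node-1-01/0"
nodes_spec = "4:ppn=2"

allocated = parse_exec_host(exec_host)
want_nodes, want_ppn = parse_nodes_spec(nodes_spec)
got_nodes = len(allocated)
got_slots = sum(allocated.values())

print("requested: %d nodes x %d ppn = %d slots" % (want_nodes, want_ppn, want_nodes * want_ppn))
print("allocated: %d nodes, %d slots" % (got_nodes, got_slots))
```

For this job it reports 8 requested slots on 4 nodes against 2 allocated slots on a single node, which is the same thing PBS_NODEFILE shows.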
Quoting Garrick Staples <garrick at clusterresources.com>:
> On Thu, Mar 15, 2007 at 04:16:15PM -0300, Albino Aveleda alleged:
>> I submitted the MPI job below on a cluster with 32 nodes, where each node
>> has two CPUs. In my job I asked Torque for 8 CPUs, but the PBS_NODEFILE has
>> only two CPUs from the first node. My Torque configuration is below.
>> What am I doing wrong? I can't find where the problem is.
> Can we see a qstat -f on the job? I suspect all of the nodect and ncpus
> stuff in your queue configs is confusing things.
> With the current config, your jobs are going to end up with requests for
> ncpus *and* nodes, which is not likely to work.
> And which scheduler are you using?