[torqueusers] Problem with PBS_NODEFILE

Albino Aveleda bino at coc.ufrj.br
Thu Mar 15 13:37:40 MDT 2007


Hi Garrick,

I am using pbs_sched. With my configuration I can't get more than one
node per job.
However, if I submit another job, node-1-02 will be used.

Here is the output of "qstat -f":

+++ qstat -f +++
Job Id: 29.adm
    Job_Name = ptest
    Job_Owner = bino at adm
    job_state = R
    queue = b_8cpus
    server = adm
    Checkpoint = u
    ctime = Wed Mar 14 14:22:18 2007
    exec_host = node-1-01/1+node-1-01/0
    Hold_Types = n
    Join_Path = oe
    Keep_Files = n
    Mail_Points = a
    mtime = Wed Mar 14 14:22:19 2007
    Priority = 0
    qtime = Wed Mar 14 14:22:18 2007
    Rerunable = True
    Resource_List.ncpus = 8
    Resource_List.nodect = 4
    Resource_List.nodes = 4:ppn=2
    Resource_List.walltime = 00:05:00
    comment = Job started on Wed Mar 14 at 14:22
    etime = Wed Mar 14 14:22:18 2007
+++
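
The mismatch in the output above (4 nodes with ppn=2 requested, but exec_host listing only node-1-01) can be checked mechanically. The following is a minimal sketch, not part of TORQUE: the helper names slots_per_host and parse_nodes_request are hypothetical, and the parser assumes the simple "N:ppn=M" request form and the "host/slot+host/slot" exec_host form seen here.

```python
from collections import Counter


def slots_per_host(exec_host):
    """Count allocated slots per host in a qstat exec_host value.

    Assumes '+'-separated entries of the form host/slot_index,
    as in the qstat -f output above.
    """
    return Counter(entry.split("/")[0] for entry in exec_host.split("+") if entry)


def parse_nodes_request(nodes):
    """Parse a simple 'N:ppn=M' nodes request into (N, M).

    Hypothetical helper: only handles the plain numeric form used here,
    not host names or node properties. Both values default to 1.
    """
    count, ppn = 1, 1
    for part in nodes.split(":"):
        if part.isdigit():
            count = int(part)
        elif part.startswith("ppn="):
            ppn = int(part[len("ppn="):])
    return count, ppn


# Values copied from the qstat -f output of job 29.adm.
allocated = slots_per_host("node-1-01/1+node-1-01/0")
req_nodes, req_ppn = parse_nodes_request("4:ppn=2")

print("requested: %d nodes x %d ppn = %d slots" % (req_nodes, req_ppn, req_nodes * req_ppn))
print("allocated: %d node(s), %d slot(s)" % (len(allocated), sum(allocated.values())))
if len(allocated) < req_nodes:
    print("fewer nodes were allocated than requested")
```

For this job the request works out to 8 slots on 4 nodes, while exec_host accounts for 2 slots on a single node, which matches what ends up in PBS_NODEFILE.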

Best regards,
Bibo

Quoting Garrick Staples <garrick at clusterresources.com>:

> On Thu, Mar 15, 2007 at 04:16:15PM -0300, Albino Aveleda alleged:
>> Hi,
>>
>> I submitted the MPI job below on a cluster with 32 nodes, where each node
>> has two CPUs. In my job I asked Torque for 8 CPUs, but the PBS_NODEFILE has
>> only two CPUs from the first node. My Torque configuration is below.
>>
>> What am I doing wrong? I can't find where the problem is.
>
> Can we see a qstat -f on the job?  I suspect all of the nodect and ncpus
> stuff in your queue configs is confusing things.
>
> With the current config, your jobs are going to end up with requests for
> ncpus *and* nodes, which is not likely to work.
>
> And which scheduler are you using?
>



