[torqueusers] 3 jobs falsely scheduled to one host with 2 processors

Grid-Admins grid-admin at mpib-berlin.mpg.de
Mon Jul 12 06:04:13 MDT 2010


>>>> We just set up a torque system and are experiencing some weird behaviour.
>>>> Although all of our nodes have 2 processors (np=2 in
>>>> /var/spool/pbs/server_priv/nodes), the very first one (and only this
>>>> node) is always getting 3 jobs.
>>>> Does anyone know why this could be?
>>>>
>>> I've seen this in 2 cases: suspended jobs (this is normal), and broken torque
>>> in early 2.3.x releases.
>>>
>> Sadly neither of these is the case. We just switched to the new Debian
>> packages (2.4.8) and no job was suspended.
>>
>> Do you have any other ideas?
>>
> Sorry for asking, but have you excluded the "typo in nodes file" kind of
> problem?
> What does pbsnodes say about that node and another one? What happens if
> you set that first one offline: does another node then get the 3 jobs?
> What happens if you run that first machine as a client under a different
> name than it uses as a server, i.e. say your current node name is
> "torqueserver" and you have a second name for the machine (say node00)
> which is used for being a friendly client?
>
> No other ideas for the moment.

Thank you for your suggestions. We double-checked your first hint.
The nodes file is correct; pbsnodes and qmgr -c "l n node-x" list the
same settings for all nodes.
We removed node-1 (the one that got 3 jobs instead of 2) and restarted
the torque server.
What happens now is that the new first node in the list (node-2) gets 3
instead of 2 jobs. I could work around this problem by assigning node-1
np=1 instead of np=2, so that it gets only 2 jobs at a time.
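
A nodes-file check of the kind described above can be sketched roughly as
follows. This is only an illustration, not the procedure actually used: it
assumes the common "hostname np=N" line layout, assumes that a missing np
means 1, and the function names and sample entries are made up for the
example.

```python
import re

def parse_nodes_file(text):
    """Return {hostname: np} for every non-empty line of a nodes file.

    Assumes each line looks like "hostname np=N [other fields]".
    A line without an np= field is counted as np=1 (assumed default).
    """
    nodes = {}
    for raw in text.splitlines():
        fields = raw.split()
        if not fields:
            continue  # skip blank lines
        host, np = fields[0], 1
        for field in fields[1:]:
            match = re.fullmatch(r"np=(\d+)", field)
            if match:
                np = int(match.group(1))
        nodes[host] = np
    return nodes

def odd_nodes(nodes, expected_np):
    """List hosts whose np differs from the expected value."""
    return sorted(host for host, np in nodes.items() if np != expected_np)

# Hypothetical sample contents, not the real file from this cluster.
SAMPLE = """\
node-1 np=2
node-2 np=2
node-3 np=2
"""

if __name__ == "__main__":
    print(odd_nodes(parse_nodes_file(SAMPLE), expected_np=2))
```

An empty list means every entry carries the expected np, which matches the
situation here: the file itself is consistent, so the extra job must come
from somewhere else.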

I'm afraid I couldn't follow your last suggestion. Could you explain again
what you were trying to get at?

cheers,
   michael

