[torqueusers] 3 jobs falsely scheduled to one host with 2 processors

Steffen Möller steffen_moeller at gmx.de
Mon Jul 5 02:13:55 MDT 2010


On 07/05/2010 08:45 AM, Grid-Admins wrote:
> On 30.06.10 19:46, Garrick Staples wrote:
>   
>> On Mon, Jun 28, 2010 at 11:31:04AM +0200, Grid-Admins alleged:
>>     
>>> Hello all,
>>>
>>> we just set up a torque-system and are experiencing a weird behaviour.
>>> Although all of our nodes have 2 processors (np=2 in
>>> /var/spool/pbs/server_priv/nodes) the very first one (and only this
>>> server) is always getting 3 jobs.
>>> Does anyone know why this could be?
>>>       
>> I've seen this in 2 cases: suspended jobs (this is normal), and broken torque
>> in early 2.3.x releases.
>>     
> Sadly, neither of these is the case. We just switched to the brand-new Debian
> packages (2.4.8) and no job was suspended.
>
> Do you have any other ideas?
>   
Sorry for asking, but have you ruled out a "typo in the nodes file" kind of
problem?
What does pbsnodes say about that node and about another one? What happens if
you set that first node offline: does another node then get the 3 jobs?
What happens if you run that first machine as a client under a different name
than the one it uses as a server? That is, suppose your current node name is
"torqueserver" and the machine has a second name (say node00) that is used
only for its role as a client.
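
For the nodes-file check, here is a rough sketch in Python of what I would
look for: a host listed more than once, or an np value other than the one you
expect. The function name, the sample host names and the treatment of lines
starting with "#" as comments are my assumptions, not something taken from
your setup; the only format assumed is one "hostname np=N" entry per line.

```python
from collections import Counter


def check_nodes_file(text, expected_np=2):
    """Return a list of warnings for the contents of a nodes file.

    Hypothetical helper: assumes one "hostname np=N [properties]" entry
    per line and treats lines starting with "#" as comments.
    """
    warnings = []
    seen = Counter()
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        host = fields[0]
        seen[host] += 1
        # Collect every np=... token so that a doubled entry is noticed too.
        np_values = [f[3:] for f in fields[1:] if f.startswith("np=")]
        if not np_values:
            warnings.append(f"line {lineno}: {host} has no np= entry")
        for value in np_values:
            if not value.isdigit() or int(value) != expected_np:
                warnings.append(
                    f"line {lineno}: {host} has np={value}, expected {expected_np}"
                )
    # A host that appears twice is a likely copy-and-paste mistake.
    for host, count in seen.items():
        if count > 1:
            warnings.append(f"{host} is listed {count} times")
    return warnings


# Made-up sample content; replace it with the text of your own nodes file.
SAMPLE = """\
node00 np=3
node01 np=2
node01 np=2
"""

for warning in check_nodes_file(SAMPLE):
    print(warning)
```

To run it against the real file, read /var/spool/pbs/server_priv/nodes into a
string and pass that instead of SAMPLE.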

No other ideas for the moment.

Steffen

