[torqueusers] cluster underused - single cpu jobs hold up parallel

Lydia Heck lydia.heck at durham.ac.uk
Thu Mar 31 15:50:39 MDT 2011


The problem is partly one with the scheduler, but above all it is a problem
with the pbs_mom on two nodes in the cluster not communicating properly with
the pbs_server. The pbs_mom runs, the nodes report as working, and some jobs
are running, but the systems fail to start new jobs.

At present, however, the cluster is working, except for those two nodes.

Lydia

On Thu, 31 Mar 2011, "Mgr. Šimon Tóth" wrote:

>> I know I asked this question yesterday, but I will repeat it:
>>
>> The cluster has ~2,600 cores, and parallel jobs are running that fill ~1,700
>> of them. Many sequential jobs are queued "in front" of other parallel jobs,
>> but only one or two of the sequential jobs are running.
>>
>> The parallel jobs are not being scheduled. The scheduler is Maui. Any idea
>> what I am missing here?
>
> Please ask on Maui mailing list. Torque semantics are almost completely
> overridden when connected to a scheduler like Maui.
>
> -- 
> Mgr. Simon Toth
>

