[torqueusers] Adding to a cluster
wayne.mallett at jcu.edu.au
Wed Sep 24 15:25:53 MDT 2008
All nodes were visible. I checked the torque and maui configurations and
then restarted torque and maui again. While making those checks I noticed
that qmgr wasn't reporting the correct amount of resources in use when I
compared its report against qstat. After restarting torque and maui I
found that another ~350 new jobs (on top of the ~150 already running) had
shown up on the system, and we only have ~400 CPUs.
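
For anyone wanting to repeat the qstat side of that comparison, here is a
minimal sketch that tallies jobs by state from qstat's default listing.
It assumes the usual six-column layout (Job id, Name, User, Time Use, S,
Queue); the function name and the sample listing are made up for
illustration and are not output from our cluster.

```python
from collections import Counter


def count_job_states(qstat_text):
    """Tally jobs by state letter (R, Q, ...) from default `qstat` output.

    Assumes the default six-column listing, where the state letter is the
    second-to-last whitespace-separated field on each job line.
    """
    counts = Counter()
    for line in qstat_text.splitlines():
        fields = line.split()
        # Skip blank lines, the "Job id ..." header and the dashed separator.
        if len(fields) < 6 or fields[0] == "Job" or set(fields[0]) == {"-"}:
            continue
        counts[fields[-2]] += 1
    return counts


# Hypothetical sample listing, for illustration only.
SAMPLE = """\
Job id                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
101.head                  sim_a            alice           00:10:00 R batch
102.head                  sim_b            alice                  0 Q batch
103.head                  sim_c            bob             00:02:11 R batch
"""

if __name__ == "__main__":
    for state, number in sorted(count_job_states(SAMPLE).items()):
        print(state, number)
```

The running ("R") count can then be set against the CPU total and against
whatever qmgr reports as in use.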
My guess is that the system was having trouble catching up with an
influx of jobs coming from a user's script. I've asked the user to
put a sleep command between job submissions in the script. I restarted
torque several times during my investigations, but it wasn't until maui
was restarted that the problem showed itself.
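
A minimal sketch of what a throttled submission script might look like,
assuming each job is a separate script file handed to qsub. The function
name, the one-second default delay and the example file names are
assumptions for illustration; the user's actual script is not shown here.

```python
import subprocess
import time


def submit_throttled(job_scripts, delay_seconds=1.0, submit=None, sleep=time.sleep):
    """Submit job scripts one at a time, pausing between submissions.

    `submit` and `sleep` can be swapped out (e.g. for a dry run); by
    default each script is passed to `qsub` and the job id it prints is
    collected.
    """
    if submit is None:
        def submit(path):
            # qsub prints the new job's id on stdout.
            result = subprocess.run(
                ["qsub", path], check=True, capture_output=True, text=True
            )
            return result.stdout.strip()

    job_ids = []
    for index, path in enumerate(job_scripts):
        if index > 0:
            # Pause before every submission except the first so the
            # server and scheduler can keep up.
            sleep(delay_seconds)
        job_ids.append(submit(path))
    return job_ids


if __name__ == "__main__":
    # Dry run with hypothetical file names: print instead of calling qsub.
    def show(path):
        print("would submit", path)
        return path

    submit_throttled(["job1.sh", "job2.sh"], delay_seconds=0.1, submit=show)
```

The right delay depends on how quickly the server and scheduler digest
new jobs, so the default above is only a starting point.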
Thank you all for the suggestions.
Brock Palen wrote:
> Did you add the nodes to torque?
> pbsnodes hostname
> Does it show up?
> Brock Palen
> Center for Advanced Computing
> brockp at umich.edu
> On Sep 24, 2008, at 11:24 AM, Craig West wrote:
>> G'day Wayne,
>> Did you restart Maui as well?
>> It just sounds like Maui doesn't know about the new nodes.
>> On 09/23/2008 08:52 PM, Wayne Mallett wrote:
>>> G'day Torque users,
>>> I've recently added new servers to the cluster that I manage. I've
>>> changed the resources_available.ncpus to the new figure and restarted
>>> the pbs_server daemon, but it seems Torque/Maui cannot run jobs - maui
>>> reports 'NoResources' available. I can manually force the jobs to
>>> run. Does anyone have any ideas?
Dr. Wayne Mallett
Email: Wayne.Mallet at jcu.edu.au
Smail: High Performance & Research Computing
James Cook University
Townsville Qld 4811