[torqueusers] Cannot get more than 1 core on a node

Richard Young Richard.Young at usq.edu.au
Mon Aug 15 17:13:54 MDT 2011


Gus
pbsnodes reports the correct number of cores. It didn't at first, but restarting the server fixed that. The mom_priv/config shows no difference between the nodes in the working queue and the nodes in the non-working queue. I have included it below:

$logevent 127
$clienthost pbs_oscar
$usecp pbs_oscar:/home /home
$restricted pbs_oscar
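To double-check that the mom_priv/config files really are identical on both sets of nodes, a small script can compare two copies while ignoring blank lines, comment lines and spacing differences. This is a minimal sketch, assuming you have already copied each node's config into text; the sample input is the config quoted above, and the helper names `normalize` and `config_diff` are hypothetical.

```python
import difflib
from typing import List

# The mom_priv/config quoted in this message, used as sample input.
WORKING = """\
$logevent 127
$clienthost pbs_oscar
$usecp pbs_oscar:/home /home
$restricted pbs_oscar
"""


def normalize(config_text: str) -> List[str]:
    """Drop blank lines and lines starting with '#', and collapse runs of whitespace."""
    lines = []
    for raw in config_text.splitlines():
        line = raw.strip()
        if line and not line.startswith("#"):
            lines.append(" ".join(line.split()))
    return lines


def config_diff(a_text: str, b_text: str,
                a_name: str = "working", b_name: str = "not-working") -> List[str]:
    """Return unified-diff lines; an empty list means the configs match."""
    return list(difflib.unified_diff(
        normalize(a_text), normalize(b_text),
        fromfile=a_name, tofile=b_name, lineterm=""))


if __name__ == "__main__":
    # Same directives, different spacing: prints [] (no differences).
    print(config_diff(WORKING, WORKING.replace(" ", "  ")))
    # A changed value shows up as a -/+ pair in the diff.
    for line in config_diff(WORKING, WORKING.replace("127", "255")):
        print(line)
```

The diff is order-sensitive, so directives listed in a different order will be reported as differences even if pbs_mom treats them the same.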

I have also deleted the non-working queue, restarted the server, recreated the queue and restarted the server again, but this didn't fix the problem. I also intend to rebuild all the nodes in the non-working queue, which might help.

Thanks 
---------------------------------------------------------------------
Richard A. Young
Division of ICT Services
Email: Richard.Young at usq.edu.au   Phone: (07) 46315557   
Mob:   0437544370          Fax:   (07) 46312798 
---------------------------------------------------------------------


-----Original Message-----
From: torqueusers-bounces at supercluster.org [mailto:torqueusers-bounces at supercluster.org] On Behalf Of Gus Correa
Sent: Tuesday, 16 August 2011 12:55 AM
To: Torque Users Mailing List
Subject: Re: [torqueusers] Cannot get more than 1 core on a node

Hi Richard

No weird maui.cfg, no funny xen stuff ... I'm running out of guesses ...

Does 'pbsnodes' report the correct number of cores on the nodes?

Did you restart the pbs_server after you edited the
server_priv/nodes file to the current configuration?

Any restrictions on the compute node's mom_priv/config?
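The first question can be checked for every node at once by reading the `np` and `properties` fields that `pbsnodes -a` prints. This is a minimal sketch, assuming the usual `pbsnodes -a` layout (node name flush left, then indented `key = value` lines); the node names and values in `SAMPLE` are made up, and `parse_pbsnodes` and `eligible_nodes` are hypothetical helpers. The filter mirrors the request in the `checkjob -v 3533` output in this thread (feature `long`, TasksPerNode 2, NodeCount 1).

```python
from typing import Dict, List

# Made-up sample in the layout `pbsnodes -a` prints: node name flush left,
# followed by indented "key = value" attribute lines.
SAMPLE = """\
node01
     state = free
     np = 8
     properties = long
     ntype = cluster

node02
     state = free
     np = 1
     properties = long
     ntype = cluster

node03
     state = down
     np = 8
     properties = short
     ntype = cluster
"""


def parse_pbsnodes(text: str) -> Dict[str, Dict[str, str]]:
    """Map each node name to its dictionary of attributes."""
    nodes: Dict[str, Dict[str, str]] = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            # A flush-left line starts a new node record.
            current = line.strip()
            nodes[current] = {}
        elif current is not None and " = " in line:
            key, value = line.strip().split(" = ", 1)
            nodes[current][key] = value
    return nodes


def eligible_nodes(nodes: Dict[str, Dict[str, str]], feature: str,
                   tasks_per_node: int) -> List[str]:
    """Nodes that carry `feature`, are not down/offline, and have np >= tasks_per_node."""
    result = []
    for name, attrs in nodes.items():
        props = [p.strip() for p in attrs.get("properties", "").split(",") if p.strip()]
        state = attrs.get("state", "")
        cores = int(attrs.get("np", "0"))
        if (feature in props and "down" not in state
                and "offline" not in state and cores >= tasks_per_node):
            result.append(name)
    return result


if __name__ == "__main__":
    nodes = parse_pbsnodes(SAMPLE)
    for name, attrs in nodes.items():
        print(name, "np =", attrs.get("np"), "properties =", attrs.get("properties"))
    # Feature and tasks per node taken from the checkjob -v 3533 output.
    print("eligible:", eligible_nodes(nodes, "long", 2))
```

On a real cluster the input would come from something like `subprocess.run(["pbsnodes", "-a"], capture_output=True, text=True).stdout`. Note that this only reflects what pbs_server reports, not necessarily what the scheduler believes about each node.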

IHIH
Gus Correa


Richard Young wrote:
> Gus
> All the nodes are physical servers with dual quad-core CPUs. 
> /proc/cpuinfo lists all 8 cores, and top also shows all of them.
> 
> 
> -----Original Message-----
> From: torqueusers-bounces at supercluster.org [mailto:torqueusers-bounces at supercluster.org] On Behalf Of Gustavo Correa
> Sent: Monday, 15 August 2011 10:44 AM
> To: Torque Users Mailing List
> Subject: Re: [torqueusers] Cannot get more than 1 core on a node
> 
> Hi Richard
> 
> A wild guess / long shot.
> 
> Any chance that these nodes are virtualized (say, via xen),
> and perhaps have a single "virtual" core recognized by Linux?
> On the Rocks cluster mailing list this situation has been reported occasionally,
> specifically by people who had installed the Rocks "xen roll".
> What does 'cat /proc/cpuinfo' show on your compute nodes?
> 
> IHIH
> Gus Correa
> 
> On Aug 14, 2011, at 7:24 PM, Richard Young wrote:
> 
>> Chris
>> I started another job, and the output from checkjob -v is:
>> [youngr at hpc00 torque.jobs]$ checkjob -v 3533
>>
>>
>> checking job 3533 (RM job '3533.hpc00.usq.edu.au')
>>
>> State: Idle  EState: Deferred
>> Creds:  user:youngr  group:ict  class:long  qos:DEFAULT
>> WallTime: 00:00:00 of 00:05:00
>> SubmitTime: Mon Aug 15 09:20:41
>>  (Time Queued  Total: 00:00:12  Eligible: 00:00:01)
>>
>> Total Tasks: 2
>>
>> Req[0]  TaskCount: 2  Partition: ALL
>> Network: [NONE]  Memory >= 0  Disk >= 0  Swap >= 0
>> Opsys: [NONE]  Arch: [NONE]  Features: [long]
>> Exec:  ''  ExecSize: 0  ImageSize: 0
>> Dedicated Resources Per Task: PROCS: 1
>> NodeAccess: SHARED
>> TasksPerNode: 2  NodeCount: 1
>>
>>
>> IWD: [NONE]  Executable:  [NONE]
>> Bypass: 0  StartCount: 0
>> PartitionMask: [ALL]
>> Flags:       RESTARTABLE
>>
>> job is deferred.  Reason:  NoResources  (cannot create reservation for job '3533' (intital reservation attempt)
>> )
>> Holds:    Defer  (hold reason:  NoResources)
>> PE:  2.00  StartPriority:  1
>> cannot select job 3533 for partition DEFAULT (job hold active)
>>
>> thanks 
>>
>> -----Original Message-----
>> From: torqueusers-bounces at supercluster.org [mailto:torqueusers-bounces at supercluster.org] On Behalf Of Christopher Samuel
>> Sent: Friday, 12 August 2011 1:29 PM
>> To: torqueusers at supercluster.org
>> Subject: Re: [torqueusers] Cannot get more than 1 core on a node
>>
>> On 11/08/11 17:10, Richard Young wrote:
>>
>>> job is deferred.  Reason:  NoResources  (cannot create reservation for job '3466' (intital reservation attempt))
>> Any chance you could do a checkjob -v on that?
>>
>> Not sure about Maui, but with Moab it'll spit out each of
>> the hosts and why that particular one isn't eligible.
>>
>> cheers!
>> Chris
>> -- 
>>    Christopher Samuel - Senior Systems Administrator
>> VLSCI - Victorian Life Sciences Computation Initiative
>> Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545
>>         http://www.vlsci.unimelb.edu.au/
>>
>> _______________________________________________
>> torqueusers mailing list
>> torqueusers at supercluster.org
>> http://www.supercluster.org/mailman/listinfo/torqueusers
>>
>> This email (including any attached files) is confidential and is for the
>> intended recipient(s) only.  If you received this email by mistake,
>> please, as a courtesy, tell the sender, then delete this email.
>>
>> The views and opinions are the originator's and do not necessarily
>> reflect those of the University of Southern Queensland.  Although all
>> reasonable precautions were taken to ensure that this email contained no
>> viruses at the time it was sent we accept no liability for any losses
>> arising from its receipt.
>>
>> The University of Southern Queensland is a registered provider of
>> education with the Australian Government (CRICOS Institution Code No's.
>> QLD 00244B / NSW 02225M)
>>
>>
