[torqueusers] torque 4.0.2

Delphine Ramalingom delphine.ramalingom at univ-reunion.fr
Fri Jun 29 05:30:35 MDT 2012


Hi,
I've solved the problem: I installed Maui.
Thanks.
Delphine

On 18/06/12 14:28, Delphine Ramalingom wrote:
> Thanks for your suggestions.
>
> I think the problem is that I'm on a workstation: a single machine running
> all three daemons (pbs_server, pbs_mom and pbs_sched).
>
> Delphine
>
> On 16/06/12 00:19, Gus Correa wrote:
>> On 06/15/2012 03:33 PM, Andrus, Brian Contractor wrote:
>>> Delphine,
>>>
>>> Check your queues and ensure they are enabled and started, e.g.:
>>> 	qmgr -c 'set queue tiny enabled = True'
>>> 	qmgr -c 'set queue tiny started = True'
>>>
>>>
>>> Also, for your jobs that all have the same $PBS_TASKNUM, you need to submit them as array jobs (e.g. #PBS -t 10).
>>>
>>> Brian Andrus
>>> ITACS/Research Computing
>>> Naval Postgraduate School
>>> Monterey, California
>>> voice: 831-656-6238
>>>
>>>
>> ... and to enable scheduling:
>>
>> qmgr -c 'set server scheduling = True'
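A rough sketch of how the queue and scheduling settings mentioned above could be checked in one pass. It assumes that `qmgr -c 'print server'` prints lines of the form `set queue <name> enabled = True` and `set server scheduling = True`; `check_torque_flags` is a made-up helper name, not a Torque command.

```shell
# Sketch: read `qmgr -c 'print server'` output on stdin and report whether
# scheduling is on and whether the given queue is enabled and started.
# check_torque_flags is a hypothetical helper name, not part of Torque.
check_torque_flags() {
    queue="$1"
    awk -v q="$queue" '
        # "set server scheduling = True" -> value is field 5
        $1 == "set" && $2 == "server" && $3 == "scheduling" { sched = $5 }
        # "set queue <q> enabled = True" -> value is field 6
        $1 == "set" && $2 == "queue" && $3 == q && $4 == "enabled" { en = $6 }
        $1 == "set" && $2 == "queue" && $3 == q && $4 == "started" { st = $6 }
        END {
            if (sched == "") sched = "unset"
            if (en == "")    en = "unset"
            if (st == "")    st = "unset"
            print "scheduling=" sched " enabled=" en " started=" st
        }
    '
}

# Possible usage on the server host:
#   qmgr -c 'print server' | check_torque_flags batch
```

Anything reported as False or unset would be a candidate for one of the qmgr 'set' commands quoted above.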
>>
>> ***
>>
>> Can the server name in mom_priv/config be resolved by
>> the compute nodes?
>> Typically it is listed in /etc/hosts and associated with your cluster's
>> private subnet. Say:
>>
>> mom_priv/config:
>> $pbsserver	headnode
>>
>> /etc/hosts:
>> 192.168.1.1  headnode
>>
>> ***
>> Did you run 'pbsnodes' to see which nodes/moms respond?
>> Did you check the server and mom logs for possible error messages?
>> Did you check /var/log/messages for errors?
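For the pbsnodes check, a small filter along these lines might make the node states easier to scan. It is a sketch that assumes `pbsnodes -a` prints each node name on an unindented line followed by indented `state = ...` attribute lines; `node_states` is a hypothetical helper name.

```shell
# Sketch: read `pbsnodes -a` output on stdin and print "node state" pairs.
# node_states is a hypothetical helper name, not part of Torque.
node_states() {
    awk '
        /^[^ \t]/ { node = $1 }                        # unindented line: node name
        $1 == "state" && $2 == "=" { print node, $3 }  # indented "state = ..." line
    '
}

# Possible usage on the server host:
#   pbsnodes -a | node_states
```

A node reported as down or offline here would be worth following up in the server and mom logs.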
>>
>> I hope this helps,
>> Gus Correa
>>
>>
>>> -----Original Message-----
>>> From: torqueusers-bounces at supercluster.org [mailto:torqueusers-bounces at supercluster.org] On Behalf Of Delphine Ramalingom
>>> Sent: Friday, June 15, 2012 5:57 AM
>>> To: Torque Users Mailing List
>>> Subject: Re: [torqueusers] torque 4.0.2
>>>
>>> Dear David,
>>>
>>> I've installed torque 4.0.2, but jobs stay queued unless I run qrun as root.
>>> I've installed the default pbs_sched.
>>> momctl reports that no local jobs were detected, which is wrong...
>>>
>>> Do you have any idea what the problem is? Thanks.
>>>
>>> # qstat
>>> Job id                    Name             User            Time Use S Queue
>>> ------------------------- ---------------- --------------- -------- - -----
>>> 29.metis                   ExampleJob       dramalin               0 Q batch
>>> 32.metis                   ExampleJob       dramalin               0 Q batch
>>>
>>>
>>> # momctl -h metis.univ.run -d 0
>>>
>>> Host: metis.univ.run/metis.univ.run   Version: 4.0.2   PID: 2807
>>> Server[0]: metis.univ.run (10.90.0.12:15001)
>>>       Last Msg From Server:   281 seconds (DeleteJob)
>>>       Last Msg To Server:     41 seconds
>>> HomeDirectory:          /var/spool/torque/mom_priv
>>> MOM active:             1947 seconds
>>> LogLevel:               0 (use SIGUSR1/SIGUSR2 to adjust)
>>> NOTE:  no local jobs detected
>>>
>>> diagnostics complete
>>>
>>> # momctl -p 15002 -h metis.univ.run -d 3
>>> ERROR:    query[0] 'diag3' failed on metis.univ.run (errno=0 - Success : 0 - Success)
>>>
>>> delphine
>>>
>>>
>>> On 13/06/12 20:09, David Beer wrote:
>>>> Delphine,
>>>>
>>>> This is an issue that is fixed in releases after 4.0.0. Please
>>>> download 4.0.2:
>>>> http://www.adaptivecomputing.com/resources/downloads/torque/torque-4.0.2.tar.gz
>>>> and the problem will be resolved.
>>>>
>>>> David
>>>>
>>> _______________________________________________
>>> torqueusers mailing list
>>> torqueusers at supercluster.org
>>> http://www.supercluster.org/mailman/listinfo/torqueusers


