[torqueusers] Wrong number of allocated nodes
Regina Guilabert Canals
regina.guilabert at uib.es
Fri Jul 27 02:57:41 MDT 2007
Thanks Paul,
No other processes are running for the user. In any case, we removed this
limit from the queue and the problem persists.
Tracing a job with tracejob shows that exec_host is not set correctly. The
job requests 10 nodes with ppn=2, but exec_host lists only two slots on cell14:
megacelula:/megadisk/people/regina# tracejob 2199
/var/spool/torque/mom_logs/20070727: No matching job records located

Job: 2199.megacelula

07/27/2007 10:45:14  S  enqueuing into batch, state 1 hop 1
07/27/2007 10:45:14  S  Job Queued at request of dfsvhs9@megacelula,
                        owner = dfsvhs9@megacelula, job name = TEST,
                        queue = batch
07/27/2007 10:45:14  S  Job Modified at request of Scheduler@megacelula
07/27/2007 10:45:14  S  Job Run at request of Scheduler@megacelula
07/27/2007 10:45:14  A  queue=batch
07/27/2007 10:45:15  L  Job Run
07/27/2007 10:45:15  A  user=dfsvhs9 group=models jobname=TEST
                        queue=batch ctime=1185525914 qtime=1185525914
                        etime=1185525914 start=1185525915
                        exec_host=cell14/1+cell14/0
                        Resource_List.neednodes=10:ppn=2
                        Resource_List.nodect=10
                        Resource_List.nodes=10:ppn=2
                        Resource_List.walltime=00:20:30

Who sets "exec_host": pbs_server or pbs_sched? And how can we track
down the error?
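
To make the mismatch in the accounting record explicit, here is a minimal
Python sketch that compares the requested node spec with the hosts listed
in exec_host. It only illustrates the check. The helper names
(parse_nodes_spec, parse_exec_host, check_allocation) are made up for this
example and are not part of TORQUE, and it assumes the simple "N:ppn=M"
form of the nodes request shown above.

```python
import re


def parse_nodes_spec(spec):
    """Return (node_count, procs_per_node) for a simple 'N:ppn=M' request.

    Assumption: only the plain 'N' or 'N:ppn=M' form is handled, as in the
    record above; named hosts or node properties are not supported.
    """
    match = re.fullmatch(r"(\d+)(?::ppn=(\d+))?", spec)
    if match is None:
        raise ValueError(f"unsupported nodes spec: {spec!r}")
    return int(match.group(1)), int(match.group(2) or 1)


def parse_exec_host(exec_host):
    """Count the slots per host in a 'host/slot+host/slot' string."""
    slots = {}
    for entry in exec_host.split("+"):
        host = entry.partition("/")[0]
        slots[host] = slots.get(host, 0) + 1
    return slots


def check_allocation(record):
    """Compare requested and allocated resources in one accounting record."""
    # Split the record into key=value fields; values may contain '=' (ppn=2).
    fields = dict(tok.split("=", 1) for tok in record.split() if "=" in tok)
    nodes, ppn = parse_nodes_spec(fields["Resource_List.nodes"])
    slots = parse_exec_host(fields["exec_host"])
    return {
        "requested_nodes": nodes,
        "requested_slots": nodes * ppn,
        "allocated_nodes": len(slots),
        "allocated_slots": sum(slots.values()),
    }


# The accounting ("A") record from the tracejob output above, joined into one line.
record = (
    "user=dfsvhs9 group=models jobname=TEST queue=batch "
    "ctime=1185525914 qtime=1185525914 etime=1185525914 "
    "start=1185525915 exec_host=cell14/1+cell14/0 "
    "Resource_List.neednodes=10:ppn=2 Resource_List.nodect=10 "
    "Resource_List.nodes=10:ppn=2 Resource_List.walltime=00:20:30"
)

result = check_allocation(record)
print(result)
```

For the record above this reports 10 nodes (20 slots) requested against
1 node (2 slots) allocated, which is the discrepancy we are trying to trace.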
Regina Guilabert Canals
Grup de Meteorologia
Edif. Mateu Orfila Tel: +34 971 17 3213
Universitat de les Illes Balears Fax: +34 971 17 3426
07122 Palma de Mallorca (Spain) email: regina.guilabert at uib.es
On 26/07/2007, at 16:53, Paul Gray wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On Thu, Jul 26, 2007 at 12:16:49PM +0200, Regina Guilabert Canals
> wrote:
>> Dear TORQUE users,
>>
>> Without any apparent reason, PBS stopped allocating the correct number of
>> nodes yesterday. Now, when we request, for instance, 4 nodes, the job
>> only gets 1 node assigned.
>>
>> Let me illustrate it with an example:
>>
>
> I had the same issue, and the cause was the queue limit on max user
> processes.
> Your "batch" queue has a limit of 5 processes. Was another process
> running?
>
> - --
> Paul Gray -o)
> 314 East Gym, Dept. of Computer Science /\\
> University of Northern Iowa _\_V
> Message void if penguin violated ... Don't mess with the penguin
> No one says, "Hey, I can't read that ASCII attachment ya sent me."
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.6 (GNU/Linux)
>
> iD8DBQFGqLVyOH45TZW7mh4RAqWSAKCogdqzimdCtO7qzP08XJIVBvPRDgCeNUH0
> CXOFY1nS0glk2Y+iSn6Vzx4=
> =DE+1
> -----END PGP SIGNATURE-----