[torqueusers] qstat showing wrong
Garrick Staples
garrick at usc.edu
Wed Aug 24 19:56:18 MDT 2005
On Wed, Aug 24, 2005 at 12:08:09PM -0500, Laurence Dawson alleged:
> We are running torque 1.2.0p5 on our cluster. qstat is showing all jobs
> with very low cputimes (0 seconds up to about 17 seconds). A sample
> extract from qstat is pasted below. These times are clearly incorrect;
> the jobs are a mix of single and multiple cpu jobs. No jobs are being
> recorded correctly according to qstat. See the logs below for details -
Before I possibly go into a long explanation, I just want to double
check real quick... all of the processes in the job are children of
pbs_mom? The jobs aren't using rsh/ssh to launch processes?
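One quick way to answer that question on a compute node is to walk the process tree and see whether a job's processes actually descend from pbs_mom. The sketch below assumes a Linux node with /proc. The helper names (read_ppid_map, descendants) are hypothetical illustrations, not part of TORQUE. Processes started on a node through rsh/ssh are typically children of the remote shell daemon rather than of pbs_mom, so they would not show up in pbs_mom's tree.

```python
import os


def read_ppid_map(proc="/proc"):
    """Map pid -> ppid by reading /proc/<pid>/stat (Linux only, hypothetical helper)."""
    ppids = {}
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                data = f.read()
        except OSError:
            continue  # process exited while we were scanning
        # Field 2 (comm) is in parentheses and may contain spaces, so split
        # after the last ')'. The fields that follow are: state, ppid, ...
        rest = data[data.rindex(")") + 2:].split()
        ppids[int(entry)] = int(rest[1])
    return ppids


def descendants(root, ppids):
    """Return the set of PIDs that descend from root, given a pid -> ppid map."""
    children = {}
    for pid, ppid in ppids.items():
        children.setdefault(ppid, []).append(pid)
    found, stack = set(), [root]
    while stack:
        for child in children.get(stack.pop(), []):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found
```

For example, with mom_pid taken from `pgrep -x pbs_mom`, the check `job_pid in descendants(mom_pid, read_ppid_map())` is False for a job process that pbs_mom did not spawn.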
> but is the problem related to this message in the momlog below?:
> pbs_mom;Svr;pbs_mom;No child processes (10) in is_update_stat, cannot
> specify protocol
No, that means MOM had a problem sending one of the periodic node stat
messages. Those messages don't contain job information.
As long as it doesn't happen all the time, it is fine.
--
Garrick Staples, Linux/HPCC Administrator
University of Southern California