[torqueusers] qstat showing wrong

Garrick Staples garrick at usc.edu
Wed Aug 24 19:56:18 MDT 2005


On Wed, Aug 24, 2005 at 12:08:09PM -0500, Laurence Dawson alleged:
> We are running torque 1.2.0p5 on our cluster. qstat is showing all jobs
> with very low cputimes (0 seconds up to about 17 seconds). A sample
> extract from qstat is pasted below. These times are clearly incorrect;
> the jobs are a mix of single and multiple cpu jobs. No jobs are being
> recorded correctly according to qstat. See the logs below for details -

Before I possibly go into a long explanation, I just want to double
check real quick... all of the processes in the job are children of
pbs_mom?  The jobs aren't using rsh/ssh to launch processes?
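One rough way to check this on a compute node is to walk the parent chain of each job process and see whether it leads back to pbs_mom. The following is only a sketch, assuming a Linux node with /proc mounted; the helper names read_ppid_map and is_descendant are made up for illustration and are not part of TORQUE.

```python
import os


def read_ppid_map(proc_root="/proc"):
    """Build {pid: parent_pid} from Linux /proc/<pid>/stat (assumes Linux)."""
    ppid_map = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                data = f.read()
        except OSError:
            continue  # the process exited while we were scanning
        # The command name sits in parentheses and may contain spaces,
        # so split only the part after the last ')'.
        rest = data[data.rfind(")") + 2:].split()
        # rest[0] is the state, rest[1] is the parent pid.
        ppid_map[int(entry)] = int(rest[1])
    return ppid_map


def is_descendant(pid, ancestor, ppid_map):
    """Return True if `ancestor` appears in the parent chain of `pid`."""
    seen = set()
    current = ppid_map.get(pid)
    while current is not None and current not in seen:
        if current == ancestor:
            return True
        seen.add(current)
        current = ppid_map.get(current)
    return False
```

For example, with the pid of pbs_mom in mom_pid, is_descendant(job_proc_pid, mom_pid, read_ppid_map()) returning False for a job process would suggest it was started outside of pbs_mom (for instance under sshd or rshd).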


> but is the problem related to this message in the momlog below?:
> pbs_mom;Svr;pbs_mom;No child processes (10) in is_update_stat, cannot 
> specify protocol

No, that means MOM had a problem sending one of the periodic node stat
messages.  Those messages don't contain job information.

As long as it doesn't happen all the time, it is fine.
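To get a feel for how often it happens, you could count the occurrences in the MOM log. This is a small sketch, not a TORQUE tool; it assumes each log line starts with a date followed by a time and a semicolon, and the function name count_stat_errors is made up for illustration.

```python
from collections import Counter


def count_stat_errors(lines, needle="is_update_stat"):
    """Count log lines mentioning `needle`, grouped by the leading date field.

    Assumes each line begins with "<date> <time>;..."; lines that do not
    fit that shape are counted under "unknown".
    """
    per_day = Counter()
    for line in lines:
        if needle not in line:
            continue
        fields = line.split(";", 1)[0].split()
        day = fields[0] if fields else "unknown"
        per_day[day] += 1
    return per_day
```

Calling it on an open log file, e.g. count_stat_errors(open(path)), gives a per-day tally; a handful per day is a very different picture from one every few minutes.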

-- 
Garrick Staples, Linux/HPCC Administrator
University of Southern California