[torqueusers] "No such process (3) in resi_sum, ###: get_proc_stat"

Kamil Kisiel kamil at zymeworks.com
Mon Jun 23 12:57:20 MDT 2008

On 9-Jun-08, at 14:02 , Kamil Kisiel wrote:

> Occasionally some of our cluster nodes send out a syslog message such as:
> node071.cluster.zymeworks.com pbs_mom: No such process (3) in resi_sum, 797: get_proc_stat
> The number after "resi_sum" is different in each message; presumably it's the PID of some process.
> What does this mean, and what could be causing it?

So far I haven't had any reply to this. Nobody has any clue?

I also noticed that jobs run through MPI are under-reporting the cputime used in qstat output. Is that related, or a separate issue?
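To tally these messages across nodes (for example, to see whether they cluster on particular hosts or coincide with MPI jobs), a minimal sketch like the one below can split each line into its fields. It assumes the line layout quoted above, optionally preceded by a syslog timestamp; other layouts (such as "pbs_mom[1234]:") are not handled. The regex and the function name `parse_pbs_mom_error` are hypothetical helpers, not part of TORQUE. The "(3)" is the C errno value, which Python's `errno` module maps to a symbolic name (ESRCH, "No such process", on Linux). Treating the number after the function name as a PID is only the guess from the original message.

```python
import errno
import os
import re

# Hypothetical pattern for the pbs_mom line quoted above. It assumes the layout
# "<host> pbs_mom: <strerror text> (<errno>) in <function>, <number>: <detail>",
# optionally preceded by a syslog timestamp (hence search() rather than match()).
PBS_MOM_ERR = re.compile(
    r"(?P<host>\S+) pbs_mom: (?P<text>.+?) \((?P<errno>\d+)\) in "
    r"(?P<func>\w+), (?P<num>\d+): (?P<detail>\S+)$"
)


def parse_pbs_mom_error(line):
    """Split one pbs_mom error line into its fields, or return None if it does not match."""
    m = PBS_MOM_ERR.search(line.strip())
    if m is None:
        return None
    code = int(m.group("errno"))
    return {
        "host": m.group("host"),
        "errno": code,
        # Symbolic errno name on this platform (ESRCH for 3 on Linux).
        "errno_name": errno.errorcode.get(code, "unknown"),
        "message": m.group("text"),
        "function": m.group("func"),
        # Assumed (not confirmed) to be a PID, as guessed in the original message.
        "number": int(m.group("num")),
        "detail": m.group("detail"),
    }


if __name__ == "__main__":
    sample = ("node071.cluster.zymeworks.com pbs_mom: No such process (3) in "
              "resi_sum, 797: get_proc_stat")
    fields = parse_pbs_mom_error(sample)
    print(fields)
    # The platform's own description of the errno value.
    print(os.strerror(fields["errno"]))
```

Feeding each line of the syslog file through `parse_pbs_mom_error` and counting by `host` or `function` gives a quick picture of how often and where the error occurs.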

Kamil Kisiel
HPC Systems Engineer, Zymeworks Inc.
201-1401 West Broadway,
Vancouver, BC, V6H 1H6, Canada
Tel: (604) 678-1388 ext. 135
Fax: (604) 737-7077
