[torqueusers] Torque-2.1.6 Problem with pbs_mom logging - get_proc_stat

Garrick Staples garrick at usc.edu
Tue Jul 3 09:13:21 MDT 2007


On Tue, Jul 03, 2007 at 09:43:52AM -0400, Bill Wichser alleged:
> As a followup, I'm also seeing many syslog entries like this as well 
> (from multiple hosts):
> 
> pbs_mom: Success (0) in cput_sum, 6426: get_proc_stat
> pbs_mom: Success (0) in cput_sum, 6779: get_proc_stat
> pbs_mom: Inappropriate ioctl for device (25) in mem_sum, 6779: get_proc_stat
> pbs_mom: Inappropriate ioctl for device (25) in mem_sum, 6426: get_proc_stat
> pbs_mom: Success (0) in cput_sum, 6221: get_proc_stat
> pbs_mom: Success (0) in cput_sum, 6247: get_proc_stat
> pbs_mom: Inappropriate ioctl for device (25) in resi_sum, 6779: get_proc_stat
> pbs_mom: Success (0) in cput_sum, 5836: get_proc_stat
> pbs_mom: Success (0) in cput_sum, 5645: get_proc_stat
> pbs_mom: Success (0) in cput_sum, 5562: get_proc_stat
> pbs_mom: Success (0) in cput_sum, 6267: get_proc_stat
> pbs_mom: Success (0) in cput_sum, 6905: get_proc_stat
> pbs_mom: Success (0) in cput_sum, 6293: get_proc_stat
> pbs_mom: Success (0) in cput_sum, 6284: get_proc_stat
> pbs_mom: Inappropriate ioctl for device (25) in mem_sum, 6221: get_proc_stat
> pbs_mom: Inappropriate ioctl for device (25) in mem_sum, 6247: get_proc_stat
> pbs_mom: Inappropriate ioctl for device (25) in mem_sum, 5836: get_proc_stat
> 
> Again, we have only been seeing these log entries since a kernel update.
> Is it possible that something other than pbs_mom is logging these events?
> 
> An strace on the syslog daemon finds this:
> select(1, [0], NULL, NULL, NULL)        = 1 (in [0])
> recvfrom(0, "<27>Jul  3 09:39:42 pbs_mom: Suc"..., 1022, 0, NULL, NULL) = 73
> writev(1, [{"Jul  3 09:39:42", 15}, {" ", 1}, {"woodhen-004", 11}, {" ", 1}, {"pbs_mom: Success (0) in cput_sum"..., 53}, {"\n", 1}], 6) = 82
> fsync(1)                                = 0
> 
> Thanks, Bill
> 
> Bill Wichser wrote:
> >Since upgrading this morning to a new kernel (2.6.9-55.0.2.ELsmp) and IB
> >drivers, pbs_mom on the nodes has been logging (to syslog) a
> >constant barrage of successes.

In get_proc_stat(), pbs_mom has to read and parse /proc/$pid/stat.  Parsing
that file isn't complicated, but it depends on the kernel's field layout, so
it is sensitive to kernel changes.  However, I am rolling out
2.6.9-55.0.2.ELsmp to my own nodes and have not seen these problems.

Can you try pbs_mom without any extra kernel modules, such as the IB
driver?  Can you send us /proc/$$/stat so I can compare it with mine?

-- 
Garrick Staples, GNU/Linux HPCC SysAdmin
University of Southern California
