[torqueusers] torque-2.2.1 pbs_mom cannot write to its log
Bill Marmagas
zorba at vt.edu
Fri Apr 24 10:19:27 MDT 2009
I'm having a problem that started within the last month on some of
our existing SGI Altix servers, which are all running
torque-2.2.1 pbs_mom. These pbs_mom's are talking to a torque-2.1.8
pbs_server, which is talking to a moab-5.2.1 scheduler. The problem
seems to have started after we added a second Torque server --
running torque-2.3.6 and serving the torque-2.3.6 pbs_mom's of a new
x86-64 cluster -- and configured our main (and only) Moab server to
additionally talk to that second pbs_server. Here is what I've found:
Log Messages
These daemon messages are appearing in the system logs:
pbs_mom: Broken pipe (32) in log_record, PBS cannot write to its log
Symptoms
Log files in /var/spool/torque/mom_logs start out fine but become
corrupt.
The log file does not show up in the output of "lsof -p `pidof pbs_mom`"
or even "lsof | grep mom_logs" (it does on the systems that don't have
log file corruption).
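The lsof check described above can be wrapped in a small helper so it is easy to repeat on each node. This is only a sketch: has_open_log is a hypothetical function name, not part of Torque, and it simply looks for the log directory path in lsof-style output read from standard input.

```shell
#!/bin/sh
# Sketch: report whether lsof-style output (read on stdin) lists an open
# file under a given directory. has_open_log is a hypothetical helper name.
has_open_log() {
    # $1 = directory to look for, e.g. /var/spool/torque/mom_logs
    # -F matches the path as a fixed string, -q suppresses output;
    # the exit status is 0 when a matching line is found.
    grep -F -q -- "$1/"
}

# Usage on a live node (assumes lsof and pidof are installed):
#   lsof -p "$(pidof pbs_mom)" | has_open_log /var/spool/torque/mom_logs \
#       && echo "pbs_mom has its log open" \
#       || echo "pbs_mom does NOT have its log open"
```

On a healthy node the first message would be expected; on a node showing the symptom above, the second.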
Timing
This seems to be initiated sometime after the start of new jobs.
Has anyone seen this type of behavior, or does anyone have ideas?
Thanks
Bill Marmagas
Senior Systems Engineer
Systems Engineering & Administration
Virginia Tech