[torqueusers] torque-2.2.1 pbs_mom cannot write to its log

Bill Marmagas zorba at vt.edu
Fri Apr 24 10:19:27 MDT 2009


I'm having a problem that started within the last month on some of
our existing SGI Altix servers, which all run torque-2.2.1 pbs_mom.
These pbs_mom daemons talk to a torque-2.1.8 pbs_server, which in
turn talks to a moab-5.2.1 scheduler.  The problem seems to have
started after we added a second Torque server (running torque-2.3.6
and serving the torque-2.3.6 pbs_mom daemons of a new x86-64 cluster)
and configured our main (and only) Moab server to talk to that second
pbs_server as well.  Here is what I've found:


Log Messages

These daemon messages are appearing in the system logs:

pbs_mom: Broken pipe (32) in log_record, PBS cannot write to its log


Symptoms

Log files in /var/spool/torque/mom_logs start out fine but become
corrupt.

The log file does not show up in the output of
"lsof -p `pidof pbs_mom`" or even "lsof | grep mom_logs" (it does on
the systems that don't have log file corruption).
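
As a cross-check on the lsof result, the same question can be asked of
/proc directly.  This is only a minimal sketch, assuming a Linux system
with /proc mounted; has_open_under is a hypothetical helper name, not
part of Torque, and inspecting another user's process generally needs
root.

```shell
# Sketch: list the files a process holds open under a given directory
# by reading the /proc/<pid>/fd symlinks (Linux only).
# Prints each matching target; returns 1 if nothing matched.
has_open_under() {
    pid=$1
    dir=$2
    found=1
    for fd in /proc/"$pid"/fd/*; do
        # Each fd entry is a symlink to the open file, pipe or socket.
        target=$(readlink "$fd" 2>/dev/null) || continue
        case $target in
            "$dir"/*) echo "$target"; found=0 ;;
        esac
    done
    return $found
}

# Example usage (path taken from the symptoms above):
#   has_open_under "$(pidof pbs_mom)" /var/spool/torque/mom_logs \
#       || echo "pbs_mom has no file open under mom_logs"
```

If the function prints nothing for an affected pbs_mom, that agrees
with what lsof shows: the daemon no longer has its log file open.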


Timing

The problem seems to be triggered some time after new jobs start.


Has anyone seen this type of behavior, or does anyone have ideas?


Thanks




Bill Marmagas
Senior Systems Engineer
Systems Engineering & Administration
Virginia Tech

