[torqueusers] ComputeNodes' /var/log/messages flooded with "unknown command 5"

Ken Nielson knielson at adaptivecomputing.com
Mon Oct 28 11:37:32 MDT 2013


Oops. Almost forgot.

Thanks for finding that and reporting it.

Regards


On Mon, Oct 28, 2013 at 10:41 AM, Ezell, Matthew A. <ezellma at ornl.gov> wrote:

> I see this message in the mom_log on my desktop where I'm running
> pbs_sched.  I briefly looked into it.
>
> The fifo scheduler of pbs_sched has a talk_with_mom() function that
> contacts each pbs_mom to find out information like ncpus, arch, max_load,
> ideal_load, etc.  This uses the RM protocol on the RM port (the same one
> momctl uses). Back in 2011, commit
> 577e8cb29263075c2d38155e8fc6686b88e0d5af changed the RM protocol, but
> pbs_sched didn't get the memo.  (By the way, are the PBS ERS and IDS still
> being updated?)  A new field was added to the "header" to indicate how
> many commands were coming across the wire.  Since pbs_sched doesn't send
> this, the pbs_mom reads the command as the number of commands and the
> first string as the command.  This happens to be "ncpus", a 5-character
> string.  When read with disrui() instead of diswcs(), you get command #5
> (followed by garbage on the wire).
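
The mismatch described above can be reproduced with a toy wire format. This
is only a sketch: it uses one byte per integer and a length-prefixed string
rather than the real DIS encoding, and the command codes and function names
are assumptions made for illustration, not Torque's actual definitions.

```python
import io

# Illustrative command codes; these values are assumptions for this sketch.
RM_CMD_REQUEST = 2
KNOWN_COMMANDS = {1, 2, 3, 4}


def write_uint(buf, n):
    """Toy stand-in for an unsigned-int writer: a single byte."""
    buf.write(bytes([n]))


def write_counted_string(buf, s):
    """Toy stand-in for a counted-string writer: length byte, then bytes."""
    data = s.encode("ascii")
    write_uint(buf, len(data))
    buf.write(data)


def read_uint(buf):
    """Toy stand-in for an unsigned-int reader."""
    return buf.read(1)[0]


def old_style_request(resource):
    """Sender that predates the count field: command, then the query string."""
    buf = io.BytesIO()
    write_uint(buf, RM_CMD_REQUEST)
    write_counted_string(buf, resource)
    return buf.getvalue()


def new_style_request(resource):
    """Sender that includes the count field before the command."""
    buf = io.BytesIO()
    write_uint(buf, 1)  # number of commands that follow
    write_uint(buf, RM_CMD_REQUEST)
    write_counted_string(buf, resource)
    return buf.getvalue()


def new_style_parse(wire):
    """Receiver that expects: number of commands, then the command code."""
    buf = io.BytesIO(wire)
    num_commands = read_uint(buf)
    command = read_uint(buf)
    if command not in KNOWN_COMMANDS:
        return f"unknown command {command}"
    return f"command {command} (of {num_commands})"


# The old sender's command byte is consumed as the count, and the length
# prefix of "ncpus" is then read as the command code.
print(new_style_parse(old_style_request("ncpus")))
print(new_style_parse(new_style_request("ncpus")))
```

With the old-style sender, the receiver takes the length of "ncpus" as the
command code and rejects it; once the sender writes the count first, the
same request parses as a normal command.
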
>
> I'm not sure if the fifo scheduler actually *needs* this information from
> the mom, so it might be OK to just comment out the talk_with_mom() function.
>  If it is needed, then the RM functions in the PBS API need to be updated
> (and potentially some code in the fifo scheduler also).
>
> ~Matt
>
> ---
> Matt Ezell
> HPC Systems Administrator
> Oak Ridge National Laboratory
>
>
>
>
> On 9/5/13 7:09 PM, "David Beer" <dbeer at adaptivecomputing.com> wrote:
>
> >No worries, I was just curious to make sure the rest of it was typed
> >correctly.
> >
> >
> >I don't know of anything that runs momctl - that is usually a user
> >command that has to be run by root. I'm really at a loss for what might
> >cause it to get run and even more for why it'd be getting run with the
> >wrong command.
> >
> >
> >
> >On Thu, Sep 5, 2013 at 4:47 PM, Kamran Khan
> ><kamran at pssclabs.com> wrote:
> >
> >Hi David,
> >
> >
> >
> >Sorry, that was a typo.  I didn't paste it, typed it out.  It does say
> >"rm_request"
> >
> >
> >
> >Where would that command '5' be coming from?  Is there a spot that I can
> >check which runs momctl every 10 seconds or so?
> >
> >
> >
> >Please let me know.
> >
> >
> >
> >Thanks.
> >--
> >Kamran Khan
> >PSSC Labs
> >HPC Software Technical Engineer
> >
> >
> >________________________________________
> >
> >From: "David Beer" <dbeer at adaptivecomputing.com>
> >To: "Torque Users Mailing List" <torqueusers at supercluster.org>
> >Sent: Thursday, September 5, 2013 2:53:13 PM
> >Subject: Re: [torqueusers] ComputeNodes' /var/log/messages flooded with
> >"unknown command 5"
> >
> >
> >
> >Can you be sure you're pasting the exact message from the syslog? I'm
> >just suspicious because that says "rpm_request" when it should say
> >"rm_request." Assuming the rest of it is correct, command 5 would mean
> >someone is sending a command '5' via the momctl command, which isn't a
> >recognized command.
> >
> >
> >
> >
> >On Thu, Sep 5, 2013 at 3:22 PM, Kamran Khan
> ><kamran at pssclabs.com> wrote:
> >
> >Hi All,
> >
> >I have a HeadNode and (11) ComputeNodes, all configured with Torque.
> >
> >On the ComputeNodes, the /var/log/messages files are being flooded every
> >10 seconds with the following message:
> >
> >n001 pbs_mom: LOG_ERROR: :rpm_request, unknown command 5
> >
> >
> >So far as I can tell, I am having no problems running any jobs through
> >Torque, but this cluster is for a customer who may see the logs and start
> >freaking out.  Is this a common error?  Is there any way to get rid of
> >these messages?
> >
> >Any help would be appreciated.
> >
> >Thanks.
> >--
> >Kamran Khan
> >PSSC Labs
> >HPC Software Technical Engineer
> >
> >_______________________________________________
> >torqueusers mailing list
> >torqueusers at supercluster.org
> >http://www.supercluster.org/mailman/listinfo/torqueusers
> >
> >
> >
> >
> >
> >
> >--
> >David Beer | Senior Software Engineer
> >Adaptive Computing



-- 
Ken Nielson
+1 801.717.3700 office +1 801.717.3738 fax
1712 S. East Bay Blvd, Suite 300  Provo, UT  84606
www.adaptivecomputing.com