[torqueusers] pbs_mom and remote logging

Andrus, Mr. Brian (Contractor) brian.andrus at nrlmry.navy.mil
Wed Sep 26 12:08:56 MDT 2007


OK, I cannot seem to get pbs_mom to log everything to the remote syslog
server.

The only entries I see on the remote loghost are startups and shutdowns:
-------------------snip-----------------
Sep 26 11:02:17 n1 pbs_mom: shutdown succeeded
Sep 26 11:02:18 n1 pbs_mom: pbs_mom startup succeeded
-------------------snip-----------------

The local log on the node, by contrast, has plenty of detail:
-------------------snip-----------------
09/26/2007 11:05:17;0001;   pbs_mom;Job;job_nodes;job:
2180.cluster0.default.domain numnodes=8 numvnod=8
09/26/2007 11:05:17;0002;   pbs_mom;n/a;run_pelog;prolog script
'/var/torque/mom_priv/prologue.parallel' does not exist (cwd:
/var/torque/mom_priv)
09/26/2007 11:05:17;0002;   pbs_mom;n/a;run_pelog;userprolog script
'/var/torque/mom_priv/prologue.user.parallel' does not exist (cwd:
/var/torque/mom_priv)
09/26/2007 11:05:17;0008;
pbs_mom;Job;2180.cluster0.default.domain;JOIN JOB as node 7
09/26/2007 11:05:17;0008;
pbs_mom;Job;2180.cluster0.default.domain;evaluating limits for job
09/26/2007 11:05:17;0008;   pbs_mom;Job;do_rpp;got an internal task
manager request in do_rpp
09/26/2007 11:05:17;0002;   pbs_mom;Svr;im_request;connect from
192.168.0.8:1023
09/26/2007 11:05:17;0008;
pbs_mom;Job;2180.cluster0.default.domain;received request 'SPAWN_TASK'
from 192.168.0.8:1023
09/26/2007 11:05:17;0008;
pbs_mom;Job;2180.cluster0.default.domain;INFO:     received request
'SPAWN_TASK' from 192.168.0.8:1023 for job
'2180.cluster0.default.domain' (spawning task on node '0' with taskid=9,
globid='none'
09/26/2007 11:05:17;0008;
pbs_mom;Job;2180.cluster0.default.domain;saving task (IM_SPAWN_TASK)
09/26/2007 11:05:17;0008;   pbs_mom;Svr;task_save;saving task in
/var/torque/mom_priv/jobs/2180.cluste.TK/0000000009
09/26/2007 11:05:17;0002;   pbs_mom;n/a;mom_close_poll;entered
09/26/2007 11:05:17;0001;
pbs_mom;Job;2180.cluster0.default.domain;task set to running/saving task
(start_process)
09/26/2007 11:05:17;0008;   pbs_mom;Svr;task_save;saving task in
/var/torque/mom_priv/jobs/2180.cluste.TK/0000000009
09/26/2007 11:05:17;0008;
pbs_mom;Job;2180.cluster0.default.domain;start_process: task started,
tid 9, sid 3001, cmd orted
09/26/2007 11:05:18;0002;   pbs_mom;n/a;cput_sum;cput_sum: session=3001
pid=3001 cputime=0 (cputfactor=1.000000)
09/26/2007 11:05:18;0008;   pbs_mom;Job;scan_for_terminated;for job
2180.cluster0.default.domain, task 9, pid=3001, exitcode=0
09/26/2007 11:05:18;0008;
pbs_mom;Job;2180.cluster0.default.domain;sending signal 9 to task
09/26/2007 11:05:18;0008;   pbs_mom;Svr;task_save;saving task in
/var/torque/mom_priv/jobs/2180.cluste.TK/0000000009
09/26/2007 11:05:18;0080;
pbs_mom;Job;2180.cluster0.default.domain;scan_for_terminated: job
2180.cluster0.default.domain task 9 terminated, sid 3001
09/26/2007 11:05:18;0008;   pbs_mom;Svr;task_save;saving task in
/var/torque/mom_priv/jobs/2180.cluste.TK/0000000009
09/26/2007 11:05:20;0008;   pbs_mom;Job;do_rpp;got an internal task
manager request in do_rpp
09/26/2007 11:05:20;0002;   pbs_mom;Svr;im_request;connect from
192.168.0.8:1023
09/26/2007 11:05:20;0008;
pbs_mom;Job;2180.cluster0.default.domain;received request 'POLL_JOB'
from 192.168.0.8:1023
09/26/2007 11:05:21;0002;   pbs_mom;n/a;is_update_stat;composing status
update for server
09/26/2007 11:05:21;0002;   pbs_mom;n/a;is_update_stat;setting alarm in
is_update_stat
09/26/2007 11:05:21;0002;   pbs_mom;n/a;is_update_stat;is_update_stat:
sending to server "opsys=linux"
09/26/2007 11:05:21;0002;   pbs_mom;n/a;is_update_stat;setting alarm in
is_update_stat
09/26/2007 11:05:21;0002;   pbs_mom;n/a;is_update_stat;is_update_stat:
sending to server "uname=Linux n1 2.6.9-55.0.6.ELsmp #1 SMP Thu Aug 23
11:13:21 EDT 2007 x86_64"
09/26/2007 11:05:21;0002;   pbs_mom;n/a;is_update_stat;setting alarm in
is_update_stat
09/26/2007 11:05:21;0002;   pbs_mom;n/a;is_update_stat;is_update_stat:
sending to server "sessions=? 15201"
09/26/2007 11:05:21;0002;   pbs_mom;n/a;is_update_stat;setting alarm in
is_update_stat
09/26/2007 11:05:21;0002;   pbs_mom;n/a;is_update_stat;is_update_stat:
sending to server "nsessions=? 15201"
09/26/2007 11:05:21;0002;   pbs_mom;n/a;is_update_stat;setting alarm in
is_update_stat
09/26/2007 11:05:21;0002;   pbs_mom;n/a;is_update_stat;is_update_stat:
sending to server "nusers=0"
09/26/2007 11:05:21;0002;   pbs_mom;n/a;is_update_stat;setting alarm in
is_update_stat
09/26/2007 11:05:21;0002;   pbs_mom;n/a;is_update_stat;is_update_stat:
sending to server "idletime=801"
09/26/2007 11:05:21;0002;   pbs_mom;n/a;is_update_stat;setting alarm in
is_update_stat
09/26/2007 11:05:21;0002;   pbs_mom;n/a;totmem;totmem: total
mem=8548777984
09/26/2007 11:05:21;0002;   pbs_mom;n/a;is_update_stat;is_update_stat:
sending to server "totmem=8348416kb"
09/26/2007 11:05:21;0002;   pbs_mom;n/a;is_update_stat;setting alarm in
is_update_stat
09/26/2007 11:05:21;0002;   pbs_mom;n/a;availmem;availmem: free
mem=7080996864
09/26/2007 11:05:21;0002;   pbs_mom;n/a;is_update_stat;is_update_stat:
sending to server "availmem=6915036kb"
09/26/2007 11:05:21;0002;   pbs_mom;n/a;is_update_stat;setting alarm in
is_update_stat
09/26/2007 11:05:21;0002;   pbs_mom;n/a;is_update_stat;is_update_stat:
sending to server "physmem=2050944kb"
09/26/2007 11:05:21;0002;   pbs_mom;n/a;is_update_stat;setting alarm in
is_update_stat
09/26/2007 11:05:21;0002;   pbs_mom;n/a;is_update_stat;is_update_stat:
sending to server "ncpus=2"
09/26/2007 11:05:21;0002;   pbs_mom;n/a;is_update_stat;setting alarm in
is_update_stat
09/26/2007 11:05:21;0002;   pbs_mom;n/a;is_update_stat;is_update_stat:
sending to server "loadave=1.00"
09/26/2007 11:05:21;0002;   pbs_mom;n/a;is_update_stat;setting alarm in
is_update_stat
09/26/2007 11:05:21;0002;   pbs_mom;n/a;is_update_stat;setting alarm in
is_update_stat
09/26/2007 11:05:21;0002;   pbs_mom;n/a;is_update_stat;setting alarm in
is_update_stat
09/26/2007 11:05:21;0002;   pbs_mom;n/a;is_update_stat;is_update_stat:
sending to server "netload=1642141202"
09/26/2007 11:05:21;0002;   pbs_mom;n/a;is_update_stat;setting alarm in
is_update_stat
09/26/2007 11:05:21;0002;   pbs_mom;n/a;is_update_stat;is_update_stat:
sending to server "state=free"
09/26/2007 11:05:21;0002;   pbs_mom;n/a;is_update_stat;setting alarm in
is_update_stat
09/26/2007 11:05:21;0002;   pbs_mom;n/a;is_update_stat;is_update_stat:
sending to server "jobs=2180.cluster0.default.domain"
09/26/2007 11:05:21;0002;   pbs_mom;n/a;is_update_stat;status update
successfully sent to cluster0
09/26/2007 11:05:23;0008;   pbs_mom;Job;do_rpp;got an internal task
manager request in do_rpp
09/26/2007 11:05:23;0002;   pbs_mom;Svr;im_request;connect from
192.168.0.8:1023
09/26/2007 11:05:23;0008;
pbs_mom;Job;2180.cluster0.default.domain;received request 'KILL_JOB'
from 192.168.0.8:1023
09/26/2007 11:05:23;0100;
pbs_mom;Job;2180.cluster0.default.domain;kill_job received
09/26/2007 11:05:23;0008;
pbs_mom;Job;2180.cluster0.default.domain;im_request: KILL_JOB
2180.cluster0.default.domain node 192.168.0.8:1023
09/26/2007 11:05:23;0008;
pbs_mom;Job;2180.cluster0.default.domain;kill_job
09/26/2007 11:05:23;0002;   pbs_mom;n/a;run_pelog;userepilog script
'/var/torque/mom_priv/epilogue.precancel' does not exist (cwd:
/var/torque/mom_priv)
09/26/2007 11:05:23;0008;
pbs_mom;Job;2180.cluster0.default.domain;kill_job done
09/26/2007 11:05:23;0002;   pbs_mom;n/a;run_pelog;epilog script
'/var/torque/mom_priv/epilogue.parallel' does not exist (cwd:
/var/torque/mom_priv)
09/26/2007 11:05:23;0002;   pbs_mom;n/a;run_pelog;userepilog script
'/var/torque/mom_priv/epilogue.user.parallel' does not exist (cwd:
/var/torque/mom_priv)
09/26/2007 11:05:23;0008;   pbs_mom;Job;2180.cluster0.default.domain;all
tasks complete - purging job as sister
09/26/2007 11:05:23;0080;
pbs_mom;Job;2180.cluster0.default.domain;removing job
09/26/2007 11:05:23;0080;
pbs_mom;Job;2180.cluster0.default.domain;removed job file
-------------------snip-----------------

Is there something special to set in mom_priv/config to ensure that all
messages are sent to syslog?
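
If no such setting turns up, one possible workaround is to forward the
local log records to the loghost from a small helper. The following is
only a minimal sketch, assuming each record in the node's log file is a
single line of six semicolon-separated fields (timestamp, hex event code,
daemon, object class, object name, message), as the excerpt above
suggests once the mail wrapping is undone. The names MomRecord,
parse_record, to_syslog_text and make_forwarder are hypothetical and not
part of TORQUE; the UDP port 514 is the conventional syslog port and is
also an assumption about the loghost.

```python
import logging
import logging.handlers
from typing import NamedTuple, Optional


class MomRecord(NamedTuple):
    """One local pbs_mom log record (field layout assumed from the excerpt)."""
    timestamp: str   # e.g. "09/26/2007 11:05:17"
    event: int       # hex event field, e.g. 0x0008
    daemon: str      # e.g. "pbs_mom"
    obj_class: str   # e.g. "Job", "Svr", "n/a"
    obj_name: str    # e.g. a job id or a function name
    message: str     # free text; may itself contain semicolons


def parse_record(line: str) -> Optional[MomRecord]:
    """Split one unwrapped log line into six fields; return None if it does not fit."""
    # maxsplit=5 keeps any semicolons inside the message text intact.
    parts = line.rstrip("\n").split(";", 5)
    if len(parts) != 6:
        return None
    timestamp, event, daemon, obj_class, obj_name, message = parts
    try:
        event_bits = int(event, 16)
    except ValueError:
        return None
    # The daemon field is padded with leading spaces in the local log.
    return MomRecord(timestamp, event_bits, daemon.strip(),
                     obj_class, obj_name, message)


def to_syslog_text(rec: MomRecord) -> str:
    """Render a record as a one-line message suitable for syslog."""
    return (f"{rec.daemon}: [{rec.event:04x}] "
            f"{rec.obj_class};{rec.obj_name};{rec.message}")


def make_forwarder(loghost: str, port: int = 514) -> logging.Logger:
    """Build a logger that sends messages to a remote syslog over UDP."""
    logger = logging.getLogger("pbs_mom_forward")
    logger.setLevel(logging.INFO)
    logger.addHandler(logging.handlers.SysLogHandler(address=(loghost, port)))
    return logger
```

A helper along these lines would read new lines from the node's log
file, call parse_record on each, and pass to_syslog_text(rec) to
logger.info() on the logger returned by make_forwarder. It does not
change what pbs_mom itself sends to syslog.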

Brian Andrus perotsystems 
Site Manager | Sr. Computer Scientist 
Naval Research Lab
7 Grace Hopper Ave, Monterey, CA  93943
Phone (831) 656-4839 | Fax (831) 656-4866 

