[torqueusers] Torque 2.4.9 - Could not create cpuset (Was: Re: Torque 2.4.9 - job reported idle at time)

torqueusers at calcua.ua.ac.be torqueusers at calcua.ua.ac.be
Wed Jul 28 02:11:57 MDT 2010


On Tue, 27 Jul 2010, Ken Nielson wrote:

> This might be something to look at. It appears job 19000 is failing with 
> an exit status of -3. This job is on the machines in the hostlist. The 
> job is then set to rerun.

I have set the loglevel of pbs_mom to 7 on the node where the job will run 
(I direct the job to that node with the -l option of qsub).  In the logfile, 
I see "read start return code=-3" coming up at 09:33:50, immediately 
followed by a "job not started" message.
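
For anyone who wants to pull the lines for a single job out of a pbs_mom 
log at this loglevel, here is a minimal Python sketch.  It assumes the 
semicolon-separated layout seen in the logfile at the end of this message 
(timestamp, event code, daemon, class, name, message).  The field names and 
the helpers parse_mom_line and job_events are my own labels for 
illustration, not anything Torque defines.

```python
def parse_mom_line(line):
    """Split one pbs_mom log line into six semicolon-separated fields.

    The field names are illustrative labels, not official Torque names.
    Returns None when the line does not have the expected layout.
    """
    # Split at most 5 times so semicolons inside the message are preserved.
    parts = line.rstrip("\n").split(";", 5)
    if len(parts) != 6:
        return None
    timestamp, code, daemon, obj_class, name, message = parts
    return {
        "timestamp": timestamp,
        "code": code,
        "daemon": daemon.strip(),  # the log pads this field with spaces
        "class": obj_class,
        "name": name,
        "message": message,
    }


def job_events(lines, job_id):
    """Yield parsed records whose name or message mentions job_id."""
    for line in lines:
        rec = parse_mom_line(line)
        if rec and (job_id in rec["name"] or job_id in rec["message"]):
            yield rec


# Two lines copied from the logfile below: one about the job, one not.
sample = [
    "07/28/2010 09:33:50;0001;   pbs_mom;Job;TMomFinalizeJob3;"
    "Job 19007.master1.ourmachine.com read start return code=-3 session=0",
    "07/28/2010 09:33:17;0002;   pbs_mom;n/a;mom_server_update_stat;"
    'mom_server_update_stat: sending to server "ncpus=8"',
]

for rec in job_events(sample, "19007"):
    print(rec["timestamp"], rec["name"], rec["message"])
```

Matching on a substring is crude (a job id of "19007" would also match 
"119007"), so passing the full job id is safer on a busy node.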

Looking at syslog, I now find the following problem:

   Jul 28 10:06:32 cn090 pbs_mom: LOG_ERROR::TMomFinalizeChild, Could not
   create cpuset for job 19007.ourmachine.com.

   Jul 28 10:06:32 cn090 pbs_mom: LOG_ERROR::No such file or directory (2)
   in open_std_file, cannot open/create stdout/stderr file
   '/var/spool/torque/spool/19007.master1.turing.antwerpen.vsc.OU' (mode:
   2001, keeping: FALSE)

   Jul 28 10:06:32 cn090 pbs_mom: LOG_ERROR::No such file or directory (2)
   in open_std_file, cannot open/create stdout/stderr file
   '/var/spool/torque/spool/19007.master1.turing.antwerpen.vsc.ER' (mode:
   2001, keeping: FALSE)

Indeed, I used the --enable-cpuset option when configuring Torque.  So the 
question now is why cpuset is not working.  Any ideas?
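
One thing that can be checked on the node is whether the kernel offers the 
cpuset filesystem at all and whether it is mounted.  Below is a minimal 
Python sketch that reads /proc/filesystems and /proc/mounts.  That Torque 
expects the mount at /dev/cpuset is my assumption here and should be 
verified against the Torque documentation for the version in use; the 
function names are made up for illustration.

```python
def cpuset_supported(proc_filesystems_text):
    """True if 'cpuset' is listed as a filesystem type known to the kernel."""
    for line in proc_filesystems_text.splitlines():
        fields = line.split()
        # Lines look like "nodev<TAB>cpuset" or "<TAB>ext3"; the type is last.
        if fields and fields[-1] == "cpuset":
            return True
    return False


def cpuset_mountpoints(proc_mounts_text):
    """Return every mount point whose filesystem type is cpuset."""
    points = []
    for line in proc_mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts columns: device, mount point, fs type, options, ...
        if len(fields) >= 3 and fields[2] == "cpuset":
            points.append(fields[1])
    return points


def read_text(path):
    """Read a file, returning an empty string if it cannot be opened."""
    try:
        with open(path) as handle:
            return handle.read()
    except OSError:
        return ""


# Assumption: /dev/cpuset is where pbs_mom looks for the cpuset filesystem.
ASSUMED_MOUNT = "/dev/cpuset"

supported = cpuset_supported(read_text("/proc/filesystems"))
mounted_at = cpuset_mountpoints(read_text("/proc/mounts"))

print("kernel lists cpuset:", supported)
print("cpuset mounted at:", mounted_at or "nowhere")
print("mounted at assumed path:", ASSUMED_MOUNT in mounted_at)
```

If the kernel lists cpuset but nothing is mounted, that would be the first 
thing to fix before looking further at pbs_mom itself.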

-- Regards,

Franky



</begin logfile>
07/28/2010 09:33:10;0002;   pbs_mom;Svr;pbs_mom;received signal 10: adjusting loglevel to 1
07/28/2010 09:33:11;0002;   pbs_mom;Svr;pbs_mom;received signal 10: adjusting loglevel to 2
07/28/2010 09:33:12;0002;   pbs_mom;Svr;pbs_mom;received signal 10: adjusting loglevel to 3
07/28/2010 09:33:13;0002;   pbs_mom;Svr;pbs_mom;received signal 10: adjusting loglevel to 4
07/28/2010 09:33:13;0002;   pbs_mom;Svr;pbs_mom;received signal 10: adjusting loglevel to 5
07/28/2010 09:33:13;0002;   pbs_mom;Svr;pbs_mom;received signal 10: adjusting loglevel to 6
07/28/2010 09:33:14;0002;   pbs_mom;Svr;pbs_mom;received signal 10: adjusting loglevel to 7
07/28/2010 09:33:16;0002;   pbs_mom;n/a;rm_request;internal diagnostics complete
07/28/2010 09:33:17;0002;   pbs_mom;n/a;mom_server_all_update_stat;composing status update for server
07/28/2010 09:33:17;0002;   pbs_mom;n/a;sessions;sessions[0]: pid 3519 sid 3518
07/28/2010 09:33:17;0002;   pbs_mom;n/a;sessions;sessions[1]: pid 26539 sid 26538
07/28/2010 09:33:17;0002;   pbs_mom;n/a;sessions;sessions[0]: pid 3519 sid 3518
07/28/2010 09:33:17;0002;   pbs_mom;n/a;sessions;sessions[1]: pid 26539 sid 26538
07/28/2010 09:33:17;0002;   pbs_mom;n/a;nusers;nusers[0]: pid 3519 uid 1016
07/28/2010 09:33:17;0002;   pbs_mom;n/a;nusers;nusers[1]: pid 26539 uid 1016
07/28/2010 09:33:17;0002;   pbs_mom;n/a;totmem;totmem: total mem=16836907008
07/28/2010 09:33:17;0002;   pbs_mom;n/a;availmem;availmem: free mem=15443214336
07/28/2010 09:33:17;0002;   pbs_mom;n/a;mom_server_update_stat;mom_server_update_stat: sending to server "opsys=linux"
07/28/2010 09:33:17;0002;   pbs_mom;n/a;mom_server_update_stat;mom_server_update_stat: sending to server "uname=Linux cn090 2.6.18-164.6.1.el5 #1 SMP Tue Nov 3 23:02:51 EST 2009 x86_64"
07/28/2010 09:33:17;0002;   pbs_mom;n/a;mom_server_update_stat;mom_server_update_stat: sending to server "sessions=3518 26538"
07/28/2010 09:33:17;0002;   pbs_mom;n/a;mom_server_update_stat;mom_server_update_stat: sending to server "nsessions=2"
07/28/2010 09:33:17;0002;   pbs_mom;n/a;mom_server_update_stat;mom_server_update_stat: sending to server "nusers=1"
07/28/2010 09:33:17;0002;   pbs_mom;n/a;mom_server_update_stat;mom_server_update_stat: sending to server "idletime=3523183"
07/28/2010 09:33:17;0002;   pbs_mom;n/a;mom_server_update_stat;mom_server_update_stat: sending to server "totmem=16442292kb"
07/28/2010 09:33:17;0002;   pbs_mom;n/a;mom_server_update_stat;mom_server_update_stat: sending to server "availmem=15081264kb"
07/28/2010 09:33:17;0002;   pbs_mom;n/a;mom_server_update_stat;mom_server_update_stat: sending to server "physmem=16442292kb"
07/28/2010 09:33:17;0002;   pbs_mom;n/a;mom_server_update_stat;mom_server_update_stat: sending to server "ncpus=8"
07/28/2010 09:33:17;0002;   pbs_mom;n/a;mom_server_update_stat;mom_server_update_stat: sending to server "loadave=0.00"
07/28/2010 09:33:17;0002;   pbs_mom;n/a;mom_server_update_stat;mom_server_update_stat: sending to server "gres="
07/28/2010 09:33:17;0002;   pbs_mom;n/a;mom_server_update_stat;mom_server_update_stat: sending to server "netload=4033950944159"
07/28/2010 09:33:17;0002;   pbs_mom;n/a;mom_server_update_stat;mom_server_update_stat: sending to server "state=free"
07/28/2010 09:33:17;0002;   pbs_mom;n/a;mom_server_update_stat;mom_server_update_stat: sending to server "jobs= "
07/28/2010 09:33:17;0002;   pbs_mom;n/a;mom_server_update_stat;mom_server_update_stat: sending to server "varattr= "
07/28/2010 09:33:17;0002;   pbs_mom;n/a;mom_server_update_stat;status update successfully sent to master1.ourmachine.com
07/28/2010 09:33:50;0080;   pbs_mom;Req;dis_request_read;decoding command QueueJob from PBS_Server
07/28/2010 09:33:50;0100;   pbs_mom;Req;;Type QueueJob request received from PBS_Server at master1.ourmachine.com, sock=10
07/28/2010 09:33:50;0008;   pbs_mom;Job;process_request;request type QueueJob from host master1.ourmachine.com received
07/28/2010 09:33:50;0008;   pbs_mom;Job;process_request;request type QueueJob from host master1.ourmachine.com allowed
07/28/2010 09:33:50;0008;   pbs_mom;Job;dispatch_request;dispatching request QueueJob on sd=10
07/28/2010 09:33:50;0080;   pbs_mom;Req;dis_request_read;decoding command JobScript from PBS_Server
07/28/2010 09:33:50;0100;   pbs_mom;Req;;Type JobScript request received from PBS_Server at master1.ourmachine.com, sock=10
07/28/2010 09:33:50;0008;   pbs_mom;Job;process_request;request type JobScript from host master1.ourmachine.com received
07/28/2010 09:33:50;0008;   pbs_mom;Job;process_request;request type JobScript from host master1.ourmachine.com allowed
07/28/2010 09:33:50;0008;   pbs_mom;Job;dispatch_request;dispatching request JobScript on sd=10
07/28/2010 09:33:50;0080;   pbs_mom;Req;dis_request_read;decoding command ReadyToCommit from PBS_Server
07/28/2010 09:33:50;0100;   pbs_mom;Req;;Type ReadyToCommit request received from PBS_Server at master1.ourmachine.com, sock=10
07/28/2010 09:33:50;0008;   pbs_mom;Job;process_request;request type ReadyToCommit from host master1.ourmachine.com received
07/28/2010 09:33:50;0008;   pbs_mom;Job;process_request;request type ReadyToCommit from host master1.ourmachine.com allowed
07/28/2010 09:33:50;0008;   pbs_mom;Job;dispatch_request;dispatching request ReadyToCommit on sd=10
07/28/2010 09:33:50;0008;   pbs_mom;Job;19007.master1.ourmachine.com;ready to commit job
07/28/2010 09:33:50;0008;   pbs_mom;Job;19007.master1.ourmachine.com;ready to commit job completed
07/28/2010 09:33:50;0080;   pbs_mom;Req;dis_request_read;decoding command Commit from PBS_Server
07/28/2010 09:33:50;0100;   pbs_mom;Req;;Type Commit request received from PBS_Server at master1.ourmachine.com, sock=10
07/28/2010 09:33:50;0008;   pbs_mom;Job;process_request;request type Commit from host master1.ourmachine.com received
07/28/2010 09:33:50;0008;   pbs_mom;Job;process_request;request type Commit from host master1.ourmachine.com allowed
07/28/2010 09:33:50;0008;   pbs_mom;Job;dispatch_request;dispatching request Commit on sd=10
07/28/2010 09:33:50;0008;   pbs_mom;Job;19007.master1.ourmachine.com;committing job
07/28/2010 09:33:50;0008;   pbs_mom;Job;19007.master1.ourmachine.com;starting job execution
07/28/2010 09:33:50;0001;   pbs_mom;Job;job_nodes;0: cn090.ourmachine.com/0
07/28/2010 09:33:50;0001;   pbs_mom;Job;job_nodes;job: 19007.master1.ourmachine.com numnodes=1 numvnod=1
07/28/2010 09:33:50;0001;   pbs_mom;Svr;pbs_mom;LOG_DEBUG::init_groups, pre-sigprocmask
07/28/2010 09:33:50;0001;   pbs_mom;Svr;pbs_mom;LOG_DEBUG::init_groups, post-initgroups
07/28/2010 09:33:50;0001;   pbs_mom;Svr;pbs_mom;LOG_DEBUG::mom_checkpoint_job_has_checkpoint, FALSE
07/28/2010 09:33:50;0008;   pbs_mom;Job;19007.master1.ourmachine.com;evaluating limits for job
07/28/2010 09:33:50;0001;   pbs_mom;Job;19007.master1.ourmachine.com;about to fork child which will become job
07/28/2010 09:33:50;0001;   pbs_mom;Job;TMomFinalizeJob2;job: 19007.master1.ourmachine.com numnodes=1 numvnod=1
07/28/2010 09:33:50;0002;   pbs_mom;n/a;mom_close_poll;entered
07/28/2010 09:33:50;0001;   pbs_mom;Job;19007.master1.ourmachine.com;phase 2 of job launch successfully completed
07/28/2010 09:33:50;0001;   pbs_mom;Job;19007.master1.ourmachine.com;task/session info loaded
07/28/2010 09:33:50;0001;   pbs_mom;Job;TMomFinalizeJob3;Job 19007.master1.ourmachine.com read start return code=-3 session=0
07/28/2010 09:33:50;0001;   pbs_mom;Job;TMomFinalizeJob3;job not started, Retry job exec failure, retry will be attempted (see syslog for more information)
07/28/2010 09:33:50;0008;   pbs_mom;Req;send_sisters;sending command ABORT_JOB for job 19007.master1.ourmachine.com (10)
07/28/2010 09:33:50;0008;   pbs_mom;Req;send_sisters;sending ABORT to sisters for job 19007.master1.ourmachine.com
07/28/2010 09:33:50;0008;   pbs_mom;Job;19007.master1.ourmachine.com;job execution started
07/28/2010 09:33:50;0008;   pbs_mom;Job;19007.master1.ourmachine.com;start failed on unknown node
07/28/2010 09:33:50;0008;   pbs_mom;Job;scan_for_terminated;entered
07/28/2010 09:33:50;0080;   pbs_mom;Svr;mom_get_sample;proc_array load started
07/28/2010 09:33:50;0080;   pbs_mom;n/a;mom_get_sample;proc_array loaded - nproc=151
07/28/2010 09:33:50;0080;   pbs_mom;n/a;cput_sum;proc_array loop start - jobid = 19007.master1.ourmachine.com
07/28/2010 09:33:50;0080;   pbs_mom;n/a;mem_sum;proc_array loop start - jobid = 19007.master1.ourmachine.com
07/28/2010 09:33:50;0080;   pbs_mom;n/a;resi_sum;proc_array loop start - jobid = 19007.master1.ourmachine.com
07/28/2010 09:33:50;0080;   pbs_mom;Job;19007.master1.ourmachine.com;checking job w/subtask pid=0 (child pid=1491)
07/28/2010 09:33:50;0008;   pbs_mom;Job;scan_for_terminated;pid 1491 not tracked, statloc=65024, exitval=254
07/28/2010 09:33:50;0080;   pbs_mom;Svr;scan_for_exiting;searching for exiting jobs
07/28/2010 09:33:50;0008;   pbs_mom;Job;kill_job;scan_for_exiting: sending signal 9, "KILL" to job 19007.master1.ourmachine.com, reason: local task termination detected
07/28/2010 09:33:50;0002;   pbs_mom;n/a;run_pelog;userepilog script '/var/spool/torque/mom_priv/epilogue.precancel' for job 19007.master1.ourmachine.com does not exist (cwd: /var/spool/torque/mom_priv,pid: 24782)
07/28/2010 09:33:50;0008;   pbs_mom;Job;19007.master1.ourmachine.com;kill_job done (killed 0 processes)
07/28/2010 09:33:50;0080;   pbs_mom;Job;19007.master1.ourmachine.com;sending preobit jobstat
07/28/2010 09:33:50;0080;   pbs_mom;Req;dis_request_read;decoding command Disconnect from PBS_Server
07/28/2010 09:33:50;0080;   pbs_mom;Req;dis_request_read;decoding command StatusJob from PBS_Server
07/28/2010 09:33:50;0100;   pbs_mom;Req;;Type StatusJob request received from PBS_Server at master1.ourmachine.com, sock=13
07/28/2010 09:33:50;0008;   pbs_mom;Job;process_request;request type StatusJob from host master1.ourmachine.com received
07/28/2010 09:33:50;0008;   pbs_mom;Job;process_request;request type StatusJob from host master1.ourmachine.com allowed
07/28/2010 09:33:50;0008;   pbs_mom;Job;dispatch_request;dispatching request StatusJob on sd=13
07/28/2010 09:33:50;0080;   pbs_mom;Req;dis_request_read;decoding command ModifyJob from PBS_Server
07/28/2010 09:33:50;0100;   pbs_mom;Req;;Type ModifyJob request received from PBS_Server at master1.ourmachine.com, sock=10
07/28/2010 09:33:50;0008;   pbs_mom;Job;process_request;request type ModifyJob from host master1.ourmachine.com received
07/28/2010 09:33:50;0008;   pbs_mom;Job;process_request;request type ModifyJob from host master1.ourmachine.com allowed
07/28/2010 09:33:50;0008;   pbs_mom;Job;dispatch_request;dispatching request ModifyJob on sd=10
07/28/2010 09:33:50;0008;   pbs_mom;Job;19007.master1.ourmachine.com;modifying job
07/28/2010 09:33:50;0008;   pbs_mom;Job;19007.master1.ourmachine.com;modifying type 6 attribute resource of job (value: 'RESC')
07/28/2010 09:33:50;0002;   pbs_mom;n/a;mom_set_limits;mom_set_limits(19007.master1.ourmachine.com,alter) entered
07/28/2010 09:33:50;0002;   pbs_mom;n/a;mom_set_limits;setting limit for attribute 'neednodes'
07/28/2010 09:33:50;0002;   pbs_mom;n/a;mom_set_limits;setting limit for attribute 'nodes'
07/28/2010 09:33:50;0002;   pbs_mom;n/a;mom_set_limits;setting limit for attribute 'walltime'
07/28/2010 09:33:50;0002;   pbs_mom;n/a;mom_set_limits;mom_set_limits(19007.master1.ourmachine.com,alter) completed
07/28/2010 09:33:50;0008;   pbs_mom;Job;19007.master1.ourmachine.com;Job Modified at request of PBS_Server at master1.ourmachine.com
07/28/2010 09:33:50;0080;   pbs_mom;Svr;preobit_reply;top of preobit_reply
07/28/2010 09:33:50;0080;   pbs_mom;Svr;preobit_reply;DIS_reply_read/decode_DIS_replySvr worked, top of while loop
07/28/2010 09:33:50;0080;   pbs_mom;Svr;preobit_reply;in while loop, no error from job stat
07/28/2010 09:33:50;0080;   pbs_mom;Job;19007.master1.ourmachine.com;performing job clean-up in preobit_reply()
07/28/2010 09:33:50;0002;   pbs_mom;n/a;mom_close_poll;entered
07/28/2010 09:33:50;0080;   pbs_mom;Job;19007.master1.ourmachine.com;epilog subtask created with pid 1492 - substate set to JOB_SUBSTATE_OBIT - registered post_epilogue
07/28/2010 09:33:50;0080;   pbs_mom;Req;dis_request_read;decoding command Disconnect from PBS_Server
07/28/2010 09:33:50;0080;   pbs_mom;Req;dis_request_read;decoding command Disconnect from PBS_Server
07/28/2010 09:33:50;0008;   pbs_mom;Job;scan_for_terminated;entered
07/28/2010 09:33:50;0080;   pbs_mom;Svr;mom_get_sample;proc_array load started
07/28/2010 09:33:50;0080;   pbs_mom;n/a;mom_get_sample;proc_array loaded - nproc=151
07/28/2010 09:33:50;0080;   pbs_mom;n/a;cput_sum;proc_array loop start - jobid = 19007.master1.ourmachine.com
07/28/2010 09:33:50;0080;   pbs_mom;n/a;mem_sum;proc_array loop start - jobid = 19007.master1.ourmachine.com
07/28/2010 09:33:50;0080;   pbs_mom;n/a;resi_sum;proc_array loop start - jobid = 19007.master1.ourmachine.com
07/28/2010 09:33:50;0080;   pbs_mom;Job;19007.master1.ourmachine.com;checking job w/subtask pid=1492 (child pid=1492)
07/28/2010 09:33:50;0080;   pbs_mom;Job;19007.master1.ourmachine.com;found match with job subtask for pid=1492
07/28/2010 09:33:50;0080;   pbs_mom;Req;post_epilogue;preparing obit message for job 19007.master1.ourmachine.com
07/28/2010 09:33:50;0080;   pbs_mom;Job;19007.master1.ourmachine.com;obit sent to server
07/28/2010 09:33:50;0001;   pbs_mom;Job;19007.master1.ourmachine.com;job obit acknowledge received - substate set to JOB_SUBSTATE_EXITED
07/28/2010 09:33:50;0080;   pbs_mom;Req;dis_request_read;decoding command DeleteJob from PBS_Server
07/28/2010 09:33:50;0100;   pbs_mom;Req;;Type DeleteJob request received from PBS_Server at master1.ourmachine.com, sock=10
07/28/2010 09:33:50;0008;   pbs_mom;Job;process_request;request type DeleteJob from host master1.ourmachine.com received
07/28/2010 09:33:50;0008;   pbs_mom;Job;process_request;request type DeleteJob from host master1.ourmachine.com allowed
07/28/2010 09:33:50;0008;   pbs_mom;Job;dispatch_request;dispatching request DeleteJob on sd=10
07/28/2010 09:33:50;0008;   pbs_mom;Job;19007.master1.ourmachine.com;deleting job
07/28/2010 09:33:50;0080;   pbs_mom;Job;19007.master1.ourmachine.com;deleting job 19007.master1.ourmachine.com in state EXITED
07/28/2010 09:33:50;0080;   pbs_mom;Job;19007.master1.ourmachine.com;removing job
07/28/2010 09:33:50;0080;   pbs_mom;Job;19007.master1.ourmachine.com;removed job script
07/28/2010 09:33:50;0080;   pbs_mom;Job;19007.master1.ourmachine.com;removed job file
07/28/2010 09:33:50;0080;   pbs_mom;Req;dis_request_read;decoding command Disconnect from PBS_Server
07/28/2010 09:34:02;0002;   pbs_mom;n/a;mom_server_all_update_stat;composing status update for server
07/28/2010 09:34:02;0002;   pbs_mom;n/a;sessions;sessions[0]: pid 3519 sid 3518
07/28/2010 09:34:02;0002;   pbs_mom;n/a;sessions;sessions[1]: pid 26539 sid 26538
07/28/2010 09:34:02;0002;   pbs_mom;n/a;sessions;sessions[0]: pid 3519 sid 3518
07/28/2010 09:34:02;0002;   pbs_mom;n/a;sessions;sessions[1]: pid 26539 sid 26538
07/28/2010 09:34:02;0002;   pbs_mom;n/a;nusers;nusers[0]: pid 3519 uid 1016
07/28/2010 09:34:02;0002;   pbs_mom;n/a;nusers;nusers[1]: pid 26539 uid 1016
07/28/2010 09:34:02;0002;   pbs_mom;n/a;totmem;totmem: total mem=16836907008
07/28/2010 09:34:02;0002;   pbs_mom;n/a;availmem;availmem: free mem=15444119552
07/28/2010 09:34:02;0002;   pbs_mom;n/a;mom_server_update_stat;mom_server_update_stat: sending to server "opsys=linux"
07/28/2010 09:34:02;0002;   pbs_mom;n/a;mom_server_update_stat;mom_server_update_stat: sending to server "uname=Linux cn090 2.6.18-164.6.1.el5 #1 SMP Tue Nov 3 23:02:51 EST 2009 x86_64"
07/28/2010 09:34:02;0002;   pbs_mom;n/a;mom_server_update_stat;mom_server_update_stat: sending to server "sessions=3518 26538"
07/28/2010 09:34:02;0002;   pbs_mom;n/a;mom_server_update_stat;mom_server_update_stat: sending to server "nsessions=2"
07/28/2010 09:34:02;0002;   pbs_mom;n/a;mom_server_update_stat;mom_server_update_stat: sending to server "nusers=1"
07/28/2010 09:34:02;0002;   pbs_mom;n/a;mom_server_update_stat;mom_server_update_stat: sending to server "idletime=3523228"
07/28/2010 09:34:02;0002;   pbs_mom;n/a;mom_server_update_stat;mom_server_update_stat: sending to server "totmem=16442292kb"
07/28/2010 09:34:02;0002;   pbs_mom;n/a;mom_server_update_stat;mom_server_update_stat: sending to server "availmem=15082148kb"
07/28/2010 09:34:02;0002;   pbs_mom;n/a;mom_server_update_stat;mom_server_update_stat: sending to server "physmem=16442292kb"
07/28/2010 09:34:02;0002;   pbs_mom;n/a;mom_server_update_stat;mom_server_update_stat: sending to server "ncpus=8"
07/28/2010 09:34:02;0002;   pbs_mom;n/a;mom_server_update_stat;mom_server_update_stat: sending to server "loadave=0.00"
07/28/2010 09:34:02;0002;   pbs_mom;n/a;mom_server_update_stat;mom_server_update_stat: sending to server "gres="
07/28/2010 09:34:02;0002;   pbs_mom;n/a;mom_server_update_stat;mom_server_update_stat: sending to server "netload=4033950964007"
07/28/2010 09:34:02;0002;   pbs_mom;n/a;mom_server_update_stat;mom_server_update_stat: sending to server "state=free"
07/28/2010 09:34:02;0002;   pbs_mom;n/a;mom_server_update_stat;mom_server_update_stat: sending to server "jobs= "
07/28/2010 09:34:02;0002;   pbs_mom;n/a;mom_server_update_stat;mom_server_update_stat: sending to server "varattr= "
07/28/2010 09:34:02;0002;   pbs_mom;n/a;mom_server_update_stat;status update successfully sent to master1.ourmachine.com
07/28/2010 09:34:47;0002;   pbs_mom;n/a;mom_server_all_update_stat;composing status update for server
07/28/2010 09:34:47;0002;   pbs_mom;n/a;sessions;sessions[0]: pid 3519 sid 3518
07/28/2010 09:34:47;0002;   pbs_mom;n/a;sessions;sessions[1]: pid 26539 sid 26538
07/28/2010 09:34:47;0002;   pbs_mom;n/a;sessions;sessions[0]: pid 3519 sid 3518
07/28/2010 09:34:47;0002;   pbs_mom;n/a;sessions;sessions[1]: pid 26539 sid 26538
07/28/2010 09:34:47;0002;   pbs_mom;n/a;nusers;nusers[0]: pid 3519 uid 1016
07/28/2010 09:34:47;0002;   pbs_mom;n/a;nusers;nusers[1]: pid 26539 uid 1016
07/28/2010 09:34:47;0002;   pbs_mom;n/a;totmem;totmem: total mem=16836907008
07/28/2010 09:34:47;0002;   pbs_mom;n/a;availmem;availmem: free mem=15443742720
07/28/2010 09:34:47;0002;   pbs_mom;n/a;mom_server_update_stat;mom_server_update_stat: sending to server "opsys=linux"
07/28/2010 09:34:47;0002;   pbs_mom;n/a;mom_server_update_stat;mom_server_update_stat: sending to server "uname=Linux cn090 2.6.18-164.6.1.el5 #1 SMP Tue Nov 3 23:02:51 EST 2009 x86_64"
07/28/2010 09:34:47;0002;   pbs_mom;n/a;mom_server_update_stat;mom_server_update_stat: sending to server "sessions=3518 26538"
07/28/2010 09:34:47;0002;   pbs_mom;n/a;mom_server_update_stat;mom_server_update_stat: sending to server "nsessions=2"
07/28/2010 09:34:47;0002;   pbs_mom;n/a;mom_server_update_stat;mom_server_update_stat: sending to server "nusers=1"
07/28/2010 09:34:47;0002;   pbs_mom;n/a;mom_server_update_stat;mom_server_update_stat: sending to server "idletime=3523273"
07/28/2010 09:34:47;0002;   pbs_mom;n/a;mom_server_update_stat;mom_server_update_stat: sending to server "totmem=16442292kb"
07/28/2010 09:34:47;0002;   pbs_mom;n/a;mom_server_update_stat;mom_server_update_stat: sending to server "availmem=15081780kb"
07/28/2010 09:34:47;0002;   pbs_mom;n/a;mom_server_update_stat;mom_server_update_stat: sending to server "physmem=16442292kb"
07/28/2010 09:34:47;0002;   pbs_mom;n/a;mom_server_update_stat;mom_server_update_stat: sending to server "ncpus=8"
07/28/2010 09:34:47;0002;   pbs_mom;n/a;mom_server_update_stat;mom_server_update_stat: sending to server "loadave=0.00"
07/28/2010 09:34:47;0002;   pbs_mom;n/a;mom_server_update_stat;mom_server_update_stat: sending to server "gres="
07/28/2010 09:34:47;0002;   pbs_mom;n/a;mom_server_update_stat;mom_server_update_stat: sending to server "netload=4033951001599"
07/28/2010 09:34:47;0002;   pbs_mom;n/a;mom_server_update_stat;mom_server_update_stat: sending to server "state=free"
07/28/2010 09:34:47;0002;   pbs_mom;n/a;mom_server_update_stat;mom_server_update_stat: sending to server "jobs= "
07/28/2010 09:34:47;0002;   pbs_mom;n/a;mom_server_update_stat;mom_server_update_stat: sending to server "varattr= "
07/28/2010 09:34:47;0002;   pbs_mom;n/a;mom_server_update_stat;status update successfully sent to master1.ourmachine.com
07/28/2010 09:35:32;0002;   pbs_mom;n/a;mom_server_all_update_stat;composing status update for server
07/28/2010 09:35:32;0002;   pbs_mom;n/a;sessions;sessions[0]: pid 3519 sid 3518
07/28/2010 09:35:32;0002;   pbs_mom;n/a;sessions;sessions[1]: pid 26539 sid 26538
07/28/2010 09:35:32;0002;   pbs_mom;n/a;sessions;sessions[0]: pid 3519 sid 3518
07/28/2010 09:35:32;0002;   pbs_mom;n/a;sessions;sessions[1]: pid 26539 sid 26538
07/28/2010 09:35:32;0002;   pbs_mom;n/a;nusers;nusers[0]: pid 3519 uid 1016
07/28/2010 09:35:32;0002;   pbs_mom;n/a;nusers;nusers[1]: pid 26539 uid 1016
07/28/2010 09:35:32;0002;   pbs_mom;n/a;totmem;totmem: total mem=16836907008
07/28/2010 09:35:32;0002;   pbs_mom;n/a;availmem;availmem: free mem=15443873792
07/28/2010 09:35:32;0002;   pbs_mom;n/a;mom_server_update_stat;mom_server_update_stat: sending to server "opsys=linux"
07/28/2010 09:35:32;0002;   pbs_mom;n/a;mom_server_update_stat;mom_server_update_stat: sending to server "uname=Linux cn090 2.6.18-164.6.1.el5 #1 SMP Tue Nov 3 23:02:51 EST 2009 x86_64"
07/28/2010 09:35:32;0002;   pbs_mom;n/a;mom_server_update_stat;mom_server_update_stat: sending to server "sessions=3518 26538"
07/28/2010 09:35:32;0002;   pbs_mom;n/a;mom_server_update_stat;mom_server_update_stat: sending to server "nsessions=2"
07/28/2010 09:35:32;0002;   pbs_mom;n/a;mom_server_update_stat;mom_server_update_stat: sending to server "nusers=1"
07/28/2010 09:35:32;0002;   pbs_mom;n/a;mom_server_update_stat;mom_server_update_stat: sending to server "idletime=3523318"
07/28/2010 09:35:32;0002;   pbs_mom;n/a;mom_server_update_stat;mom_server_update_stat: sending to server "totmem=16442292kb"
07/28/2010 09:35:32;0002;   pbs_mom;n/a;mom_server_update_stat;mom_server_update_stat: sending to server "availmem=15081908kb"
07/28/2010 09:35:32;0002;   pbs_mom;n/a;mom_server_update_stat;mom_server_update_stat: sending to server "physmem=16442292kb"
07/28/2010 09:35:32;0002;   pbs_mom;n/a;mom_server_update_stat;mom_server_update_stat: sending to server "ncpus=8"
07/28/2010 09:35:32;0002;   pbs_mom;n/a;mom_server_update_stat;mom_server_update_stat: sending to server "loadave=0.00"
07/28/2010 09:35:32;0002;   pbs_mom;n/a;mom_server_update_stat;mom_server_update_stat: sending to server "gres="
07/28/2010 09:35:32;0002;   pbs_mom;n/a;mom_server_update_stat;mom_server_update_stat: sending to server "netload=4033951015781"
07/28/2010 09:35:32;0002;   pbs_mom;n/a;mom_server_update_stat;mom_server_update_stat: sending to server "state=free"
07/28/2010 09:35:32;0002;   pbs_mom;n/a;mom_server_update_stat;mom_server_update_stat: sending to server "jobs= "
07/28/2010 09:35:32;0002;   pbs_mom;n/a;mom_server_update_stat;mom_server_update_stat: sending to server "varattr= "
07/28/2010 09:35:32;0002;   pbs_mom;n/a;mom_server_update_stat;status update successfully sent to master1.ourmachine.com
</end logfile>
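
A note on the "statloc=65024, exitval=254" line in the log above: the two 
numbers are the same fact in two forms.  The minimal Python sketch below 
decodes a traditional Unix wait status the way the WIFEXITED and 
WEXITSTATUS macros do on Linux.  The function name is made up for 
illustration, and the sketch ignores stopped and core-dump states.  The 
signed reading of the exit value is plain 8-bit arithmetic; the log does 
not confirm what the child actually passed to exit().

```python
def decode_wait_status(status):
    """Decode a 16-bit Unix wait status (simplified: no stop/core handling)."""
    signal_number = status & 0x7F      # low 7 bits: terminating signal, if any
    exit_value = (status >> 8) & 0xFF  # next 8 bits: value passed to exit()
    return {
        "exited_normally": signal_number == 0,
        "signal": signal_number,
        "exit_value": exit_value,
        # Same byte read as a signed 8-bit number, for children that
        # exit with a negative code.
        "exit_value_signed": exit_value - 256 if exit_value > 127 else exit_value,
    }


info = decode_wait_status(65024)  # statloc from the pbs_mom log
for key, value in info.items():
    print(key, "=", value)
```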

