[torqueusers] PBS Error: Execution server rejected request
notinh notien
notinhnotien7 at hotmail.com
Thu Nov 3 19:39:15 MST 2005
Hi, all. I was able to set PBSLOGLEVEL to 7 in root's .bash_profile, and now
I get a lot more output in server_logs.
Here is what I found at the server:
11/03/2005 18:53:17;0001;PBS_Server;Svr;PBS_Server;svr_setjobstate: setting job 8202.master.stellar.com state from QUEUED to QUEUED-QUEUED (1-10)
11/03/2005 18:53:17;0008;PBS_Server;Job;8202.master.stellar.com;Job Modified at request of Scheduler at master.stellar.com
11/03/2005 18:53:17;0008;PBS_Server;Job;8202.master.stellar.com;Job Run at request of Scheduler at master.stellar.com
11/03/2005 18:53:17;0001;PBS_Server;Svr;PBS_Server;svr_setjobstate: setting job 8202.master.stellar.com state from QUEUED to RUNNING-JOB_SUBSTATE_RUNNING (4-41)
11/03/2005 18:53:17;0004;PBS_Server;Svr;WARNING;!!! unable to contact node node14 !!!
11/03/2005 18:53:19;0008;PBS_Server;Job;8202.master.stellar.com;unable to run job, MOM rejected
11/03/2005 18:53:19;0001;PBS_Server;Svr;PBS_Server;svr_setjobstate: setting job 8202.master.stellar.com state from RUNNING to QUEUED-QUEUED (1-10)
11/03/2005 18:53:19;0001;PBS_Server;Svr;PBS_Server;svr_setjobstate: setting job 8202.master.stellar.com state from QUEUED to QUEUED-QUEUED (1-10)
11/03/2005 18:53:19;0008;PBS_Server;Job;8202.master.stellar.com;Job Modified at request of Scheduler at master.stellar.com
11/03/2005 18:53:19;0040;PBS_Server;Svr;master.stellar.com;Scheduler sent command recyc
11/03/2005 18:53:27;0040;PBS_Server;Req;do_rpp;rpp request received on stream 3
11/03/2005 18:53:27;0040;PBS_Server;Req;do_rpp;inter-server request received
11/03/2005 18:53:27;0004;PBS_Server;Svr;is_request;message received from stream 3 (version 1)
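Since the excerpt above got hard-wrapped when I pasted it, here is a minimal
Python sketch for re-joining the wrapped lines and pulling out the records that
belong to one job. It assumes the usual six semicolon-separated fields
(timestamp, event code, daemon, object type, object name, message) and treats
any line that does not start with a timestamp as a continuation of the previous
record. The names PbsLogRecord, parse_records and records_for_job are my own
for illustration; they are not part of TORQUE.

```python
import re
from typing import Iterable, List, NamedTuple

# A record starts with a timestamp such as "11/03/2005 18:53:17".
TIMESTAMP_RE = re.compile(r"^\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}$")


class PbsLogRecord(NamedTuple):
    """One log record (hypothetical helper type, not a TORQUE API)."""
    timestamp: str
    event_code: str
    daemon: str
    obj_type: str
    obj_name: str
    message: str


def parse_records(lines: Iterable[str]) -> List[PbsLogRecord]:
    """Parse log lines, folding wrapped continuation lines into the message."""
    records: List[PbsLogRecord] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        # Split into at most six fields so ';' inside the message survives.
        parts = [p.strip() for p in line.split(";", 5)]
        if len(parts) == 6 and TIMESTAMP_RE.match(parts[0]):
            records.append(PbsLogRecord(*parts))
        elif records:
            # Assumption: anything else is the wrapped tail of the last record.
            last = records[-1]
            records[-1] = last._replace(message=last.message + " " + line)
    return records


def records_for_job(records: Iterable[PbsLogRecord], job_id: str) -> List[PbsLogRecord]:
    """Keep records whose object name or message mentions the job id."""
    return [r for r in records if r.obj_name == job_id or job_id in r.message]


if __name__ == "__main__":
    sample = [
        "11/03/2005 18:53:17;0004;PBS_Server;Svr;WARNING;!!! unable to contact node",
        "node14 !!!",
        "11/03/2005 18:53:19;0008;PBS_Server;Job;8202.master.stellar.com;unable to",
        "run job, MOM rejected",
    ]
    for rec in records_for_job(parse_records(sample), "8202.master.stellar.com"):
        print(rec.timestamp, rec.message)
```

The same parser should also work on the pbs_mom log, since the leading space
before "pbs_mom" is stripped from each field.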
I also set the PBSLOGLEVEL on the bad node and here is what I found:
11/03/2005 19:30:29;0002; pbs_mom;n/a;is_update_stat;is_update_stat: sending to server "uname=Linux node14.stellar.com 2.4.20-31.9bigmem #1 SMP Tue Apr 13 17:11:51 EDT 2004 i686"
11/03/2005 19:30:29;0002; pbs_mom;n/a;is_update_stat;setting alarm in is_update_stat
11/03/2005 19:30:29;0002; pbs_mom;n/a;is_update_stat;is_update_stat: sending to server "sessions=? 15201"
11/03/2005 19:30:29;0002; pbs_mom;n/a;is_update_stat;setting alarm in is_update_stat
11/03/2005 19:30:29;0002; pbs_mom;n/a;is_update_stat;is_update_stat: sending to server "nsessions=? 15201"
11/03/2005 19:30:29;0002; pbs_mom;n/a;is_update_stat;setting alarm in is_update_stat
11/03/2005 19:30:29;0002; pbs_mom;n/a;is_update_stat;is_update_stat: sending to server "nusers=0"
11/03/2005 19:30:29;0002; pbs_mom;n/a;is_update_stat;setting alarm in is_update_stat
11/03/2005 19:30:29;0002; pbs_mom;n/a;is_update_stat;is_update_stat: sending to server "idletime=18705"
11/03/2005 19:30:29;0002; pbs_mom;n/a;is_update_stat;setting alarm in is_update_stat
11/03/2005 19:30:29;0002; pbs_mom;n/a;is_update_stat;is_update_stat: sending to server "totmem=1964040kb"
11/03/2005 19:30:29;0002; pbs_mom;n/a;is_update_stat;setting alarm in is_update_stat
11/03/2005 19:30:29;0002; pbs_mom;n/a;is_update_stat;is_update_stat: sending to server "availmem=1523280kb"
11/03/2005 19:30:29;0002; pbs_mom;n/a;is_update_stat;setting alarm in is_update_stat
11/03/2005 19:30:29;0002; pbs_mom;n/a;is_update_stat;is_update_stat: sending to server "physmem=2061780kb"
11/03/2005 19:30:29;0002; pbs_mom;n/a;is_update_stat;setting alarm in is_update_stat
11/03/2005 19:30:29;0002; pbs_mom;n/a;is_update_stat;is_update_stat: sending to server "ncpus=4"
11/03/2005 19:30:29;0002; pbs_mom;n/a;is_update_stat;setting alarm in is_update_stat
11/03/2005 19:30:29;0002; pbs_mom;n/a;is_update_stat;is_update_stat: sending to server "loadave=0.00"
11/03/2005 19:30:29;0002; pbs_mom;n/a;is_update_stat;setting alarm in is_update_stat
11/03/2005 19:30:29;0002; pbs_mom;n/a;is_update_stat;setting alarm in is_update_stat
11/03/2005 19:30:29;0002; pbs_mom;n/a;is_update_stat;setting alarm in is_update_stat
11/03/2005 19:30:29;0002; pbs_mom;n/a;is_update_stat;is_update_stat: sending to server "netload=2029475507"
11/03/2005 19:30:29;0002; pbs_mom;n/a;is_update_stat;status update successfully sent to server
From the bad node's side, everything seems to be OK: there are no error
messages, and nothing in the log indicates that the node is aware of a job.
Please comment on this situation.
Thank you.