[torqueusers] PBS Error: Execution server rejected request

notinh notien notinhnotien7 at hotmail.com
Thu Nov 3 19:39:15 MST 2005


Hi all. I set PBSLOGLEVEL to 7 in root's .bash_profile, and now I get much more detail in server_logs.
Here is what I found on the server:

11/03/2005 18:53:17;0001;PBS_Server;Svr;PBS_Server;svr_setjobstate: setting job 8202.master.stellar.com state from QUEUED to QUEUED-QUEUED (1-10)
11/03/2005 18:53:17;0008;PBS_Server;Job;8202.master.stellar.com;Job Modified at request of Scheduler at master.stellar.com
11/03/2005 18:53:17;0008;PBS_Server;Job;8202.master.stellar.com;Job Run at request of Scheduler at master.stellar.com
11/03/2005 18:53:17;0001;PBS_Server;Svr;PBS_Server;svr_setjobstate: setting job 8202.master.stellar.com state from QUEUED to RUNNING-JOB_SUBSTATE_RUNNING (4-41)
11/03/2005 18:53:17;0004;PBS_Server;Svr;WARNING;!!! unable to contact node node14 !!!
11/03/2005 18:53:19;0008;PBS_Server;Job;8202.master.stellar.com;unable to run job, MOM rejected
11/03/2005 18:53:19;0001;PBS_Server;Svr;PBS_Server;svr_setjobstate: setting job 8202.master.stellar.com state from RUNNING to QUEUED-QUEUED (1-10)
11/03/2005 18:53:19;0001;PBS_Server;Svr;PBS_Server;svr_setjobstate: setting job 8202.master.stellar.com state from QUEUED to QUEUED-QUEUED (1-10)
11/03/2005 18:53:19;0008;PBS_Server;Job;8202.master.stellar.com;Job Modified at request of Scheduler at master.stellar.com
11/03/2005 18:53:19;0040;PBS_Server;Svr;master.stellar.com;Scheduler sent command recyc
11/03/2005 18:53:27;0040;PBS_Server;Req;do_rpp;rpp request received on stream 3
11/03/2005 18:53:27;0040;PBS_Server;Req;do_rpp;inter-server request received
11/03/2005 18:53:27;0004;PBS_Server;Svr;is_request;message received from stream 3 (version 1)

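For anyone digging through the same logs: each server_logs record is a single line of semicolon-separated fields (date/time; event code; daemon; object class; object name; message). Below is a rough Python sketch, assuming that six-field layout as seen in the excerpt, that pulls one job's records and its state transitions out of the log. `parse_line`, `job_records` and `state_transitions` are hypothetical helpers of my own, not part of TORQUE, and the regex only assumes the wording of the svr_setjobstate messages shown here.

```python
import re
from typing import Iterable, List, NamedTuple, Optional, Tuple


class LogRecord(NamedTuple):
    timestamp: str  # e.g. "11/03/2005 18:53:17"
    event: str      # event-type code, e.g. "0008"
    source: str     # daemon, e.g. "PBS_Server"
    obj_class: str  # e.g. "Job", "Svr", "Req"
    obj_name: str   # e.g. a job ID, "PBS_Server", "WARNING"
    message: str


def parse_line(line: str) -> Optional[LogRecord]:
    """Split a log line into six ';'-separated fields; the message keeps any extra ';'."""
    parts = line.rstrip("\n").split(";", 5)
    if len(parts) != 6:
        return None  # blank or wrapped line, not a full record
    ts, event, source, obj_class, obj_name, message = parts
    return LogRecord(ts, event, source.strip(), obj_class, obj_name, message)


# Assumed wording: "... setting job <id> state from <OLD> to <NEW> (n-m)"
STATE_RE = re.compile(r"job (\S+) state from (\S+) to (\S+)")


def job_records(lines: Iterable[str], job_id: str) -> List[LogRecord]:
    """Keep records whose object is the job or whose message mentions it."""
    found = []
    for line in lines:
        rec = parse_line(line)
        if rec is not None and (rec.obj_name == job_id or job_id in rec.message):
            found.append(rec)
    return found


def state_transitions(records: Iterable[LogRecord], job_id: str) -> List[Tuple[str, str]]:
    """Return (old_state, new_state) pairs from svr_setjobstate messages for job_id."""
    pairs = []
    for rec in records:
        m = STATE_RE.search(rec.message)
        if m is not None and m.group(1) == job_id:
            pairs.append((m.group(2), m.group(3)))
    return pairs


SAMPLE = """\
11/03/2005 18:53:17;0008;PBS_Server;Job;8202.master.stellar.com;Job Run at request of Scheduler at master.stellar.com
11/03/2005 18:53:17;0001;PBS_Server;Svr;PBS_Server;svr_setjobstate: setting job 8202.master.stellar.com state from QUEUED to RUNNING-JOB_SUBSTATE_RUNNING (4-41)
11/03/2005 18:53:17;0004;PBS_Server;Svr;WARNING;!!! unable to contact node node14 !!!
11/03/2005 18:53:19;0008;PBS_Server;Job;8202.master.stellar.com;unable to run job, MOM rejected
11/03/2005 18:53:19;0001;PBS_Server;Svr;PBS_Server;svr_setjobstate: setting job 8202.master.stellar.com state from RUNNING to QUEUED-QUEUED (1-10)
"""

if __name__ == "__main__":
    job = "8202.master.stellar.com"
    recs = job_records(SAMPLE.splitlines(), job)
    for rec in recs:
        print(rec.timestamp, rec.message)
    print(state_transitions(recs, job))
```

Note that filtering by job ID alone hides the "unable to contact node node14" warning, because that record is filed under WARNING rather than the job, so it is worth reading the lines around each RUNNING to QUEUED transition as well.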

I also set PBSLOGLEVEL on the bad node (node14), and here is what its pbs_mom logged:


11/03/2005 19:30:29;0002;   pbs_mom;n/a;is_update_stat;is_update_stat: sending to server "uname=Linux node14.stellar.com 2.4.20-31.9bigmem #1 SMP Tue Apr 13 17:11:51 EDT 2004 i686"
11/03/2005 19:30:29;0002;   pbs_mom;n/a;is_update_stat;setting alarm in is_update_stat
11/03/2005 19:30:29;0002;   pbs_mom;n/a;is_update_stat;is_update_stat: sending to server "sessions=? 15201"
11/03/2005 19:30:29;0002;   pbs_mom;n/a;is_update_stat;setting alarm in is_update_stat
11/03/2005 19:30:29;0002;   pbs_mom;n/a;is_update_stat;is_update_stat: sending to server "nsessions=? 15201"
11/03/2005 19:30:29;0002;   pbs_mom;n/a;is_update_stat;setting alarm in is_update_stat
11/03/2005 19:30:29;0002;   pbs_mom;n/a;is_update_stat;is_update_stat: sending to server "nusers=0"
11/03/2005 19:30:29;0002;   pbs_mom;n/a;is_update_stat;setting alarm in is_update_stat
11/03/2005 19:30:29;0002;   pbs_mom;n/a;is_update_stat;is_update_stat: sending to server "idletime=18705"
11/03/2005 19:30:29;0002;   pbs_mom;n/a;is_update_stat;setting alarm in is_update_stat
11/03/2005 19:30:29;0002;   pbs_mom;n/a;is_update_stat;is_update_stat: sending to server "totmem=1964040kb"
11/03/2005 19:30:29;0002;   pbs_mom;n/a;is_update_stat;setting alarm in is_update_stat
11/03/2005 19:30:29;0002;   pbs_mom;n/a;is_update_stat;is_update_stat: sending to server "availmem=1523280kb"
11/03/2005 19:30:29;0002;   pbs_mom;n/a;is_update_stat;setting alarm in is_update_stat
11/03/2005 19:30:29;0002;   pbs_mom;n/a;is_update_stat;is_update_stat: sending to server "physmem=2061780kb"
11/03/2005 19:30:29;0002;   pbs_mom;n/a;is_update_stat;setting alarm in is_update_stat
11/03/2005 19:30:29;0002;   pbs_mom;n/a;is_update_stat;is_update_stat: sending to server "ncpus=4"
11/03/2005 19:30:29;0002;   pbs_mom;n/a;is_update_stat;setting alarm in is_update_stat
11/03/2005 19:30:29;0002;   pbs_mom;n/a;is_update_stat;is_update_stat: sending to server "loadave=0.00"
11/03/2005 19:30:29;0002;   pbs_mom;n/a;is_update_stat;setting alarm in is_update_stat
11/03/2005 19:30:29;0002;   pbs_mom;n/a;is_update_stat;setting alarm in is_update_stat
11/03/2005 19:30:29;0002;   pbs_mom;n/a;is_update_stat;setting alarm in is_update_stat
11/03/2005 19:30:29;0002;   pbs_mom;n/a;is_update_stat;is_update_stat: sending to server "netload=2029475507"
11/03/2005 19:30:29;0002;   pbs_mom;n/a;is_update_stat;status update successfully sent to server

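A similar rough sketch for the MOM side, assuming every status line ends in `sending to server "key=value"` as in node14's log: it collects the values pbs_mom reports and checks whether the job ID appears anywhere in the log. `mom_status` and `mentions_job` are hypothetical helper names, and the check only looks for the full job ID string, so it would miss a log that abbreviates the ID.

```python
import re
from typing import Dict, Iterable

# Assumed format: status lines end with: sending to server "<key>=<value>"
STATUS_RE = re.compile(r'sending to server "([^"=]+)=(.*)"$')


def mom_status(lines: Iterable[str]) -> Dict[str, str]:
    """Collect the key=value pairs pbs_mom reports in its status updates."""
    status: Dict[str, str] = {}
    for line in lines:
        m = STATUS_RE.search(line.rstrip("\n"))
        if m is not None:
            status[m.group(1)] = m.group(2)  # a later update overwrites an earlier one
    return status


def mentions_job(lines: Iterable[str], job_id: str) -> bool:
    """True if any line contains the full job ID."""
    return any(job_id in line for line in lines)


SAMPLE_MOM = """\
11/03/2005 19:30:29;0002;   pbs_mom;n/a;is_update_stat;is_update_stat: sending to server "sessions=? 15201"
11/03/2005 19:30:29;0002;   pbs_mom;n/a;is_update_stat;setting alarm in is_update_stat
11/03/2005 19:30:29;0002;   pbs_mom;n/a;is_update_stat;is_update_stat: sending to server "ncpus=4"
11/03/2005 19:30:29;0002;   pbs_mom;n/a;is_update_stat;is_update_stat: sending to server "loadave=0.00"
11/03/2005 19:30:29;0002;   pbs_mom;n/a;is_update_stat;status update successfully sent to server
"""

if __name__ == "__main__":
    lines = SAMPLE_MOM.splitlines()
    print(mom_status(lines))
    print(mentions_job(lines, "8202.master.stellar.com"))
```

Run against the full mom_logs for the window around 18:53, this gives a quick yes/no on whether the MOM ever saw the job at all.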

From the bad node's side, everything seems to be OK: there are no error messages, and nothing indicates that it was ever aware of the job.

Please comment on this situation.
Thank you.
