[torqueusers] pbs_iff: cannot read reply from pbs_server

Mahmood Naderan nt_mahmood at yahoo.com
Mon Feb 25 05:42:02 MST 2013



Hi,
I removed one node from /var/spool/pbs/server_priv/nodes, then ran the following commands on the server:

schedctl -k
qterm
pbs_server
maui

Now, as root, I am not able to delete jobs:

root@orca:/home/mahmood# qdel 93098 93111 93406
pbs_iff: cannot read reply from pbs_server
No Permission.
qdel: cannot connect to server hpclab.orca (errno=15007) Unauthorized Request
qdel: Server could not connect to MOM 93111.hpclab.orca



The pbs_server process is indeed running:

root@orca:/home/mahmood# ps aux | grep pbs_server
root     16737  0.0  0.0  42604  4100 ?        S    15:31   0:00 pbs_server
root     21969  0.0  0.0   9384   880 pts/1    S+   15:49   0:00 grep --color=auto pbs_server
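
A process showing up in ps does not by itself show that it is accepting connections. As a rough check, here is a minimal sketch that tries a plain TCP connect to the server port. The function name server_port_open is hypothetical; the host hpclab.orca and port 15001 are taken from the "Using ports" line in the server log. A successful connect only means the port is open; it says nothing about whether the pbs_iff authentication exchange works.

```python
import socket


def server_port_open(host, port=15001, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened.

    This is only a reachability probe. It does not speak the PBS
    protocol and does not test authentication.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution errors.
        return False


# Example (run on the head node; host name assumed from the log):
#   print(server_port_open("hpclab.orca"))
```

If this returns False while ps still lists pbs_server, the problem is likely at the network or name-resolution level rather than in the job data.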


Also, the server log shows nothing unusual (as far as I can tell):

02/25/2013 15:31:17;0086;PBS_Server;Svr;PBS_Server;Shutdown request from root@hpclab.orca
02/25/2013 15:31:17;0086;PBS_Server;Svr;PBS_Server;Starting to shutdown the server, type is Quick
02/25/2013 15:31:21;0002;PBS_Server;Svr;PBS_Server;Server shutdown completed
02/25/2013 15:31:21;0002;PBS_Server;Svr;Log;Log closed
02/25/2013 15:31:47;0002;PBS_Server;Svr;Log;Log opened
02/25/2013 15:31:47;0006;PBS_Server;Svr;PBS_Server;Server hpclab.orca started, initialization type = 1
02/25/2013 15:31:47;0002;PBS_Server;Svr;Act;Account file /var/spool/pbs/server_priv/accounting/20130225 opened
02/25/2013 15:31:47;0040;PBS_Server;Req;setup_nodes;setup_nodes()
02/25/2013 15:31:47;0086;PBS_Server;Svr;PBS_Server;Recovered queue orcaq
02/25/2013 15:31:47;0086;PBS_Server;Svr;PBS_Server;Recovered queue medium
02/25/2013 15:31:47;0086;PBS_Server;Svr;PBS_Server;Recovered queue small
02/25/2013 15:31:47;0086;PBS_Server;Svr;PBS_Server;Recovered queue very_small
02/25/2013 15:31:47;0086;PBS_Server;Svr;PBS_Server;Recovered queue big
02/25/2013 15:31:47;0002;PBS_Server;Svr;PBS_Server;Expected 5, recovered 5 queues
02/25/2013 15:31:47;0100;PBS_Server;Job;93098.hpclab.orca;enqueuing into orcaq, state 4 hop 1
02/25/2013 15:31:47;0086;PBS_Server;Job;93098.hpclab.orca;Requeueing job, substate: 42 Requeued in queue: orcaq
02/25/2013 15:31:47;0100;PBS_Server;Job;93111.hpclab.orca;enqueuing into orcaq, state 4 hop 1
02/25/2013 15:31:47;0086;PBS_Server;Job;93111.hpclab.orca;Requeueing job, substate: 42 Requeued in queue: orcaq
02/25/2013 15:31:47;0100;PBS_Server;Job;93406.hpclab.orca;enqueuing into orcaq, state 4 hop 1
02/25/2013 15:31:47;0086;PBS_Server;Job;93406.hpclab.orca;Requeueing job, substate: 42 Requeued in queue: orcaq
02/25/2013 15:31:47;0100;PBS_Server;Job;93523.hpclab.orca;enqueuing into orcaq, state 4 hop 1
02/25/2013 15:31:47;0086;PBS_Server;Job;93523.hpclab.orca;Requeueing job, substate: 42 Requeued in queue: orcaq
02/25/2013 15:31:47;0100;PBS_Server;Job;93524.hpclab.orca;enqueuing into orcaq, state 4 hop 1
02/25/2013 15:31:47;0086;PBS_Server;Job;93524.hpclab.orca;Requeueing job, substate: 42 Requeued in queue: orcaq
02/25/2013 15:31:47;0100;PBS_Server;Job;93536.hpclab.orca;enqueuing into orcaq, state 4 hop 1
02/25/2013 15:31:47;0086;PBS_Server;Job;93536.hpclab.orca;Requeueing job, substate: 42 Requeued in queue: orcaq
02/25/2013 15:31:47;0086;PBS_Server;Job;93536.hpclab.orca;Requeueing job, substate: 42 Requeued in queue: orcaq
02/25/2013 15:31:47;0100;PBS_Server;Job;93605.hpclab.orca;enqueuing into orcaq, state 4 hop 1
02/25/2013 15:31:47;0086;PBS_Server;Job;93605.hpclab.orca;Requeueing job, substate: 42 Requeued in queue: orcaq
02/25/2013 15:31:47;0100;PBS_Server;Job;93607.hpclab.orca;enqueuing into orcaq, state 4 hop 1
02/25/2013 15:31:47;0086;PBS_Server;Job;93607.hpclab.orca;Requeueing job, substate: 42 Requeued in queue: orcaq
02/25/2013 15:31:47;0100;PBS_Server;Job;93608.hpclab.orca;enqueuing into orcaq, state 4 hop 1
02/25/2013 15:31:47;0086;PBS_Server;Job;93608.hpclab.orca;Requeueing job, substate: 42 Requeued in queue: orcaq
02/25/2013 15:31:47;0100;PBS_Server;Job;93609.hpclab.orca;enqueuing into orcaq, state 4 hop 1
02/25/2013 15:31:47;0086;PBS_Server;Job;93609.hpclab.orca;Requeueing job, substate: 42 Requeued in queue: orcaq
02/25/2013 15:31:47;0100;PBS_Server;Job;93612.hpclab.orca;enqueuing into orcaq, state 4 hop 1
02/25/2013 15:31:47;0086;PBS_Server;Job;93612.hpclab.orca;Requeueing job, substate: 42 Requeued in queue: orcaq
02/25/2013 15:31:47;0100;PBS_Server;Job;93613.hpclab.orca;enqueuing into orcaq, state 4 hop 1
02/25/2013 15:31:47;0086;PBS_Server;Job;93613.hpclab.orca;Requeueing job, substate: 42 Requeued in queue: orcaq
02/25/2013 15:31:47;0002;PBS_Server;Svr;PBS_Server;Expected 12, recovered 12 jobs
02/25/2013 15:31:47;0006;PBS_Server;Svr;PBS_Server;Using ports Server:15001  Scheduler:15004  MOM:15002 (server: 'hpclab.orca')
02/25/2013 15:31:47;0002;PBS_Server;Svr;daemonize_server;INFO:      parent is exiting
02/25/2013 15:31:47;0002;PBS_Server;Svr;daemonize_server;INFO:      parent is exiting
02/25/2013 15:31:47;0002;PBS_Server;Svr;daemonize_server;INFO:      child process in background
02/25/2013 15:31:47;0002;PBS_Server;Svr;PBS_Server;Server Ready, pid = 16737, loglevel=0
02/25/2013 15:31:47;0004;PBS_Server;Svr;WARNING;ALERT: unable to contact node orca
02/25/2013 15:31:52;0002;PBS_Server;Svr;PBS_Server;Torque Server Version = 3.0.0, loglevel = 0
02/25/2013 15:36:52;0002;PBS_Server;Svr;PBS_Server;Torque Server Version = 3.0.0, loglevel = 0
02/25/2013 15:41:53;0040;PBS_Server;Svr;hpclab.orca;Scheduler was sent the command scheduler_first
02/25/2013 15:41:53;0002;PBS_Server;Svr;PBS_Server;Torque Server Version = 3.0.0, loglevel = 0
02/25/2013 15:41:53;0080;PBS_Server;Req;dis_request_read;req header bad, dis error 7 (Premature end of message), type=Connect
02/25/2013 15:41:53;0080;PBS_Server;Req;req_reject;Reject reply code=15058(Bad DIS based Request Protocol MSG=cannot decode message), aux=0, type=Connect, from @
02/25/2013 15:41:53;0002;PBS_Server;Req;dis_reply_write;DIS reply failure, -1
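
Most of the log above is routine startup output, so the few lines that matter are easy to miss. The following is a minimal sketch for picking them out. It assumes each record has six semicolon-separated fields (timestamp, event code, server name, object type, object name, message), which matches the lines shown here. The field labels, the function names, and the keyword list are my own choices for illustration, not anything defined by Torque. It also rejoins records that were wrapped across lines, as happens when a log is pasted into mail.

```python
import re

# A record starts with a date such as 02/25/2013; any other non-blank
# line is treated as the wrapped continuation of the previous record.
RECORD_START = re.compile(r"^\d{2}/\d{2}/\d{4}")

# Heuristic keywords (an assumption, chosen from the messages in this log).
KEYWORDS = ("unable to contact", "reject", "failure", "req header bad")


def join_wrapped(lines):
    """Rejoin log records that were wrapped across several lines."""
    records = []
    for raw in lines:
        line = raw.rstrip()
        if not line.strip():
            continue
        if RECORD_START.match(line) or not records:
            records.append(line)
        else:
            # Keep the continuation's leading space so the pieces join cleanly.
            records[-1] += line
    return records


def parse_record(record):
    """Split one record into named fields, or return None if malformed."""
    parts = record.split(";", 5)
    if len(parts) != 6:
        return None
    keys = ("timestamp", "code", "server", "type", "object", "message")
    return dict(zip(keys, parts))


def suspicious_records(lines):
    """Return parsed records whose message contains one of KEYWORDS."""
    hits = []
    for record in join_wrapped(lines):
        parsed = parse_record(record)
        if parsed and any(k in parsed["message"].lower() for k in KEYWORDS):
            hits.append(parsed)
    return hits


# Example (the log path is hypothetical; adjust to your installation):
#   with open("/var/spool/pbs/server_logs/20130225") as fh:
#       for rec in suspicious_records(fh):
#           print(rec["timestamp"], rec["object"], rec["message"])
```

Applied to the log in this message, the filter would keep the "unable to contact node orca" warning and the three Req lines at 15:41:53 (bad request header, reject reply, DIS reply failure), which are the entries that differ from a normal startup.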


 
Regards,
Mahmood